tokio-rs / tokio-core

I/O primitives and event loop for async I/O in Rust
Apache License 2.0
640 stars 115 forks source link

Regression with Timeouts and/or UdpSockets? #316

Closed bluejekyll closed 6 years ago

bluejekyll commented 6 years ago

I hadn't intended to upgrade tokio yet, but did after a cargo update. After running my standard integration test suites, one of my timeout tests started hanging. I haven't had time to investigate deeply, but wanted to notify as quickly as possible about this. This appears to be specifically related to Udp and not Tcp.

As I haven't had time to dig into why the test is hanging, I'm assuming it's not Timeouts but something with the UdpSocket, due to no issues seen with Tcp.

reproduce with this repo: https://github.com/bluejekyll/trust-dns

hung:

cargo test --manifest-path integration-tests/Cargo.toml --test client_future_tests test_timeout_query_udp

very similar test and code path, but TCP works:

cargo test --manifest-path integration-tests/Cargo.toml --test client_future_tests test_timeout_query_tcp

This appears to have been introduced with tokio-core at 0.1.13 and continues through to 0.1.15

bluejekyll commented 6 years ago

Could this be related to #311? I’ll try and dig into the details when I have some time later today.

carllerche commented 6 years ago

It still hangs w/ the fix from #311, so I will try investigating as well

bluejekyll commented 6 years ago

Sorry! I forgot to give you a branch! tokio-core will need to be upgraded before reproducing.

carllerche commented 6 years ago

I looked through the trust-dns source. I'm wondering if this is a case of "it just happened to work" before and the internal implementation change exposed a bug.

You create the timeout [here]: https://github.com/bluejekyll/trust-dns/blob/master/proto/src/dns_handle.rs#L304

The only place I see the timeout polled is [here]: https://github.com/bluejekyll/trust-dns/blob/master/proto/src/dns_handle.rs#L192

So, it looks like you never poll the timeout until the next iteration of DnsFuture::poll, but nothing will actually notify the DnsFuture task (because the timeout is never polled), so poll will never be called.

I'm pretty sure I fixed some code that caused unnecessary spurious notifications, so it is possible that this broke the tests.

Hope this helps.

bluejekyll commented 6 years ago

Thanks for taking a look! I’ll double check the logic in that section and correct it.

carllerche commented 6 years ago

I'm going to close this issue. If I am wrong, please comment and I can open it up again.

Cheers.