tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License

Test `injection_queue_depth_multi_thread` is flaky #6847

Open Darksonn opened 5 days ago

Darksonn commented 5 days ago

This test has been observed to fail in CI:

https://github.com/tokio-rs/tokio/blob/83e922f051e341e4d69d04c7a8ef1050c19cb0f8/tokio/tests/rt_unstable_metrics.rs#L642-L668

To close this issue, figure out why it is failing and fix it.

jofas commented 4 days ago

I presume the two recent CI jobs that timed out were affected by the flakiness of `injection_queue_depth_multi_thread`? I'd assume so, because if this assert

https://github.com/tokio-rs/tokio/blob/83e922f051e341e4d69d04c7a8ef1050c19cb0f8/tokio/tests/rt_unstable_metrics.rs#L663

fails, the main thread panics and never reaches the synchronisation point on `barrier2`

https://github.com/tokio-rs/tokio/blob/83e922f051e341e4d69d04c7a8ef1050c19cb0f8/tokio/tests/rt_unstable_metrics.rs#L667

here, causing the test to run forever (or until CI times out).

Darksonn commented 3 days ago

Yes. Good point about the assert. That explains why it times out instead of failing normally.

jofas commented 3 days ago

I wonder if a first step[^1] would be to convert the assert into an if-statement so that we panic only after calling `barrier2.wait()`. That would give us better diagnostics and, more importantly, CI would no longer time out.

[^1]: In case this isn't a quick fix
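A rough sketch of the "check first, panic after `barrier2.wait()`" idea, assuming the structure described above: worker threads park on `barrier2` while the main thread checks queue depths. It uses `std::thread` and `std::sync::Barrier` instead of the Tokio runtime, so it is only an analogy, not the actual test. The function name `check_depths_then_release` and the `observed` slice, which stands in for the values read from the runtime metrics, are hypothetical.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

/// Hypothetical stand-in for the test body. `observed[i]` plays the role of
/// the injection queue depth read after spawning `i` tasks; it is passed in
/// so this sketch runs without Tokio.
fn check_depths_then_release(observed: &[usize]) {
    // Two parked "workers" plus the main thread.
    let barrier1 = Arc::new(Barrier::new(3));
    let barrier2 = Arc::new(Barrier::new(3));

    let workers: Vec<_> = (0..2)
        .map(|_| {
            let b1 = Arc::clone(&barrier1);
            let b2 = Arc::clone(&barrier2);
            thread::spawn(move || {
                b1.wait(); // signal that this worker is parked
                b2.wait(); // stay parked until the main thread has finished checking
            })
        })
        .collect();

    // Both workers are now blocked.
    barrier1.wait();

    // Record the first mismatch instead of asserting immediately: panicking
    // here would skip `barrier2.wait()` and leave the workers stuck.
    let mut failure: Option<String> = None;
    for (expected, &actual) in observed.iter().enumerate() {
        if actual != expected {
            failure = Some(format!("expected depth {expected}, got {actual}"));
            break;
        }
    }

    // Always release the workers first...
    barrier2.wait();
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }

    // ...and only then report the failure, with a descriptive message.
    if let Some(msg) = failure {
        panic!("{msg}");
    }
}
```

With this shape, a mismatch such as `check_depths_then_release(&[0, 1, 3])` still fails the test, but it does so with a message instead of hanging until the CI timeout.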