Investigate threaded runtime with single thread shutdown on Windows

Darksonn commented 4 years ago

CI sometimes fails the Windows tests with a deadlock. The tests that deadlock are:

threaded_scheduler_1_thread::runtime_in_thread_local
threaded_scheduler_1_thread::shutdown_timeout

They also print a panic message, which I assume is the cause of a deadlock.

Click to expand backtrace

``` thread 'tokio-runtime-worker' panicked at 'explicit panic', tokio\tests\rt_common.rs:661:17 stack backtrace: test threaded_scheduler_1_thread::panic_in_block_on ... ok 0: core::fmt::write at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\fmt\mod.rs:1063 1: std::io::Write::write_fmt at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\io\mod.rs:1426 2: std::sys_common::backtrace::_print 3: std::sys_common::backtrace::print 4: std::panicking::default_hook::{{closure}} at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs:204 5: std::panicking::default_hook at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs:224 6: std::panicking::rust_panic_with_hook at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs:470 7: std::panicking::begin_panic at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\panicking.rs:397 8: rt_common::threaded_scheduler_1_thread::panic_in_task::{{impl}}::poll at .\tests\rt_common.rs:661 9: tokio::loom::std::unsafe_cell::UnsafeCell>::with_mut,core::task:: at .\src\loom\std\unsafe_cell.rs:14 10: tokio::runtime::task::core::Core>::poll> at .\src\runtime\task\harness.rs:107 12: core::ops::function::FnOnce::call_once at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs:232 13: std::panic::{{impl}}::call_once>,closure-0> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\panic.rs:318 14: alloc::sync::{{impl}}::clone at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\liballoc\sync.rs:972 15: panic_unwind::__rust_maybe_catch_panic at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libpanic_unwind\lib.rs:86 16: std::panicking::try>,std::panic::AssertUnwindSafe> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\panicking.rs:281 17: core::intrinsics::copy_nonoverlapping at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\intrinsics.rs:1999 18: tokio::runtime::task::raw::poll> at .\src\runtime\task\raw.rs:104 19: tokio::runtime::task::Notified>::run> at .\src\runtime\task\mod.rs:169 20: tokio::runtime::task::raw::poll> at .\src\runtime\task\raw.rs:104 21: tokio::runtime::task::Notified>::run> at .\src\runtime\task\mod.rs:169 22: tokio::runtime::thread_pool::worker::{{impl}}::run_task::{{closure}} at .\src\runtime\thread_pool\worker.rs:353 23: tokio::coop::budget::{{closure}}, ()>> at .\src\coop.rs:117 24: std::thread::local::LocalKey>::try_with,closure-0,core::result::Result, ()>> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\thread\local.rs:262 25: std::thread::local::LocalKey>::with,closure-0,core::result::Result, ()>> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\thread\local.rs:239 26: tokio::runtime::thread_pool::worker::Context::run at .\src\runtime\thread_pool\worker.rs:324 27: tokio::runtime::thread_pool::worker::Context::run at .\src\runtime\thread_pool\worker.rs:324 28: tokio::runtime::thread_pool::worker::run::{{closure}} at .\src\runtime\thread_pool\worker.rs:309 29: tokio::macros::scoped_tls::ScopedKey::set at .\src\macros\scoped_tls.rs:63 30: tokio::runtime::thread_pool::worker::run at .\src\runtime\thread_pool\worker.rs:306 31: tokio::runtime::thread_pool::worker::{{impl}}::launch::{{closure}} at .\src\runtime\thread_pool\worker.rs:285 32: tokio::runtime::task::core::{{impl}}::poll::{{closure}},tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\core.rs:173 33: tokio::loom::std::unsafe_cell::UnsafeCell>>::with_mut>,core::task::poll::Po at .\src\loom\std\unsafe_cell.rs:14 34: tokio::runtime::task::core::{{impl}}::poll::{{closure}},tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\core.rs:173 35: tokio::runtime::task::core::Core, tokio::runtime::blocking::schedule::NoopSchedule>::poll,tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\core.rs:158 36: tokio::runtime::task::core::Core, tokio::runtime::blocking::schedule::NoopSchedule>::poll,tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\core.rs:158 37: tokio::runtime::task::harness::{{impl}}::poll::{{closure}},tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\harness.rs:107 38: core::ops::function::FnOnce::call_once at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs:232 39: std::panic::{{impl}}::call_once>,closure-0> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\panic.rs:318 40: core::sync::atomic::atomic_compare_exchange at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\sync\atomic.rs:2323 41: std::panicking::try>,std::panic::AssertUnwindSafe> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\panicking.rs:281 42: tokio::runtime::task::harness::Harness, tokio::runtime::blocking::schedule::NoopSchedule>::poll,tokio::runtime::blocking::schedule::NoopSchedul at .\src\runtime\task\harness.rs:89 43: tokio::runtime::task::raw::poll,tokio::runtime::blocking::schedule::NoopSchedule> at .\src\runtime\task\raw.rs:104 44: std::thread::local::LocalKey>>::try_with>,closure-0,tokio::runtime::context::enter::DropGuard> at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libstd\thread\local.rs:262 45: tokio::runtime::blocking::pool::Inner::run at .\src\runtime\blocking\pool.rs:230 46: tokio::runtime::blocking::pool::{{impl}}::spawn_thread::{{closure}}::{{closure}} at .\src\runtime\blocking\pool.rs:210 47: tokio::runtime::context::enter at .\src\runtime\context.rs:72 48: tokio::runtime::handle::Handle::enter at .\src\runtime\handle.rs:71 49: tokio::runtime::blocking::pool::{{impl}}::spawn_thread::{{closure}} at .\src\runtime\blocking\pool.rs:209 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. ```

LucioFranco commented 4 years ago

From the back trace this looks like it may be due to budget stuff?

Locally on my windows machine I can't reproduce this with individual runs.

jebrosen commented 4 years ago

The panic in the stack trace is from the panic_in_task test, and should be unrelated since each test runs in a separate runtime. As far as I have found, backtraces from intended panics interleave with the "ok"/"fail" messages even when all tests are successful.

emgre commented 4 years ago

I can replicate the threaded_scheduler_1_thread::runtime_in_thread_local deadlock all the time with #2649. Upon investigation, I think the issue is a deadlock in the destructor of a thread local storage variable, but I couldn't figure out exactly what.

I used to have threaded_scheduler_1_thread::shutdown_timeout block from time to time too, but not with #2649. I think it was related with the shutdown_timeout bug that I fixed.

tokio-rs / tokio

Investigate threaded runtime with single thread shutdown on Windows #2499