epoll ICE on tokio test

tiif commented 1 week ago

Miri ICE on this Tokio test: https://github.com/tokio-rs/tokio/blob/27539ae3bdfd08b7503f4919fe13299537cc16fc/tokio/tests/rt_handle_block_on.rs#L146. It panics on the unwrap in this line.

I am currently working on getting a reproducible. This unwrap will fail when we try to retrieve an epoll instance that is closed from fd table. It supposed to be guarded by the upgrade of weak_epoll_interest, because epoll file description is the only one that holds a strong ref to EpollInterest, so when the epoll instance is closed, the upgrade of weak_epoll_interest should fail. This ICE is a case where weak_epoll_interest upgrade succeed, but the epoll file descriptor is closed.

Full trace

```rust test current_thread_scheduler::fs_shutdown_before_started ... thread 'rustc' panicked at src/tools/miri/src/shims/unix/linux/epoll.rs:556:87: called `Option::unwrap()` on a `None` value stack backtrace: 0: 0x73b696bfed9a - ::fmt::ha8b6a03154ea19c3 1: 0x73b697403157 - core::fmt::write::hb1fec16e33daff29 2: 0x73b6983e7fd1 - std::io::Write::write_fmt::h890fd89091c43625 3: 0x73b696c0146b - std::panicking::default_hook::{{closure}}::h8e59b6e48c542f1f 4: 0x73b696c010de - std::panicking::default_hook::he45ee90607527b0d 5: 0x73b695d5f349 - std[70086fc653812fb1]::panicking::update_hook::>::{closure#0} 6: 0x73b696c01d87 - std::panicking::rust_panic_with_hook::h57f9e7cd2d85b015 7: 0x73b696c01a13 - std::panicking::begin_panic_handler::{{closure}}::h7908f667b2a9ce39 8: 0x73b696bff249 - std::sys::backtrace::__rust_end_short_backtrace::h06bfcf84fe179410 9: 0x73b696c01714 - rust_begin_unwind 10: 0x73b693a96a33 - core::panicking::panic_fmt::hc040d8de5ffafafe 11: 0x73b693c705cc - core::panicking::panic::h04f2f8f092dfaaed 12: 0x73b694033b79 - core::option::unwrap_failed::hae733b7ee1eb872c 13: 0x60dfcb772d3d - as miri[e32df7a534aeb9a0]::shims::unix::linux::epoll::EvalContextExt>::check_and_update_readiness 14: 0x60dfcb774b06 - ::write 15: 0x60dfcb753054 - as miri[e32df7a534aeb9a0]::shims::unix::fd::EvalContextExt>::write 16: 0x60dfcb7428ee - as miri[e32df7a534aeb9a0]::shims::unix::foreign_items::EvalContextExt>::emulate_foreign_item_inner 17: 0x60dfcb7db1dd - as miri[e32df7a534aeb9a0]::shims::foreign_items::EvalContextExtPriv>::emulate_foreign_item_inner 18: 0x60dfcb7d713e - as miri[e32df7a534aeb9a0]::shims::foreign_items::EvalContextExt>::emulate_foreign_item 19: 0x60dfcb676df5 - >::init_fn_call 20: 0x60dfcb6ec622 - miri[e32df7a534aeb9a0]::eval::eval_entry 21: 0x60dfcb592fb0 - ::after_analysis 22: 0x73b6981a2e2f - rustc_interface[71f2dea45baddd28]::interface::run_compiler::, rustc_driver_impl[bde591410b6ce234]::run_compiler::{closure#0}>::{closure#1} 23: 0x73b6982555d0 - std[70086fc653812fb1]::sys::backtrace::__rust_begin_short_backtrace::, rustc_driver_impl[bde591410b6ce234]::run_compiler::{closure#0}>::{closure#1}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>>::{closure#0}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>>::{closure#0}::{closure#0}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>> 24: 0x73b698255c3a - <::spawn_unchecked_, rustc_driver_impl[bde591410b6ce234]::run_compiler::{closure#0}>::{closure#1}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>>::{closure#0}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>>::{closure#0}::{closure#0}, core[d1a550dca8d22a5b]::result::Result<(), rustc_span[fcf79a5fd0f12046]::ErrorGuaranteed>>::{closure#1} as core[d1a550dca8d22a5b]::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} 25: 0x73b698255fab - std::sys::pal::unix::thread::Thread::new::thread_start::h6f36a038c6277257 26: 0x73b692494ac3 - start_thread at ./nptl/pthread_create.c:442:8 27: 0x73b692526850 - __GI___clone3 at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 28: 0x0 - error: the compiler unexpectedly panicked. this is a bug. note: we would appreciate a bug report: https://github.com/rust-lang/miri/issues/new note: please make sure that you have updated to the latest nightly note: please attach the file at `/home/byt/Documents/tokio/rustc-ice-2024-09-02T12_58_31-78699.txt` to your bug report note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental=[REDACTED] -Z miri-disable-isolation -Z miri-tree-borrows note: some of the compiler flags provided by cargo are hidden query stack during panic: end of query stack Miri caused an ICE during evaluation. Here's the interpreter backtrace at the time of the panic: note: the place in the program where the ICE was triggered --> /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/fd.rs:291:13 | 291 | / libc::write( 292 | | self.as_raw_fd(), 293 | | buf.as_ptr() as *const libc::c_void, 294 | | cmp::min(buf.len(), READ_LIMIT), 295 | | ) | |_____________^ | = note: BACKTRACE on thread `current_thread_`: = note: inside `std::sys::pal::unix::fd::FileDesc::write` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/fd.rs:291:13: 295:14 = note: inside `std::sys::pal::unix::fs::File::write` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/fs.rs:1265:9: 1265:26 = note: inside `<&std::fs::File as std::io::Write>::write` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/fs.rs:850:9: 850:30 = note: inside `mio::sys::unix::waker::Waker::wake` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/waker/eventfd.rs:53:15: 53:37 = note: inside `mio::waker::Waker::wake` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/waker.rs:94:9: 94:26 note: inside `tokio::runtime::io::driver::Handle::unpark` --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:209:9 | 209 | self.waker.wake().expect("failed to wake I/O driver"); | ^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::driver::IoHandle::unpark` --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:198:46 | 198 | IoHandle::Enabled(handle) => handle.unpark(), | ^^^^^^^^^^^^^^^ note: inside `tokio::runtime::driver::Handle::unpark` --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:90:9 | 90 | self.io.unpark(); | ^^^^^^^^^^^^^^^^ note: inside `::release` --> /home/byt/Documents/tokio/tokio/src/runtime/blocking/schedule.rs:50:21 | 50 | handle.driver.unpark(); | ^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::harness::Harness::::{closure#0}::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::release` --> /home/byt/Documents/tokio/tokio/src/runtime/task/harness.rs:362:29 | 362 | if let Some(task) = self.core().scheduler.release(&me) { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::harness::Harness::::{closure#0}::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::complete` --> /home/byt/Documents/tokio/tokio/src/runtime/task/harness.rs:348:27 | 348 | let num_release = self.release(); | ^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::harness::Harness::::{closure#0}::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::shutdown` --> /home/byt/Documents/tokio/tokio/src/runtime/task/harness.rs:249:9 | 249 | self.complete(); | ^^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::raw::shutdown::::{closure#0}::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>` --> /home/byt/Documents/tokio/tokio/src/runtime/task/raw.rs:311:5 | 311 | harness.shutdown(); | ^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::raw::RawTask::shutdown` --> /home/byt/Documents/tokio/tokio/src/runtime/task/raw.rs:235:18 | 235 | unsafe { (vtable.shutdown)(self.ptr) } | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::Task::::shutdown` --> /home/byt/Documents/tokio/tokio/src/runtime/task/mod.rs:427:9 | 427 | raw.shutdown(); | ^^^^^^^^^^^^^^ note: inside `tokio::runtime::task::UnownedTask::::shutdown` --> /home/byt/Documents/tokio/tokio/src/runtime/task/mod.rs:479:9 | 479 | self.into_task().shutdown(); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::blocking::pool::Spawner::spawn_task` --> /home/byt/Documents/tokio/tokio/src/runtime/blocking/pool.rs:402:13 | 402 | task.task.shutdown(); | ^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::blocking::pool::Spawner::spawn_blocking_inner::<{closure@tokio::fs::read_to_string<&str>::{closure#0}::{closure#0}}, std::result::Result>` --> /home/byt/Documents/tokio/tokio/src/runtime/blocking/pool.rs:391:23 | 391 | let spawned = self.spawn_task(Task::new(task, is_mandatory), rt); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::blocking::pool::Spawner::spawn_blocking::<{closure@tokio::fs::read_to_string<&str>::{closure#0}::{closure#0}}, std::result::Result>` --> /home/byt/Documents/tokio/tokio/src/runtime/blocking/pool.rs:306:17 | 306 | self.spawn_blocking_inner(func, Mandatory::NonMandatory, None, rt) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::Handle::spawn_blocking::<{closure@tokio::fs::read_to_string<&str>::{closure#0}::{closure#0}}, std::result::Result>` --> /home/byt/Documents/tokio/tokio/src/runtime/handle.rs:224:9 | 224 | self.inner.blocking_spawner().spawn_blocking(self, func) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::blocking::pool::spawn_blocking::<{closure@tokio::fs::read_to_string<&str>::{closure#0}::{closure#0}}, std::result::Result>` --> /home/byt/Documents/tokio/tokio/src/runtime/blocking/pool.rs:184:5 | 184 | rt.spawn_blocking(func) | ^^^^^^^^^^^^^^^^^^^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/fs/mod.rs:310:11 | 310 | match spawn_blocking(f).await { | ^^^^^^^^^^^^^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/fs/read_to_string.rs:29:53 | 29 | asyncify(move || std::fs::read_to_string(path)).await | ^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/runtime/park.rs:281:63 | 281 | if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) { | ^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::coop::with_budget::>, {closure@tokio::runtime::park::CachedParkThread::block_on<{async fn body of tokio::fs::read_to_string<&str>()}>::{closure#0}}>` --> /home/byt/Documents/tokio/tokio/src/runtime/coop.rs:107:5 | 107 | f() | ^^^ note: inside `tokio::runtime::coop::budget::>, {closure@tokio::runtime::park::CachedParkThread::block_on<{async fn body of tokio::fs::read_to_string<&str>()}>::{closure#0}}>` --> /home/byt/Documents/tokio/tokio/src/runtime/coop.rs:73:5 | 73 | with_budget(Budget::initial(), f) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::park::CachedParkThread::block_on::<{async fn body of tokio::fs::read_to_string<&str>()}>` --> /home/byt/Documents/tokio/tokio/src/runtime/park.rs:281:31 | 281 | if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::context::blocking::BlockingRegionGuard::block_on::<{async fn body of tokio::fs::read_to_string<&str>()}>` --> /home/byt/Documents/tokio/tokio/src/runtime/context/blocking.rs:66:9 | 66 | park.block_on(f) | ^^^^^^^^^^^^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/runtime/handle.rs:324:13 | 324 | blocking.block_on(future).expect("failed to park thread") | ^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::context::runtime::enter_runtime::<{closure@tokio::runtime::Handle::block_on_inner<{async fn body of tokio::fs::read_to_string<&str>()}>::{closure#0}}, std::result::Result>` --> /home/byt/Documents/tokio/tokio/src/runtime/context/runtime.rs:65:16 | 65 | return f(&mut guard.blocking); | ^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::Handle::block_on_inner::<{async fn body of tokio::fs::read_to_string<&str>()}>` --> /home/byt/Documents/tokio/tokio/src/runtime/handle.rs:323:9 | 323 | / context::enter_runtime(&self.inner, true, |blocking| { 324 | | blocking.block_on(future).expect("failed to park thread") 325 | | }) | |__________^ note: inside `tokio::runtime::Handle::block_on::<{async fn body of tokio::fs::read_to_string<&str>()}>` --> /home/byt/Documents/tokio/tokio/src/runtime/handle.rs:302:13 | 302 | self.block_on_inner(future) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `current_thread_scheduler::fs_shutdown_before_started` --> tokio/tests/rt_handle_block_on.rs:151:35 | 151 | let err: std::io::Error = Handle::current() | ___________________________________^ 152 | | .block_on(fs::read_to_string("Cargo.toml")) | |_______________________________________________________^ note: inside closure --> tokio/tests/rt_handle_block_on.rs:146:36 | 145 | #[test] | ------- in this procedural macro expansion 146 | fn fs_shutdown_before_started() { | ^ = note: inside `<{closure@tokio/tests/rt_handle_block_on.rs:146:5: 159:6} as std::ops::FnOnce<()>>::call_once - shim` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71 = note: inside ` std::result::Result<(), std::string::String> as std::ops::FnOnce<()>>::call_once - shim(fn() -> std::result::Result<(), std::string::String>)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71 = note: inside `test::__rust_begin_short_backtrace::, fn() -> std::result::Result<(), std::string::String>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:620:18: 620:21 = note: inside `test::types::RunnableTest::run` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/types.rs:145:40: 145:71 = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:643:60: 643:79 = note: inside ` as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19 = note: inside `std::panicking::r#try::do_call::, std::result::Result<(), std::string::String>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:554:40: 554:43 = note: inside `std::panicking::r#try::, std::panic::AssertUnwindSafe<{closure@test::run_test_in_process::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:518:19: 518:88 = note: inside `std::panic::catch_unwind::, std::result::Result<(), std::string::String>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:345:14: 345:33 = note: inside `test::run_test_in_process` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:643:27: 643:81 = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:564:43: 572:18 = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:594:41: 594:83 = note: inside `std::sys::backtrace::__rust_begin_short_backtrace::<{closure@test::run_test::{closure#1}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18: 154:21 = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:522:17: 522:71 = note: inside `::{closure#1}::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19 = note: inside `std::panicking::r#try::do_call::::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:554:40: 554:43 = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@test::run_test::{closure#1}}, ()>::{closure#1}::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:518:19: 518:88 = note: inside `std::panic::catch_unwind::::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:345:14: 345:33 = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:521:30: 523:16 = note: inside `<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@test::run_test::{closure#1}}, ()>::{closure#1}} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71 = note: inside ` as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2231:9: 2231:52 = note: inside `> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2231:9: 2231:52 = note: inside `std::sys::pal::unix::thread::Thread::new::thread_start` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:105:17: 105:64 = note: this note originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info) error: test failed, to rerun pass `--test rt_handle_block_on` ```

Version:

rustc 1.82.0-nightly (a7399ba69 2024-08-31)
binary: rustc
commit-hash: a7399ba69d37b019677a9c47fe89ceb8dd82db2d
commit-date: 2024-08-31
host: x86_64-unknown-linux-gnu
release: 1.82.0-nightly
LLVM version: 19.1.0

tiif commented 1 week ago

@rustbot claim

tiif commented 1 week ago

A quick fix would be removing the unwrap and only use the epoll fd if it still exists, but I think it is better to investigate why the assumption is violated.

tiif commented 1 week ago

This is the test in tokio that ICE'd:

    #[test]
    fn fs_shutdown_before_started() {
        let rt = rt();
        let _enter = rt.enter();
        rt.shutdown_timeout(Duration::from_secs(1000));

        let err: std::io::Error = Handle::current()
            .block_on(fs::read_to_string("Cargo.toml"))
            .unwrap_err();

        assert_eq!(err.kind(), std::io::ErrorKind::Other);

        let inner_err = err.get_ref().expect("no inner error");
        assert_eq!(inner_err.to_string(), "background task failed");
    }

source

This is the reproducible I wrote to be tested in miri:

use tokio::fs;
use tokio::runtime::Handle;
use tokio::time::Duration;

fn main() {
    let rt = tokio::runtime::Builder::new_current_thread().enable_all().build().unwrap();
    let _enter = rt.enter();
    rt.shutdown_timeout(Duration::from_secs(1000));

    let err: std::io::Error =
        Handle::current().block_on(fs::read_to_string("Cargo.toml")).unwrap_err();

    assert_eq!(err.kind(), std::io::ErrorKind::Other);
    let inner_err = err.get_ref().expect("no inner error");
    assert_eq!(inner_err.to_string(), "background task failed");
}

The tokio test still ICE on latest nightly rustc 1.83.0-nightly (12b26c13f 2024-09-07), but the test I have written passed miri on commit 335c91c. Does anyone have any idea what could possibly caused this?

@Darksonn Could you help to take a look at this? I am not sure if my reproducible is equivalent to the original tokio test.

RalfJung commented 1 week ago

It might be kind of random if it depends on the exact scheduler behavior. A particular Miri version is deterministic, but it is not deterministic across multiple versions. Different nightlies means you explore different interleavings. On the nightly where your example managed to cause the ICE, does it also cause the ICE with -Zmiri-preemption-rate=0.0?

You can try running cargo miri test --many-seeds to explore a bunch of different interleavings.

The goal should be to get a reproducer that doesn't use tokio, but directly uses epoll. But that is probably best done by actually understanding what happens, and then writing code to specifically hit that corner case.

This is a Miri issue so I doubt tokio people will have a good idea of how to get to the bottom of this (but I may be wrong ofc :D ). It needs someone who understands the internal Miri implementation. Try adding more and more tracing to Miri until you understand where that invariant gets broken.

RalfJung commented 1 week ago

Why does EpollEventInterest even store an FD number to begin with? That's a relatively fragile identifier that we should generally never use inside Miri. Why not store a (Weak)FileDescriptionRef?

RalfJung commented 1 week ago

For instance, one could use dup to give a second FD number to an epoll instance, and then close the original epoll instance -- then the FD number in EpollEventInterest would be outdated but the epoll instance is still perfectly alive and running.

tiif commented 6 days ago

Thanks for the suggestions.

On the nightly where your example managed to cause the ICE, does it also cause the ICE with -Zmiri-preemption-rate=0.0?

Yes, the ICE still happened after I added that flag.

You can try running cargo miri test --many-seeds to explore a bunch of different interleavings.

For the test I have written, all seeds range from 0 to 63 won't ICE.

I think it might help to check if I somewhat got the rt() runtime configuration wrong, or tokio implicitly uses epoll somewhere in the test configuration. But I will also try to add more tracing in miri. :)

tiif commented 6 days ago

For instance, one could use dup to give a second FD number to an epoll instance, and then close the original epoll instance -- then the FD number in EpollEventInterest would be outdated but the epoll instance is still perfectly alive and running.

Good point, it is better to use WeakFileDescriptionRef so we don't need to retrieve Epoll file description from fd table again and this might also fix the ICE. I think initially fd number is used to prevent circular reference, but this shouldn't happen if we use WeakFileDescriptionRef.

I willl add a test for this.

RalfJung commented 6 days ago

I see there are two fd numbers in EpollEventInterest. Unless there's a really good reason to do something else, they should both be changed to WeakFileDescriptionRef IMO.

tiif commented 6 days ago

I think the file_descriptor field can't be replaced. That value is only used as the key of ready_list, which is a BTreeMap (the only line that uses that value is here). The entry of ready_list needs to be differentiated using both the fd number and FdId. Also, the key of BTreeMap needs to implement PartialOrd, Ord, PartialEq and Eq, so WeakFileDescriptionRef can't be used as the key.

RalfJung commented 6 days ago

Okay, that field should then have a comment saying that is is only used to report the FD number back to the user in case of an event.

tiif commented 6 days ago

Ok, now the test can ICE by adding test_util to the tokio dependency. Thanks @oli-obk for discovering this! :)

rust-lang / miri

epoll ICE on tokio test #3858

Version: