Closed tiif closed 1 month ago
Adding VClock to the epoll ready_list fixes the tokio reproducer.
Awesome. :) Nice job identifying which part of the non-determinism is the culprit here.
For the PR, I was asked to include the test case described above. I could use some help in figuring out how to get the test either run or compiled. When I try to run it, it won't compile but maybe there is a different way the test should be run?
I tried
./miri test libc-epoll-blocking
but that fails because the test harness says the function didn't compile and it shows a reasonable compiler error:
--- tests/pass-dep/libc/libc-epoll-blocking.stderr
+++ <stderr output>
+error: creating a shared reference to mutable static is discouraged
+ --> tests/pass-dep/libc/libc-epoll-blocking.rs:LL:CC
+ |
+LL | unsafe { assert_eq!(VAL, 1) };
+ | ^^^ shared reference to mutable static
+ |
+ = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
+ = note: shared references to mutable statics are dangerous; it's undefined behavior if the static is mutated or if a mutable reference is created for it while the shared reference lives
+ = note: `-D static-mut-refs` implied by `-D warnings`
+ = help: to override `-D warnings` add `#[allow(static_mut_refs)]`
+
+error: miri cannot be run on programs that fail compilation
+
+error: aborting due to 2 previous errors
+
Follow-up question. I found another way to run the test so now it's just a warning but then an assertion fails.
$ ./miri run --dep tests/pass-dep/libc/libc-epoll-blocking.rs
warning: creating a shared reference to mutable static is discouraged
--> tests/pass-dep/libc/libc-epoll-blocking.rs:173:25
|
173 | unsafe { assert_eq!(VAL, 1) };
| ^^^ shared reference to mutable static
|
= note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
= note: shared references to mutable statics are dangerous; it's undefined behavior if the static is mutated or if a mutable reference is created for it while the shared reference lives
= note: `#[warn(static_mut_refs)]` on by default
thread 'main' panicked at tests/pass-dep/libc/libc-epoll-blocking.rs:171:5:
assertion `left == right` failed: got wrong events
left: 4
right: 5
and here you can see what line 171 in that file is:
142 // This test shows a data_race before epoll had vector clocks added.
143 fn test_epoll_race() {
144 // Create an epoll instance.
145 let epfd = unsafe { libc::epoll_create1(0) };
146 assert_ne!(epfd, -1);
147
148 // Create an eventfd instance.
149 let flags = libc::EFD_NONBLOCK | libc::EFD_CLOEXEC;
150 let fd = unsafe { libc::eventfd(0, flags) };
151
152 // Register eventfd with the epoll instance.
153 let mut ev = libc::epoll_event { events: EPOLL_IN_OUT_ET, u64: fd as u64 };
154 let res = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) };
155 assert_eq!(res, 0);
156
157 static mut VAL: u8 = 0;
158 let thread1 = thread::spawn(move || {
159 // Write to the static mut variable.
160 unsafe { VAL = 1 };
161 // Write to the eventfd instance.
162 let sized_8_data: [u8; 8] = 1_u64.to_ne_bytes();
163 let res = unsafe { libc::write(fd, sized_8_data.as_ptr() as *const libc::c_void, 8) };
164 // read returns number of bytes that have been read, which is always 8.
165 assert_eq!(res, 8);
166 });
167 thread::yield_now();
168 // epoll_wait for the event to happen.
169 let expected_event = u32::try_from(libc::EPOLLIN | libc::EPOLLOUT).unwrap();
170 let expected_value = u64::try_from(fd).unwrap();
171 check_epoll_wait::<8>(epfd, &[(expected_event, expected_value)], -1);
172 // Read from the static mut variable.
173 unsafe { assert_eq!(VAL, 1) };
174 thread1.join().unwrap();
175 }
If I run it as ./miri run --dep ./tests/pass-dep/libc/libc-epoll-blocking.rs -Zmiri-preemption-rate=0
then the test passes and there is just the warning about the shared reference to mutable static.
Is this the normal way to run tests and should I just commit this to the PR as is? The PR's CI isn't going to flag the warning, the assertion, or the compile error?
you can grep for -Zmiri-preemption-rate
to see how other tests add that flag.
the warnings can be disabled with the attribute mentioned in the diagnostic message
Under what conditions do the //@compile-flags:
kick in? I see the relevant one already in the file under test. But I also see a comment to indicate what test function it is needed for on the line above it and if I remove the lines completely, only the new/last function asserts a failure, not the one the comments indicated the flag was there for.
So why does putting the flag on the command line to miri work but not the comment in the header? I find dozens of test files with similar lines in their header.
Here is what the file already has:
//@only-target: linux
// test_epoll_block_then_unblock depends on a deterministic schedule.
//@compile-flags: -Zmiri-preemption-rate=0
I found the ui_test docs talk about this comment flag, but it doesn't help me understand why passing the -Z flag on the test command line makes it pass but not having it on the command line but in the test file, like dozens of others, and only this new test function fails?
Well, as soon as I posted that, I thought of something else to try.
Just ./miri test
and the tests pass. But only if that @compile-flags is in the file.
without that compile-flags line, the whole test reports the new epoll test function fails, but not the other function, maybe due to a different seed?
But when the test is run with a FILTER, either as a test or a run, the flag has to be provided on the command-line. Was this advice somewhere I should have been reading first - it would probably be in the ui_test docs but I don't see it? Or am I still misinterpreting things and the results are explained some other way? 😕
There are two ways to run the test in miri, one is ./miri test
, another one is ./miri run
. ./miri test
will understand the @compile-flags
(probably because this command is using ui_test) . In ./miri run
, I need to pass all the flags myself, and it behaves like running miri normally on a program, it won't give me hint to bless the test if it fails, it will just provide the normal output. Usually if ./miri test
pass, it will pass on CI.
There is a brief introduction about testing in https://github.com/rust-lang/miri/blob/master/CONTRIBUTING.md#building-and-testing-miri
If you are unsure, you can push the change to the PR first, and we can help to check what's wrong by reading the CI log.
Thanks for debugging this, finding a fix and opening a PR! This was definitely very unexpected from just looking at the original issue
I am wondering if there are API things we can do to reduce the likelihood of this. Thread unblocking almost always occurs because of some other event an another thread that we should register a causal link with, so maybe the thread unblocking function should take a clock. It's just we've so far kept those subsystems largely orthogonal.
Description
When running one of the tokio test with
cargo +nightly miri test --features full --test spawn_pinned
, miri reported deadlock onepoll_wait
syscall. This is likely caused by epoll shim not receiving an epoll event notification to unblock the thread when it should have received one in real world system.Version
Rustc:
Tokio: Tokio repo master branch commit
27539ae3
Full trace
```rust Finished `test` profile [unoptimized + debuginfo] target(s) in 0.06s Running tests/spawn_pinned.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/spawn_pinned-5d9d1b31504e33c5) running 9 tests test callback_panic_does_not_kill_worker ... error: deadlock: the evaluated program deadlocked --> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9 | 56 | / syscall!(epoll_wait( 57 | | self.ep.as_raw_fd(), 58 | | events.as_mut_ptr(), 59 | | events.capacity() as i32, 60 | | timeout,d 61 | | )) | |__________^ the evaluated program deadlocked | = note: BACKTRACE on thread `unnamed-2`: = note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49 = note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61 note: inside `tokio::runtime::io::driver::Driver::turn` --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:149:15 | 149 | match self.poll.poll(events, max_wait) { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::io::driver::Driver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:122:9 | 122 | self.turn(handle, None); | ^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::signal::Driver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/signal/mod.rs:92:9 | 92 | self.io.park(handle); | ^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::process::Driver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/process.rs:32:9 | 32 | self.park.park(handle); | ^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::driver::IoStack::park` --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:175:40 | 175 | IoStack::Enabled(v) => v.park(handle), | ^^^^^^^^^^^^^^ note: inside `tokio::runtime::time::Driver::park_internal` --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:247:21 | 247 | self.park.park(rt_handle); | ^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::time::Driver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:173:9 | 173 | self.park_internal(handle, None); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::driver::TimeDriver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:332:55 | 332 | TimeDriver::Enabled { driver, .. } => driver.park(handle), | ^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::driver::Driver::park` --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:71:9 | 71 | self.inner.park(handle); | ^^^^^^^^^^^^^^^^^^^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:382:17 | 382 | driver.park(&handle.driver); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>` --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:423:19 | 423 | let ret = f(); | ^^^ note: inside `tokio::runtime::scheduler::current_thread::Context::park` --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:381:27 | 381 | let (c, ()) = self.enter(core, || { | ___________________________^ 382 | | driver.park(&handle.driver); 383 | | self.defer.wake(); 384 | | }); | |______________^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:724:33 | 724 | ... context.park(core, handle) | ^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside closure --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:774:68 | 774 | let (core, ret) = context::set_scheduler(&self.context, || f(core, context)); | ^^^^^^^^^^^^^^^^ note: inside `tokio::runtime::context::scoped::Scoped::