Open sunshowers opened 3 months ago
In https://github.com/tokio-rs/mio/commit/1133ed006b9e2a6dcf27edcdf54309ce648c3a15 it looks like the waker impl on illumos (unintentionally?) got switched from pipe-based to eventfd.
PR to switch the impl back to being pipe-based: https://github.com/tokio-rs/mio/pull/1824
For the eventfd impls, there might be some bug-incompatibility between Linux and illumos here -- might be worth figuring out what it is.
I looked at the eventfd impl, as well as the Linux and illumos man pages, and to be honest the code is really simple and looks like it should work. Suggests possibly a bug in the illumos eventfd impl.
Too tired to continue but at least we have a good workaround :)
Repro:
git clone https://github.com/tokio-rs/mio
cd mio
git checkout 4a5114e518b982f49ce093be6d0d2a2ab86472d1
cargo test --all-features --test waker -- waker_multiple_wakeups_different_thread
@rmustacc has expressed an interest in looking at the issue.
Workaround is to set RUSTFLAGS="--cfg mio_unsupported_force_waker_pipe". (But note that rustflags settings do not compose: if someone has RUSTFLAGS set in the environment, then that will override build.rustflags set in .cargo/config.toml.)
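To illustrate why the two settings do not compose, here is a minimal Rust sketch of the precedence rule. It is a simplified model, not Cargo's actual code: `effective_rustflags` is a hypothetical helper, and it deliberately ignores the other sources Cargo consults (`CARGO_ENCODED_RUSTFLAGS` and `target.*.rustflags`).

```rust
/// Hypothetical, simplified model of how extra rustc flags are chosen.
/// Assumption: only two sources are considered here, the `RUSTFLAGS`
/// environment variable and `build.rustflags` from `.cargo/config.toml`.
/// The sources are mutually exclusive: the first one present wins and the
/// other is ignored entirely (they are not concatenated).
fn effective_rustflags(
    env_rustflags: Option<&str>,
    config_build_rustflags: &[&str],
) -> Vec<String> {
    match env_rustflags {
        // RUSTFLAGS is set in the environment: the config value is dropped.
        Some(flags) => flags.split_whitespace().map(String::from).collect(),
        // RUSTFLAGS is not set: fall back to build.rustflags.
        None => config_build_rustflags
            .iter()
            .map(|flag| flag.to_string())
            .collect(),
    }
}
```

With `build.rustflags = ["--cfg", "mio_unsupported_force_waker_pipe"]` in the config and an unrelated `RUSTFLAGS` in the environment, this model returns only the environment flags, so the workaround would be silently lost.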
This appears to be a similar problem to https://www.illumos.org/issues/13436 in that the epoll wakeup appears to occur only on an eventfd transition from zero to non-zero. Resetting to zero before waking up results in the test passing.
diff --git a/src/sys/unix/waker/eventfd.rs b/src/sys/unix/waker/eventfd.rs
index c0086fc..3a1e6bf 100644
--- a/src/sys/unix/waker/eventfd.rs
+++ b/src/sys/unix/waker/eventfd.rs
@@ -42,6 +42,11 @@ impl Waker {
     #[allow(clippy::unused_io_amount)] // Don't care about partial writes.
     pub(crate) fn wake(&self) -> io::Result<()> {
+        // The epoll emulation on some illumos systems currently requires the
+        // eventfd to transition from zero for an edge-triggered wakeup.
+        #[cfg(target_os = "illumos")]
+        self.reset()?;
+
         let buf: [u8; 8] = 1u64.to_ne_bytes();
         match (&self.fd).write(&buf) {
             Ok(_) => Ok(()),
bloody:mio:HEAD% cargo test --all-features --test waker -- waker_multiple_wakeups_different_thread
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.08s
Running tests/waker.rs (target/debug/deps/waker-1f848cdbed8b9273)
running 1 test
test waker_multiple_wakeups_different_thread ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.05s
This behaviour seems to be intentional in the illumos implementation:
static int
eventfd_write(dev_t dev, struct uio *uio, cred_t *credp)
{
	...
	/*
	 * Notify pollers as well if the eventfd is now readable.
	 */
	if (oval == 0) {
		pollwakeup(&state->efd_pollhd, POLLRDNORM | POLLIN);
	}
	return (0);
}
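For anyone following along, here is a toy Rust model of the difference. It is only a sketch: `ModelEventFd` and its fields are hypothetical names, nothing here touches a real eventfd or epoll, and the "notify on every write" mode is my assumption about what the mio test relies on from Linux, not something taken from kernel source.

```rust
/// Toy in-memory model of an eventfd counter with one edge-triggered poller.
/// All names here are hypothetical and for illustration only.
struct ModelEventFd {
    /// Current eventfd counter value.
    counter: u64,
    /// How many edge-triggered wakeups the poller has been sent.
    wakeups: u32,
    /// true: notify only when the old value was zero (the `oval == 0`
    /// check quoted above). false: notify on every write (assumed here
    /// to be what the waker test expects).
    notify_only_from_zero: bool,
}

impl ModelEventFd {
    fn new(notify_only_from_zero: bool) -> Self {
        ModelEventFd { counter: 0, wakeups: 0, notify_only_from_zero }
    }

    /// Models a write to the eventfd: add to the counter, maybe notify.
    fn write(&mut self, n: u64) {
        let old = self.counter;
        self.counter = self.counter.saturating_add(n);
        if !self.notify_only_from_zero || old == 0 {
            self.wakeups += 1;
        }
    }

    /// Models draining the eventfd by reading it: the counter returns to zero.
    fn reset(&mut self) {
        self.counter = 0;
    }

    /// Models the waker's `wake`, optionally resetting first as in the patch.
    fn wake(&mut self, reset_first: bool) {
        if reset_first {
            self.reset();
        }
        self.write(1);
    }
}
```

In this model, two back-to-back `wake(false)` calls in the `notify_only_from_zero` mode produce a single wakeup, because the second write sees a non-zero old value. Calling `wake(true)` instead produces two, which matches the effect of the reset-before-write patch above.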
I have opened https://www.illumos.org/issues/16700 upstream for this, and also opened https://github.com/illumos/epoll-test-suite/pull/2 to add a test to the illumos epoll test suite.
(filing this here for now just to put down notes)
Tokio 1.39.2's test suite consistently hangs on illumos, while 1.38.0's doesn't. Via a git bisect we traced this down to an issue in mio 1.0, reproducible (in the mio repo) with:
cargo nextest run --all-features --test waker waker_multiple_wakeups_different_thread
Via another bisect we found that this pair of commits is to blame, and the pair looks relevant:
The second commit is a child of the first, and the first commit doesn't build on illumos, so consider the pair of commits as a unit.
Combined diff of the two commits in case it's helpful: https://gist.github.com/sunshowers/c573ae448d2c1eb028216c3f3d644719
cc @jclulow who first noticed the issue