tokio-rs / mio

Metal I/O library for Rust.
MIT License
6.27k stars 720 forks source link

deregister() sometimes fails with "No such file or directory" on MacOS #911

Closed rom1v closed 5 years ago

rom1v commented 5 years ago

Several users reported this error on MacOS.

Today, I got more logs (see discussion from https://github.com/Genymobile/gnirehtet/issues/136#issuecomment-456893076) in which poll.register() returns a Token(3), but passing this token to poll.deregister() fails with No such file or directory.

This happens only on MacOS (apparently), so I guess it is related to the kqueue backend.

Thomasdezeeuw commented 5 years ago

The only way I can think of dergister returning ENOENT is by either double deregistering, or not registering in the first place.

If I'm correct https://github.com/Genymobile/gnirehtet/blob/2bf8a3cfbe2168031897b8f917c088fe6d47710c/relay-rust/src/relay/tcp_connection.rs#L788 is the line that causes this. Could you add an assertion/check that ensure that closed == false? This way we can make sure that it isn't double deregistered.

rom1v commented 5 years ago

Thank you for your answer.

Could you add an assertion/check that ensure that closed == false? This way we can make sure that it isn't double deregistered.

I will ask to the user who can reproduce the problem. But with the provided logs (with this patch https://github.com/Genymobile/gnirehtet/issues/136#issuecomment-456893076), we already see that Token(3) is registered once and not deregistered twice.

sile commented 5 years ago

I encountered a similar problem with FreeBSD. It can be reproduced as follows:

//
// FreeBSD
//
$ uname -a
FreeBSD freebsd-11-1 11.2-RELEASE-p9 FreeBSD 11.2-RELEASE-p9 #0: Tue Feb  5 15:30:36 UTC 2019     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

$ cargo new miotest
$ cd miotest/
$ echo 'mio = "0.6.16"' >> Cargo.toml
$ vi src/main.rs   // See below code
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/miotest`
thread 'main' panicked at 'This expression fails on FreeBSD, but doesn't on Linux: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

//
// Debian
//
$ uname -a
Linux instance-1 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
// .. abbrev ..
$ cargo run
Succeeded
// src/main.rs: This code is copied from https://docs.rs/mio/0.6.16/mio/#example.
use mio::net::{TcpListener, TcpStream};
use mio::*;

const SERVER: Token = Token(0);
const CLIENT: Token = Token(1);

fn main() {
    let addr = "127.0.0.1:13265".parse().unwrap();

    let server = TcpListener::bind(&addr).unwrap();

    let poll = Poll::new().unwrap();

    poll.register(&server, SERVER, Ready::readable(), PollOpt::edge())
        .unwrap();

    let sock = TcpStream::connect(&addr).unwrap();

    // NOTE(sile): `PollOpt::oneshot()` is added by me.
    poll.register(
        &sock,
        CLIENT,
        Ready::readable(),
        PollOpt::edge() | PollOpt::oneshot(),
    )
    .unwrap();

    let mut events = Events::with_capacity(1024);

    loop {
        poll.poll(&mut events, None).unwrap();

        for event in events.iter() {
            match event.token() {
                SERVER => {
                    let _ = server.accept();
                }
                CLIENT => {
                    // NOTE(sile): The following two lines are added by me.
                    poll.deregister(&sock)
                        .expect("This expression fails on FreeBSD, but doesn't on Linux");
                    println!("Succeeded");
                    return;
                }
                _ => unreachable!(),
            }
        }
    }
}
asomers commented 5 years ago

This looks like a bug in how mio handles oneshot on kqueue-based systems:

With kqueue there are only two possible states for a kevent: enabled or disabled. EV_ONESHOT means that the kevent will become disabled after the first occurrence. However, Linux's epoll has three states: registered, not registered, or registered-but-disabled. EPOLLONESHOT means that a file will transition to the disabled state after it returns a single event. Here's a description of how EPOLLONESHOT should be used.

But mio seems to be treating EPOLLONESHOT and EV_ONESHOT identically.

Does mio even allow multiple threads to poll the same file descriptor at the same time? If not, then EPOLLONESHOT doesn't make sense and we should remove that option. If so, then we should add EPOLLEXCLUSIVE because it's better than EPOLLONESHOT. In any case, perhaps we should rename one or both of the oneshots since they aren't equivalent?

carllerche commented 5 years ago

@asomers I'm not exactly following your proposal.

asomers commented 5 years ago

I'm saying that since EPOLLONESHOT and EV_ONESHOT do different things, they shouldn't use the same apparently cross-platform API. Instead, mio should do one of three things: 1) PollOpt::oneshot should emulate EPOLLONESHOT behavior on kqueue-based systems. 2) PollOpt::oneshot should emulate EV_ONESHOT behavior on epoll-based systems 3) PollOpt::oneshot should be deprecated, and new non-portable options added, like PollOpt::epolloneshot and PollOpt::kqoneshot, or something like that.

carllerche commented 5 years ago

Unfortunately, things will get more complicated as the windows side of things is reworked. I'd like to switch the windows impl to something like: https://github.com/piscisaureus/wepoll

The downside is this strategy only supports oneshot.

carllerche commented 5 years ago

Searching around, I also found: https://github.com/jiixyj/epoll-shim/blob/master/src/epoll.c

asomers commented 5 years ago

That epoll-shim looks awfully heavyweight, and it doesn't even handle EPOLLONESHOT.

carllerche commented 5 years ago

I think the general strategy is to use some of the user data associated with the FD to track enough state to map the epoll API. We should be able to do this as well?

carllerche commented 5 years ago

@asomers I haven't looked in detail, what looks heavyweight?

asomers commented 5 years ago

That whole file you just linked. It looks like it's trying to emulate the epoll C API on kqueue-based systems.

carllerche commented 5 years ago

@asomers it also sounds like the problem stems from using deregister. One option might be to remove deregister from the portable API.

It already is not possible to move a FD between Poll instances, so deregister shouldn't be needed?

asomers commented 5 years ago

That would be another option. But are there any other legitimate use cases for using deregister? Does anybody ever decide to ignore a socket without closing it?

przygienda commented 5 years ago

yes, in fact during shutdown sequence you may very well want to do that since you don't want events from the socket but you don't want to close it since it causes the other side to see e.g. pipe broken ... Other case is when you deregister while draining the socket and re-register to make sure you don't get spurious events. At least, I have enough code doing this kind of stuff based on runing MIO under heavy load over lots of threads ...

On Wed, Feb 20, 2019 at 10:20 AM Alan Somers notifications@github.com wrote:

That would be another option. But are there any other legitimate use cases for using deregister? Does anybody ever decide to ignore a socket without closing it?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/carllerche/mio/issues/911#issuecomment-465695602, or mute the thread https://github.com/notifications/unsubscribe-auth/ABo0CwKUI_DODxGOkq2wbh-LM_0G_u0Zks5vPZHWgaJpZM4aPVMn .

carllerche commented 5 years ago

@asomers @przygienda well, ideally we could do something like remove deregister from the "portable" API but keep it exposed when you opt into platform specific behavior.

We probably need to make a list of the various inconsistencies across platforms and see how we can unify.

carllerche commented 5 years ago

So, Poll could have a deregister_fd on platforms that support it...

That said, the biggest problem with using oneshot as the portable strategy is that it adds additional epoll_ctl calls on linux. I wonder how much that would impact performance.

Thomasdezeeuw commented 5 years ago

I just tried this with mio_st and it worked, I'll try to figure what is different on mio.

Thomasdezeeuw commented 5 years ago

So mio_st actually cheats a bit and simply ignores ENOENT errors. I think I went with this approach because the result is the same, the fd is no longer registered. Would that be an acceptable solution to this?