Closed: ioquatix closed this 1 year ago
@Math2 this is kind of disappointing but maybe unavoidable.
(https://stackoverflow.com/a/26547388)
Predictably, Linux does the same thing as it does for select(2). From the manual page:
> For a discussion of what may happen if a file descriptor in an epoll instance being monitored by epoll_wait() is closed in another thread, see select(2).
And from the select(2) page:
> If a file descriptor being monitored by select() is closed in another thread, the result is unspecified. [...] On Linux (and some other systems), closing the file descriptor in another thread has no effect on select()
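The underlying reason is that epoll registrations are keyed by the open file description, not the descriptor number, so a `close()` only cleans up the registration when it drops the last reference to that description. A minimal Python reproduction (Linux-only sketch; the `dup(2)` is what keeps the description alive and makes the close invisible to epoll):

```python
import os
import select

r, w = os.pipe()
ep = select.epoll()
ep.register(r, select.EPOLLIN)

# Keep the underlying open file description alive via dup(2).
duplicate = os.dup(r)

# Close the registered descriptor. epoll tracks the description, not the
# fd number, so the registration silently survives this close.
os.close(r)

os.write(w, b"x")
events = ep.poll(0.1)
print(events)  # still reports readiness for the now-closed fd number `r`
```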
The tl;dr is "don't do it":

> In summary, any application that relies on a particular behavior in this scenario must be considered buggy.
I have been considering introducing an `io_close` hook in the scheduler. Such a hook might get us 90% of the way towards avoiding this PR. In other words, we could get notified when someone calls `IO#close`.
> Explicitly deregister affected file descriptors from epoll set before calling dup/dup2/dup3 or close.

https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/
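That recommended ordering can be sketched with Python's `select.epoll` (Linux-only; a toy illustration, not io-event code):

```python
import os
import select

r, w = os.pipe()
ep = select.epoll()
ep.register(r, select.EPOLLIN)
duplicate = os.dup(r)  # another reference to the same description exists

# Safe pattern: explicitly deregister BEFORE close/dup2/dup3, so no stale
# registration can outlive the descriptor number.
ep.unregister(r)
os.close(r)

os.write(w, b"x")
events = ep.poll(0.1)
print(events)  # nothing reported: the set no longer knows about r
```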
Yeah... as far as I can tell, there's no other way. It's either a one-shot-like design, or the selector needs to know about `close()`s; I don't see how it could do the right thing otherwise. The selector just wouldn't have enough information.
Most of these event mechanisms were apparently designed with application-specific event loops in mind, not something generic that can be shared by different modules in the same process. That forces the selector to be designed as a layer that interposes itself between the event mechanism and the user code, just so it can know how FDs are being managed and do what the mechanism expects. It would have been so much better if the mechanisms themselves were more "FD-oriented", because FDs are what user code actually deals with, not the underlying kernel objects.
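That interposing-layer idea can be sketched as follows (hypothetical Python, Linux-only; none of these names come from io-event):

```python
import os
import select

class Selector:
    """Hypothetical interposing layer: user code closes FDs through the
    selector, so the epoll set never holds stale entries."""

    def __init__(self):
        self._epoll = select.epoll()
        self._registered = set()

    def register(self, fd, events=select.EPOLLIN):
        self._epoll.register(fd, events)
        self._registered.add(fd)

    def close(self, fd):
        # This is where an io_close-style hook would fire: deregister
        # first, then let the kernel close the descriptor. (Cancelling
        # fibers waiting on this fd is not shown.)
        if fd in self._registered:
            self._epoll.unregister(fd)
            self._registered.discard(fd)
        os.close(fd)

    def poll(self, timeout=0):
        return self._epoll.poll(timeout)

r, w = os.pipe()
selector = Selector()
selector.register(r)
selector.close(r)  # safe: the set is updated before the close
events = selector.poll(0.1)
```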
Anyway, if there's an `#io_close` hook, maybe there should be an `#io_dup` too (or `#io_new_from`, or `#io_alias`?). Eventually the selector could try to make things work with FDs that alias each other, but it would already be pretty good to have it warn you when it detects that you're doing something unsafe.
If the selector could know for sure when an IO operation yielded `EWOULDBLOCK`, it could probably use edge-triggered epoll mode safely too (and reduce syscalls even further).
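For reference, the edge-triggered contract looks like this in a Python sketch (Linux-only): after an edge fires, you must drain the fd until `EWOULDBLOCK` (surfaced as `BlockingIOError` in Python) before waiting again:

```python
import os
import select

r, w = os.pipe()
os.set_blocking(r, False)

ep = select.epoll()
ep.register(r, select.EPOLLIN | select.EPOLLET)  # edge-triggered

os.write(w, b"data")
ready = ep.poll(0.1)  # one edge reported for the write above

# Edge-triggered contract: drain until EWOULDBLOCK; only then is it
# safe to go back to epoll_wait for the next edge.
chunks = []
while True:
    try:
        chunks.append(os.read(r, 2))
    except BlockingIOError:
        break

data = b"".join(chunks)
```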
I slept on it and now I have a better idea, I'll make a PR to see if it's viable.
What do you think of 2852d4f?
There is one more idea I had: when all waiting fibers are cancelled due to `IO#close`, `epoll_descriptor->waiting_events` would become 0. We might be able to take advantage of that knowledge. There should be a difference between `waiting_events` dropping to zero because the operation completed successfully vs the operation being cancelled (due to `IOError` which is raised in operations which are cancelled by `IO#close`).
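A toy model of that distinction (Python sketch; only the `waiting_events` name comes from the C struct being discussed, everything else is hypothetical):

```python
class EpollDescriptor:
    """Hypothetical per-fd state: track why waiting_events hit zero."""

    def __init__(self):
        self.waiting_events = 0
        self.cancelled = False

    def begin_wait(self):
        self.waiting_events += 1

    def finish_wait(self, cancelled=False):
        # cancelled=True models a fiber woken with IOError by IO#close.
        self.waiting_events -= 1
        self.cancelled = self.cancelled or cancelled
        if self.waiting_events == 0:
            # All waiters gone: was this completion or cancellation?
            return "deregister" if self.cancelled else "keep"
        return None

descriptor = EpollDescriptor()
descriptor.begin_wait()
descriptor.begin_wait()
first = descriptor.finish_wait()                    # completed normally
decision = descriptor.finish_wait(cancelled=True)   # the other hit IO#close
```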
> What do you think of 2852d4f?
Nice. Good idea. I hadn't thought of that. We can assume that the FD has been closed and re-opened if we get passed a different IO object for the same FD...
I did some stress testing and it seems to work fine.
Can `IO_Event_Selector_EPoll_Descriptor` structs be traversed by the GC? I think they would have to be for this to be 100% correct. But the odds of the exact same IO object address getting recycled at exactly the wrong time must be pretty damn small.
> There is one more idea I had: when all waiting fibers are cancelled due to `IO#close`, `epoll_descriptor->waiting_events` would become 0. We might be able to take advantage of that knowledge. There should be a difference between `waiting_events` dropping to zero because the operation completed successfully vs the operation being cancelled (due to `IOError` which is raised in operations which are cancelled by `IO#close`).
That would be for when IO is done through the selector with `#io_read`/`#io_write`, so that the selector can see the exceptions, right? I wonder if it's worth optimizing the error cases, but it sounds like it would work.
You are correct: if we are going to use it, we should retain it via the GC. We have to carefully consider when it goes out of scope, but that should be okay.
https://github.com/socketry/io-event/pull/69 introduced a failing test case.
File descriptors may be reused. Caching based on descriptor can therefore cause missing registrations.
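The reuse hazard is easy to demonstrate (Python sketch; the cache here is hypothetical, not io-event's):

```python
import os

registrations = {}  # hypothetical cache keyed by fd number only

r1, w1 = os.pipe()
registrations[r1] = "pipe #1 read end"

os.close(r1)
os.close(w1)

# POSIX allocates the lowest available descriptor, so in this quiet,
# single-threaded process the new pipe reuses the numbers just closed.
r2, w2 = os.pipe()

# The cache now wrongly claims r2 is already registered, so a real
# selector would skip the registration it actually needs to make.
stale = registrations.get(r2)
```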