jaraco / irc

Full-featured IRC library for Python.
MIT License

Reactor.sockets sometimes returns sockets with fileno < 0 #91

Closed mkataja closed 8 years ago

mkataja commented 8 years ago

The sockets property sometimes returns sockets whose fileno() is negative, which causes an error further down the line. Because this happens only occasionally, I haven't been able to debug it or pin down its cause. I first ran into the issue more than a year ago, so it may have been fixed since, but I didn't find anything relevant in the changelogs.
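The thread never says what the "error down the line" actually is. One plausible candidate, assuming the Reactor passes these sockets to select.select() (an assumption about the library's internals, not something stated here), is the ValueError CPython raises for a negative file descriptor. This standard-library-only sketch shows that a closed socket reports fileno() == -1 and that select.select() rejects it; the function name is made up for illustration.

```python
import select
import socket


def closed_socket_fileno_and_select_error():
    """Return (fileno, error message) for a socket that was closed before select()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.close()
    # After close(), Python reports the descriptor as -1 instead of raising.
    fd = sock.fileno()
    try:
        select.select([sock], [], [], 0)
    except ValueError as exc:
        # select() refuses objects whose fileno() is negative.
        return fd, str(exc)
    return fd, None


if __name__ == "__main__":
    fd, err = closed_socket_fileno_and_select_error()
    print(fd)   # -1
    print(err)  # ValueError message about a negative file descriptor
```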

I've worked around this in my client by monkey-patching the property to return only sockets with a non-negative fileno, but the root cause obviously lies elsewhere (closed connections not being cleaned up for some reason?).

My trivial patch for reference:

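import irc.client

def patch_client_reactor_sockets():
    # Keep the original property object so the new getter can delegate to it.
    sockets_orig = irc.client.Reactor.sockets

    @property
    def sockets_new(self):
        with self.mutex:
            # Skip sockets that are already closed: fileno() returns -1 for them.
            return [
                sock
                for sock in sockets_orig.fget(self)
                if sock.fileno() >= 0
            ]

    irc.client.Reactor.sockets = sockets_new
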
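The patch works by taking the original property object from the class and calling its getter through property.fget. To try the same technique without irc installed, here is a minimal sketch against a hypothetical FakeReactor stand-in (not the real irc.client.Reactor); the helper name filter_closed_sockets is likewise made up for illustration.

```python
import socket
import threading


class FakeReactor:
    """Hypothetical stand-in for irc.client.Reactor, holding a list of sockets."""

    def __init__(self, socks):
        self.mutex = threading.RLock()
        self._socks = socks

    @property
    def sockets(self):
        with self.mutex:
            return list(self._socks)


def filter_closed_sockets(cls):
    """Wrap cls.sockets so sockets whose fileno() is negative are skipped."""
    original = cls.sockets  # reading from the class yields the property object itself

    @property
    def sockets(self):
        # The stand-in uses an RLock, so re-acquiring it inside original.fget is fine.
        with self.mutex:
            return [s for s in original.fget(self) if s.fileno() >= 0]

    cls.sockets = sockets


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.close()
    filter_closed_sockets(FakeReactor)
    reactor = FakeReactor([a, b])
    print(len(reactor.sockets))  # 1
    a.close()
```

Note that filtering only narrows the window: if another thread closes a socket after the list is built but before it is used, the same problem can still occur.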
jaraco commented 8 years ago

This sounds like an error in the BSD sockets implementation. What error occurs further down the line? How do other socket-based libraries deal with this situation?

jaraco commented 8 years ago

I'm closing this for now, but feel free to revive the discussion.

mkataja commented 7 years ago

I've finally been able to reproduce this. I run commands in threads, and it seems I created a race condition by calling jump_server on SingleServerIRCBot from one of them (I simply hadn't checked whether SingleServerIRCBot is thread-safe). I suspect a worker thread is closing the socket (because of jump_server) while the main thread enters process_once and picks up that socket while it is in a bad state.
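If the race is as described, one general way to avoid it (a common pattern, not something the thread or the irc library is confirmed to provide) is to keep every connection change on the thread that runs process_once: worker threads only enqueue the call, and the loop thread executes it between iterations. A minimal standard-library sketch, where LoopThreadDispatcher and fake_jump_server are hypothetical names:

```python
import queue
import threading


class LoopThreadDispatcher:
    """Hypothetical helper: lets worker threads hand calls to the event-loop thread."""

    def __init__(self):
        self._pending = queue.Queue()  # thread-safe FIFO of (func, args, kwargs)

    def call_soon(self, func, *args, **kwargs):
        # Safe from any thread; nothing runs until the loop thread drains the queue.
        self._pending.put((func, args, kwargs))

    def run_pending(self):
        # Call only from the event-loop thread, between process_once() iterations.
        while True:
            try:
                func, args, kwargs = self._pending.get_nowait()
            except queue.Empty:
                return
            func(*args, **kwargs)


if __name__ == "__main__":
    dispatcher = LoopThreadDispatcher()
    ran_on = []

    def fake_jump_server():
        # Stand-in for bot.jump_server(); records which thread executed it.
        ran_on.append(threading.current_thread().name)

    worker = threading.Thread(target=dispatcher.call_soon, args=(fake_jump_server,))
    worker.start()
    worker.join()
    dispatcher.run_pending()
    print(ran_on)  # ['MainThread']
```

In a real bot, the loop would presumably alternate reactor.process_once(timeout) with dispatcher.run_pending(), and workers would call dispatcher.call_soon(bot.jump_server) instead of calling jump_server directly; that wiring is an assumption and is not exercised here.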

Unfortunately I can't tell if this is the full story since I don't have logs from when I originally encountered this issue.