sopel-irc / sopel-github

GitHub plugin for Sopel
Other
3 stars 13 forks source link

Error: "Address already in use" when Sopel reconnects with webhooks enabled #70

Closed dgw closed 3 years ago

dgw commented 4 years ago

We've been seeing OSError: [Errno 98] Address already in use on the "official" Sopel instance when something interrupts its ZNC connection. (Later, failure of the webhook server to bind at setup() during reconnection leads to AttributeError: 'NoneType' object has no attribute 'shutdown' the next time Sopel exits or reconnects, since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down.)

Sopel reconnecting is very similar to what happens during stop-start or restart: It shutdown()s all plugins, waits a bit, then setup()s all the plugins again. The difference is, it's the same process running those setup()s again instead of a new one. But interestingly, it's not possible to trigger the same behavior by starting sopel, doing Ctrl-C, and immediately starting sopel again.

It seems like the listener socket (from Bottle) is getting stuck in TIME_WAIT for a few minutes because the local end initiated the closure (this is just part of how TCP works). When the whole process terminates, the leftover socket is probably freed by the OS—but not when the process stays running, i.e. during a reconnection.

I'm not quite done researching the flag SO_REUSEADDR and its possible (side-)effects or other considerations, but it might help. Probably would require a custom Bottle backend to set up the option.

HumorBaby commented 4 years ago

Oops... didn't see this issue before opening #72 :grimacing:

You bring up an interesting point I didn't really consider:

since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down

While #72 fixes the issue of actually un-binding the socket during the plugin's shutdown(), it does't really address the point above… do you think it's worth including along side the fix in #72? I just wanted to point it out to bring it to the front of your/our attention as your review #72 (or decide to close in favor of other options, e.g., SO_REUSEADDR).

dgw commented 4 years ago

I'd call them "separate but related". Fixing things so that the server never fails to bind theoretically makes it unnecessary to check if the object is valid before trying to shut it down… but "never" is a strong word, and no matter how many checks there are, someone is going to break it somehow in the wild. It's just a matter of time.

So yeah, feel free to toss in an if server is not None (that's pseudocode!) check somewhere in the same PR, just in case it still fails to bind at some point. (Maybe later we can consider an automatic retry feature. Like at most n retries spaced m minutes apart, triggered if the server fails to bind at setup() time.)

server_close() is a solution I never discovered, because the chain of superclasses between Bottle and socketserver.BaseServer (where it's defined, AFAICT) is long and, uh, Twisted. (Sorry… but not really. 😁) It might work just fine on its own for the problem we discovered on the "official" Sopel instance, and you should feel free to pip install -e a modified version of this plugin to test it out since it runs on your infra. The GitHub notifications there are just "nice to have", since everyone with an actual interest in following development activity has GitHub email notifications turned on anyway.

In fact, I'd be perfectly happy if server_close() solves the problem on its own. There are some considerations for using SO_REUSEADDR that I'm not sure I've explored enough to fully understand the possible effects on this plugin's operation. And if I don't have to think about those at all, it leaves more time for working on the main bot code. 😉