kiwiirc / irc-framework

🛠️ A better IRC framework for node.js. For bots and full clients.
MIT License
183 stars 65 forks source link

On some irc-errors reconnecting shouldn't stop. #211

Open tokariu opened 5 years ago

tokariu commented 5 years ago

I've been using the lounge for some time now and ran into some reconnection issues. I've been told that this is the place to open an issue for that case, because reconnection is handled by this irc-framework. so here it is.

example:

I'm re-connecting to the irc network after my connection has been lost or dropped (connection reset by peer case again). So my old connection is still alive on the irc-server and it needs a few more minutes until it ping-timeouts. until then, the irc-server doesn't allow me to connect with the same ip-address, because it allows only one connection per ip-address.

when I try to connect, this is what happens:

[020] * Please wait while we process your connection.
mynick: Nickname is already in use.
Closing Link: mynick[unknown@ip.host.com] (Too many host connections (local)) (irc)
Disconnected from the network, and will not reconnect. Use /connect to reconnect again.

At this point The Lounge does not try to reconnect anymore. it stays offline indefinitely. But it should try reconnecting! After a few minutes the old connection drops out and I would be able to reconnect.

The initial irc connection/registration did not fail, I'm connected to the irc-network for a moment, but due to the old connecton I get kicked out. This is important to distinguish: my irc-client somehow only tries to reconnect if the initial connection attempt did not work. in this case however, it actually did work for a moment, but the irc-server decides to close the connection which in the end has the same effect as a connection failure. So I'll stay offline forever until someday I manually login to the lounge and type /connect. Needless to say that I expected to be able to read the irc/chan history when I came back, but TL wasn't connected, so there is nothing to read.

summa summarum: The lounge / the irc-framework needs to try reconnecting at an interval of like one to five minutes when the error is "too many host connections (*)"

myself248 commented 4 years ago

This bit me last night. Troubleshooting some other issues, I rebooted my router, which knocked my TL instance off Freenode. I knew that would happen, and I also knew it would just reconnect, no big deal. So I could go on about my evening and I knew my scrollback would be waiting for me.

Come back to my session later, and I see this in the server window:

14:26 Connection closed unexpectedly: Error: read ECONNRESET
14:26 Disconnected from the network. Reconnecting in 11 seconds… (Attempt 1 of 360)
14:27 Connected to the network.
14:27 -weber.freenode.net- *** Looking up your hostname...
14:27 -weber.freenode.net- *** Checking Ident
14:27 -weber.freenode.net- *** No Ident response
14:27 -weber.freenode.net- *** Found your hostname
14:27 myself: Nickname is already in use. An attempt to use it will be made when this nick quits.
14:28 Connection closed unexpectedly: Error: read ECONNRESET
14:28 Disconnected from the network, and will not reconnect. Use /connect to reconnect again.

Welp, that proved me wrong -- it didn't reconnect, and I missed a bunch of scrollback.

TheLounge says this is an irc-framework issue: https://github.com/thelounge/thelounge/issues/3419 so I'm adding my comments here. If that was attempt 1 of 360, what happened to the other 358?

prawnsalad commented 4 years ago

@tokariu the error 'Too many host connections' means that the IRC server won't allow any more connections from your host. So irc-framework won't reconnect.

@myself248 if you or someone could provide an irc-framework script that shows the error then that would be useful. I don't know how thelounge handles irc-framework events or any middlewares or what, so the bare minimum to show the bug will be good there.

xPaw commented 4 years ago

The issue boils down to the fact that reconnections are only attempted if previous connection was registered.

In both of these examples, it goes like this:

Perhaps we should have a counter to allow a few reconnection attemps if it gets errors? This would warrant adding randomization to the next connect (and exponential backoff would also help, #140)

rburchell commented 4 years ago

'Too many host connections' means that the IRC server won't allow any more connections from your host.

It's worth considering that this problem might be a transient one. Consider the case where there's a limit of 5 connections from an IP address, and 5 people are already connected from it. Their network drops (temporarily), and they all disconnect -- but the server hasn't yet noticed this. They attempt to reconnect, but the server drops their connection since the old connections are still established. After the old connections drop due to PING, establishing new connections will work OK.

Another case where this appears to cause problems (which I hit personally, and brought me to this issue):

Closing link: (xxxx) [Registration timeout] (irc)

In this case, transient networking problems during the initial registration meant that the server didn't get my registration commands fast enough, so it closed the connection. This isn't a permanent problem, though, and would have worked (eventually).


For what it's worth, my $0.02:

Transient errors of all sorts can happen all the time (DNS failures, network failures, ...) and the vast majority of the time, these errors will be fixed, and the library user generally has no possible way to impact this, so retrying is the correct thing to do.

If a connection might not be established in a useful way, by all means, let the user of the framework decide to stop attempting to reconnect if their usage says that is the right thing to do, but don't do it for them.

ssbarnea commented 4 years ago

I got hit by this bug twice in a week and I totally agree with @rburchell regarding the fact that the client must decide when to abandon the retry loop.

07:35 Ping timeout, disconnecting…
07:35 Disconnected from the network. Reconnecting in 5 seconds… (Attempt 1)
07:35 *** Connection closed unexpectedly: Error: getaddrinfo EAI_AGAIN chat.freenode.net
07:35 Disconnected from the network. Reconnecting in 8 seconds… (Attempt 2)
07:36 *** Connection closed unexpectedly: Error: getaddrinfo EAI_AGAIN chat.freenode.net
07:36 Disconnected from the network. Reconnecting in 6 seconds… (Attempt 3)
07:36 *** Connection closed unexpectedly: Error: getaddrinfo EAI_AGAIN chat.freenode.net
07:36 Disconnected from the network. Reconnecting in 11 seconds… (Attempt 4)
07:36 Connected to the network.
07:36 -card.freenode.net- *** Looking up your hostname...
07:36 -card.freenode.net- *** Checking Ident
07:36 -card.freenode.net- *** Found your hostname
07:38 *** Connection closed unexpectedly: Error: read ECONNRESET
07:38 Disconnected from the network, and will not reconnect. Use /connect to reconnect again.

Mainly a permanent failure after less than 3 minutes, it was one of those WTF! moments.

mornfall commented 4 years ago

As a possible hotfix (sorry, I don't have the patch; if you tell me how to get one from hand-edited node_modules, pray tell), in src/connection.js:

The first change makes sure that if the initial connection fails entirely, we retry. The second makes sure that if the reconnection dies after the socket connects but before IRC registers, we retry too.

nanaya commented 2 years ago

What's the conclusion on this? Is this the expected behavior? Related issue on thelounge has been closed citing it's upstream issue (this issue) but if it's actually expected then thelounge may need to be updated to handle this case (if they agree reconnection should be attempted).

This makes rebooting server (or having trouble and the server resets) a rather risky proposition especially when hosting multiple people which number close to the irc server limit. There's good chance some users don't automatically reconnect and they must be notified to login and reconnect if they don't want to lost logs.

mornfall commented 2 years ago

Bump? I have been running thelounge with the fix suggested 2 comments above for almost 2 years, without running into problems. Hosting about a hundred users.

tokariu commented 2 years ago

I just tried @mornfall 's suggestion from above for a few days and it seems to work just fine so far. This should probably be merged.

also worth mentioning: ...

* move `this.reconnect_attempts = 0` from `socketOpen` to `registeredSuccessfully`

...

I couldn't find that line in the socketOpen method. I think we should be looking for that.reconnect_attempts = 0 from socketOpen and move it as this.reconnect_attempts = 0 to the registeredSuccessfully method.

MyWay commented 3 months ago

Any news on this?