reactiflux / discord-irc

Connects Discord and IRC channels by sending messages back and forth.
MIT License
1.2k stars 293 forks source link

IRC disconnects but never re-connects (freenode) #336

Open Someguy123 opened 6 years ago

Someguy123 commented 6 years ago

Node version: v6.11.4 Discord IRC Version: 2.4.2 (from npm)

Discord IRC seems to randomly disconnect from IRC, and fail to re-connect at any point. I have to keep going in and restarting it. It stays connected to Discord just fine, but not IRC.

17 Nov 21:23:39 - SEND: PRIVMSG #litecoin :<07redacted> some example message
17 Nov 21:23:39 - Network error: Error: This socket is closed
plwalters commented 6 years ago

:+1:

Throne3d commented 6 years ago

What version are you running? (discord-irc -V) Also are you running it in debug mode ("debug": true inside ircOptions, NODE_ENV=development as an environment variable), and if so does it show any output that might explain the issue?

@PWKad: I'm not sure what you're trying to indicate with a thumbs up, or why you did it as a separate message – GitHub has a reactions thing you can use to put a thumbs up on the message itself, or if you're also encountering the issue you could explain that (and hopefully add some more details to help track down the issue)?

Someguy123 commented 6 years ago

Looks like I'm actually on 2.4.2, not sure if there was a fix for this issue in 2.5.0 but I can update and test.

plwalters commented 6 years ago

I'm on my phone and reactions don't show up here. Sorry for the double notification

Throne3d commented 6 years ago

There probably was not, but the upstream IRC library had a lot of updates, especially around things testing its core functionality, so it's possible it did. (I don't recall all the changes I made.)

Someguy123 commented 6 years ago

Seems that 2.5.0 actually broke the bot completely for me...

$ discord-irc --config config.json
info: Connected to Discord
info: Connected to IRC
18 Nov 23:41:27 - ERROR: { prefix: 'verne.freenode.net',
  server: 'verne.freenode.net',
  command: 'err_cannotsendtochan',
  rawCommand: '404',
  commandType: 'error',
  args: [ 'SGDiscordBot', '#litecoin-dev', 'Cannot send to channel' ] }
error: Received error event from IRC prefix=verne.freenode.net, server=verne.freenode.net, command=err_cannotsendtochan, rawCommand=404, commandType=error, args=[SGDiscordBot, #litecoin-dev, Cannot send to channel]
18 Nov 23:41:27 - ERROR: Unhandled message: { prefix: 'verne.freenode.net',
  server: 'verne.freenode.net',
  command: '671',
  rawCommand: '671',
  commandType: 'normal',
  args:
   [ 'SGDiscordBot',
     'SGDiscordBot',
     'is using a secure connection' ] }
18 Nov 23:41:27 - ERROR: Unhandled message: { prefix: 'services.',
  server: 'services.',
  command: '328',
  rawCommand: '328',
  commandType: 'normal',
  args: [ 'SGDiscordBot', '#litecoin', 'https://litecoin.org' ] }
Someguy123 commented 6 years ago

Update: false alarm. it's just not printing as verbosely as it did in 2.4

Throne3d commented 6 years ago

Right – 2.4 had a bug (again with the upstream IRC library) that accidentally put it into permanent debug mode, and it's probably no longer doing that. If you put it in debug mode ("debug": true inside ircOptions, NODE_ENV=development as an environment variable), instead of it just being there through a glitch, then this time you might get more out of it if/when it fails. (I'm not sure.)

scribblemaniac commented 6 years ago

I experienced a similar issue with version 2.5.0 installed from npm (with Node version 8.9.2). I've set the recommended debug options and will report back if it happens to me again.

scribblemaniac commented 6 years ago

Well, this issue has finally happened again, and this time with the recommended debug options enabled 🎉 . Here is the console output (the configuration errors were from earlier runs), configuration and the output log (sensitive information removed where necessary). Based on the log, it seems that the IRC stopped getting pinged around January 29th, and reconnected multiple times the day before. I hope this helps!

Throne3d commented 6 years ago

@scribblemaniac It ended up reconnecting 10 times, then stopped, which is part of the standard configuration. If you want to stop that, you could change it in the ircOptions – it seems kinda unintuitive to set it to that, to me, but it's currently that in the code. (Adding "retryCount": null inside "ircOptions": { … } in your config should work.)

Separately, the logs seem to be outputting pings twice, and it looks like it's connecting twice. Are you running the bot with two separate configurations, or running it twice, or is this erroneous? There were issues in the past with it connecting twice even from just one bot, is why I ask.

scribblemaniac commented 6 years ago

It ended up reconnecting 10 times, then stopped, which is part of the standard configuration.

I will add the workaround you suggested, but issue still remains: why it's disconnecting/reconnecting in the first place? It's not a network issue on my side because the connection to the Discord server is fine even while there are issues with the IRC connection. Freenode wasn't down either or both connections would've failed at the same time.

Separately, the logs seem to be outputting pings twice, and it looks like it's connecting twice. Are you running the bot with two separate configurations, or running it twice, or is this erroneous?

In the configuration you'll see that I have it listening to two different IRC channels, so this is expected behavior I imagine.

Throne3d commented 6 years ago

In the configuration you'll see that I have it listening to two different IRC channels, so this is expected behavior I imagine.

Oh, I completely forgot to check the configuration. Yes, you have two separate bots there, so that's probably expected.

I will add the workaround you suggested, but issue still remains: why it's disconnecting/reconnecting in the first place? It's not a network issue on my side because the connection to the Discord server is fine even while there are issues with the IRC connection. Freenode wasn't down either or both connections would've failed at the same time.

Are you sure the connection to the Discord server is fine? The IRC side has a pretty short ping cycle and timeout (I think 15 seconds and then 8 seconds after that, respectively), whereas I think Discord takes much more of a connection break to need to reconnect. (If, for example, I run my bot and then disconnect my internet connection, the IRC side will notify me that it's trying to reconnect but Discord offers no such output, even after I reconnect the connection. This might only hold for brief disconnections.)

It might be that the network connection is dropping briefly, but the IRC side of things responds to this with a shorter timeout. You could increase millisecondsOfSilenceBeforePingSent (15,000 by default) or millisecondsBeforePingTimeout (8,000 by default), in ircOptions, to see if that affects things?

scribblemaniac commented 6 years ago

You're right, I suppose there is a window of time where the IRC could fail but a Discord heartbeat would not be sent. The high latency (5352ms) in the Discord heartbeat after the IRC fail also points to a network issue.

However, looking at the logs, it looks different when I purposefully disconnect from the network than when this issue occurs. Consider the last IRC messages:

29 Jan 00:08:35 - CyclingPingTimer 21: server silent for too long, let's send a PING
29 Jan 00:08:35 - SEND: PING 9194
29 Jan 00:08:35 - Received: :barjavel.freenode.net PONG barjavel.freenode.net :9194
29 Jan 00:08:50 - CyclingPingTimer 21: server silent for too long, let's send a PING
29 Jan 00:08:50 - SEND: PING 9195
29 Jan 00:08:58 - CyclingPingTimer 21: ping timer stopped
29 Jan 00:08:58 - CyclingPingTimer 21: ping timeout!
29 Jan 00:08:58 - Connection got "close" event
29 Jan 00:08:58 - Disconnected: reconnecting
29 Jan 00:08:58 - Maximum retry count (10) reached. Aborting

The note that there is a successful IRC ping sent at 00:08:35, then one timeout fail at 00:08:58, and then the connection is aborted. When I disconnect manually, it looks more like this:

9 Feb 13:13:59 - CyclingPingTimer 1: server silent for too long, let's send a PING
9 Feb 13:13:59 - SEND: PING 8287
9 Feb 13:14:04 - CyclingPingTimer 0: ping timer stopped
9 Feb 13:14:04 - CyclingPingTimer 0: ping timeout!
9 Feb 13:14:04 - Connection got "close" event
9 Feb 13:14:04 - Disconnected: reconnecting
9 Feb 13:14:04 - Waiting 2000ms before retrying
9 Feb 13:14:06 - Attempting socket connection to IRC server
9 Feb 13:14:06 - Network error: Error: getaddrinfo ENOTFOUND chat.freenode.net chat.freenode.net:6667
9 Feb 13:14:06 - Connection got "close" event
9 Feb 13:14:06 - Disconnected: reconnecting
9 Feb 13:14:06 - Waiting 2000ms before retrying
.
. (a bunch more attempts to reconnect)
.
9 Feb 13:14:27 - Attempting socket connection to IRC server
9 Feb 13:14:27 - Network error: Error: getaddrinfo ENOTFOUND chat.freenode.net chat.freenode.net:6667
9 Feb 13:14:27 - Connection got "close" event
9 Feb 13:14:27 - Disconnected: reconnecting
9 Feb 13:14:27 - Maximum retry count (10) reached. Aborting

Is there a counter for IRC ping fails that is perhaps not being reset? Because it doesn't look like it has actually retried 10 times. It's also a little bit suspicious that the two IRC connections do not fail at the same time: the first one fails at 28 Jan 23:46:23, and the second one 26 minutes later at 29 Jan 00:08:58. If my network or freenode was down, wouldn't they both fail at the same time?

Throne3d commented 6 years ago

This complicates things, but is still explainable without it being a bug, I think.

Is there a counter for IRC ping fails that is perhaps not being reset? Because it doesn't look like it has actually retried 10 times.

It does seem to have retried approximately 10 times, actually, for each. The number after "CyclingPingTimer" in the logs is a number that counts up for each connection. As it's at 21, it seems both instances of the bot have tried reconnecting about 10 times (these numbers start at 0 and 1, because there are two connections, and then if the bots both reconnect 10 times sequentially, these increase to 20 and 21). As far as I know, that only increases when it reconnects, so it seems the connection has reconnected ten times, per instance, over the entire course of the bot's running. This might not be expected behavior, but I think it's intended behavior. (I'm not sure, I didn't write the upstream IRC library originally.)

However, looking at the logs, it looks different when I purposefully disconnect from the network than when this issue occurs.

The error in the logs is due to an ENOTFOUND error, which I believe only occurs if it can't look up the DNS of the server when it tries to connect. This therefore doesn't occur if freenode is down, or if your DNS can resolve when the bot tries to reconnect; it does occur if your network is down. I'm not completely sure, but I think it doesn't suggest anything wrong with the behavior.

It's also a little bit suspicious that the two IRC connections do not fail at the same time: the first one fails at 28 Jan 23:46:23, and the second one 26 minutes later at 29 Jan 00:08:58. If my network or freenode was down, wouldn't they both fail at the same time?

I don't think so. The bots reset their ping timer upon receiving messages from the server, so if they connect at different times, or take differing lengths to get ping responses or keep track of different messages in different channels, the timers could get out of sync and the network could go down when one is looking for a ping and return before the other does. That is, the following could happen:

00:00: Bot 1 gets a message from the server
00:13: Bot 2 gets a message from the server
00:14: -- Network connection breaks --
00:15: Bot 1 requests a ping, as it's been 15 seconds since its last message
00:23: Bot 1 times out, as it's been 8 seconds since it pinged
00:24: -- Network connection returns --
00:28: Bot 2 requests a ping, as it's been 15 seconds since its last message
00:28: Bot 2 receives a pong, and so doesn't time out

Due to them being out of sync, and the ping cycle and ping timeout times being different, the network could have gone down just long enough to cause a timeout on one while the other bot doesn't even notice a problem. Increasing these values should help stop this problem.

scribblemaniac commented 6 years ago

Okay, everything does seem to be working as intended. I don't necessarily agree with the design of retryCount, but at any rate I can see now that that would be an upstream issue. Thank you very much @Throne3d for taking the time to address all of my concerns, I hope I was not too much of a bother 😞 . For now I've set retryCount to null and I'll see how that goes. 👍

Throne3d commented 6 years ago

That's fine! I agree, the behavior of retryCount is confusing, and I might look into changing it – I was under the impression that was specifically not how it worked. We might want to remove the config default from discord-irc, at least, since it's weird for a bot that should have a good uptime and stay online for very long periods of time, so people are likely to encounter this issue with normal use.

PF94 commented 3 years ago

this issue persists 3 years later

on a irc relay bot, it somehow disconnected from the irc and i had to manually restart it.

i am not going to login to a server everyday just to restart some random bot that's going to fail 12 hours later.

Throne3d commented 3 years ago

@chaziz94 The comments above explain how you might work around this – you can set "ircOptions": {"retryCount": null} inside your configuration :^)

If that doesn't work, you could always come back with some logs of the issue (or if possible a repro), so I could look into solving it, rather than complaining about it remaining unsolved :P

ghost commented 3 years ago

Had same issue (disconnects every 10+ minutes and reconnects only 10 times). Suggested settings seem to solve the problem, I used these in [code]ircOptions[/code] section: [code]"millisecondsOfSilenceBeforePingSent": 20000, "millisecondsBeforePingTimeout": 30000, "retryCount": null,[/code]

PF94 commented 3 years ago

Had same issue (disconnects every 10+ minutes and reconnects only 10 times). Suggested settings seem to solve the problem, I used these in [code]ircOptions[/code] section: [code]"millisecondsOfSilenceBeforePingSent": 20000, "millisecondsBeforePingTimeout": 30000, "retryCount": null,[/code]

github doesn't support bbcode

WASDi commented 3 years ago

I was confused because the default retryCount is already null according to irc-upd, the underlying IRC library used. However, discord-irc overrides this default to 10, causing an irrecoverable disconnect as it gives up reconnecting if it loses the network for half a minute or so. Adding "retryCount": null indeed fixes this. The parameter retryDelay also defines how long to wait between retries, defaulting to 2000ms.