meebey / SmartIrc4net

IRC C# Library
http://www.meebey.net/projects/smartirc4net/
Other
126 stars 52 forks source link

Issue Regarding Message Queueing and Ping Timeout #48

Closed swiftyspiffy closed 7 years ago

swiftyspiffy commented 7 years ago

Hello,

I use SmartIrc4net in a library I develop for communicating with Twitch and the chat system (IRC-based). There's been an issue where after a random amount of time (sometimes several hours, sometimes several days) the the SmartIrc4net implementation hangs up and times out. @Danatobob and I have been working to figure out if it's an issue with TwitchLib or SmartIrc4net, and we're fairly confident it is with SmartIrc4net.

Since neither of us are super fluent with the SmartIrc4net, we were hoping you might be able to shed some light on the possible issue. Here is the link to the issue: TwitchLib Issue #81

Using the built in logger in SmartIrc4net, @Danatobob was able to get a pretty good dump of events in IrcConnection object as seen starting at this comment and below: Comment with Debug

Extracted from his post, we get correct activity which would read something like this:

2016-11-28 00:21:38,899 [IdleWorkerThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - sent: "PING irc.chat.twitch.tv"
2016-11-28 00:21:39,112 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - received: ":tmi.twitch.tv PONG tmi.twitch.tv :irc.chat.twitch.tv"
2016-11-28 00:21:39,112 [14] DEBUG QUEUE - read: ":tmi.twitch.tv PONG tmi.twitch.tv :irc.chat.twitch.tv"

however, after some time (not easy to consistently identify when/why the bad behavior starts), this will occur:

2016-11-28 00:25:38,901 [IdleWorkerThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - sent: "PING irc.chat.twitch.tv"
2016-11-28 00:25:39,113 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - received: ":tmi.twitch.tv PONG tmi.twitch.tv :irc.chat.twitch.tv"
2016-11-28 00:28:03,216 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - received: "PING :tmi.twitch.tv"
2016-11-28 00:31:38,903 [IdleWorkerThread (irc.chat.twitch.tv:6667)] WARN SOCKET - ping timeout, connection lost

Importantly, as noted by @Danatobob, DEBUG QUEUE doesn't react to PONG. He goes on to provide further error logs I have provided one here: danatobob_irc_errorlog_full.txt that will likely prove useful in determing the source of the error. The issue starts shortly before the repeated sending data failed, connection lost We would be very grateful for any insight on this issue.

Thanks

meebey commented 7 years ago

I agree so far on your argumentation that the received PING request by the IRCd is not reacted to. To me it sounds like the listen thread has died, which has the job to process the queue of received messages, then parse them and then call an event handler for it.

Looking at the TwitchLib code I noticed this line:

Task.Factory.StartNew(() => _client.Listen());

(https://github.com/swiftyspiffy/TwitchLib/blob/master/TwitchLib/TwitchClient.cs#L676)

The problem with this code is that a) I think it hides exceptions (depends on exception behavior inside Task.Factory) b) Listen() can return and never parse newly received messages.

So you could guard that created thread with a try/catch and also log if Listen() returned to see if a) or b) applies to this issue.

Danatobob commented 7 years ago

I'll try wrapping that in try-catch on my part, since I'm the one whose bot is dying. What is the correct reaction in case of a and b ? Should I just call Listen again ? Make it a loop perhaps then with try-catch inside ? EDIT:

Task.Factory.StartNew(() => { 
while(true) // when to exit ?
{ try { _client.Listen() }catch(Exception e) {} 
} 
});
meebey commented 7 years ago

Danatobob wrote:

What is the correct reaction in case of a and b ? Should I just call Listen again ?

That is difficult to answer, it depends. I believe one of the callback handlers is throwing an exception and at that point there is no guarantee you can simply continue by calling Listen() again.

Danatobob commented 7 years ago

Allright, then just to identify the exception I guess. I'll do that now and update my bot. Thanks.

meebey commented 7 years ago

I just noticed the TwitchLib library does not use the current source of SmartIrc4net but some unofficial and outdated nuget package: https://github.com/swiftyspiffy/TwitchLib/blob/master/TwitchLib/packages.config You should try it with Smartrc4net from master if you haven't already.

swiftyspiffy commented 7 years ago

You should try it with Smartrc4net from master if you haven't already.

Doing this now.

Thanks for your quick and informative responses.

Edit: Built and deployed, will report back with results if crash persists, or given a few days, if it appears fixed.

Danatobob commented 7 years ago

Allright, the bot didn't bug yet, but I think I have all the information we need now to solve this. This is a complete irc log since the bot launch. irc_errorlog3.txt

This is what I have in TwitchClient,Connected:

Task.Factory.StartNew(() => 
            {
                try
                {
                    logger.Debug("Starting listening or irc connection");
                    _client.Listen();
                    logger.Debug("Finished listening or irc connection");
                }
                catch(Exception ex)
                {
                    logger.Fatal("Listen failed", ex);
                }
            });

Note my typo ("listening OR") in debug messages in case you would search logs. Now, the logs show the following:

Now you devs should figure out who should fix what ;). As I understand it, Listen should return as soon as we are starting to reconnect. Also event handlers should not throw exceptions, but maybe the exception is because Listen is processing the same stuff 2 or more times. Maybe listen was not intended to be used like that (I mean Factory.StartNew() => Listen)? EDIT: my assumption is that bot bugs when Listen fails and there is no more other instances of Listen to continue with the workflow.

meebey commented 7 years ago

Oh I haven't noticed that Listen() is called from the OnConnected event, yes this is asking for troubles. The event will be raised after a reconnect, but Listen() must not be called 2 times in parallel.

Danatobob commented 7 years ago

Okay, but then what is the correct usage of the library ? When should we call Listen if not after we are notified about Connected state ? Also, Listen seems to finish on connection lost, before disconnect.

2016-11-30 10:37:38,777 [WriteThread (irc.chat.twitch.tv:6667)] WARN SOCKET - connection lost
2016-11-30 10:37:38,780 [WriteThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - WriteThread aborted
2016-11-30 10:37:38,787 [14] DEBUG TwitchLib.TwitchClient - Finished listening or irc connection
2016-11-30 10:37:38,789 [12] INFO CONNECTION - disconnected

So we do need to call it again after reconnecting. Connected event seemed like a good place for that. EDIT: I guess we need a variable for dedicated thread instance then so we can monitor if its alive and not start new one ?

EDIT: And my assumption about bot bugging was correct - It just stopped working after another (last?) Listen failed. Though I'm not sure if the event handler exception is due to 2 or more Listen running, since the last exception should not happen in this case.

Danatobob commented 7 years ago

I've remembered that I saw this earlier: Issue with disconnecting Could that issue or one of the referenced ones be the cause of Listen not finishing during disconnect (just in case - I dl-ed master sources a few days ago) ? I mean isn't Listen not finishing during disconnect a bug? Maybe my quote in the post above showing it does finish sometimes was just a coincidence ? And if Listen really hangs, should we not receive any messages if we continue using same thread, waiting for Listen to do something? I'm confused now. Can you please clarify that in this log the Listen method behaves as expected (at least until we launch another one in parallel):

2016-11-30 10:35:35,063 [IdleWorkerThread (irc.chat.twitch.tv:6667)] WARN SOCKET - ping timeout, connection lost
2016-11-30 10:36:05,066 [12] INFO CONNECTION - reconnecting...
2016-11-30 10:36:05,072 [12] INFO CONNECTION - disconnecting...
2016-11-30 10:36:05,075 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop()
2016-11-30 10:36:05,076 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): aborting thread...
2016-11-30 10:36:05,076 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): joining thread...
2016-11-30 10:37:36,744 [ReadThread (irc.chat.twitch.tv:6667)] WARN SOCKET - connection lost
2016-11-30 10:37:36,747 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - ReadThread aborted
2016-11-30 10:37:36,748 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): closing reader...
2016-11-30 10:37:36,748 [12] DEBUG CONNECTION - Stopping WriteThread...
2016-11-30 10:37:36,748 [WriteThread (irc.chat.twitch.tv:6667)] WARN SOCKET - connection lost
2016-11-30 10:37:36,752 [WriteThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - WriteThread aborted
2016-11-30 10:37:36,760 [12] INFO CONNECTION - disconnected
2016-11-30 10:37:36,770 [12] INFO CONNECTION - connecting... (attempt: 1)
2016-11-30 10:37:38,254 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - ReadThread started
2016-11-30 10:37:38,257 [WriteThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - WriteThread started
2016-11-30 10:37:38,259 [12] INFO CONNECTION - connected
2016-11-30 10:37:38,259 [IdleWorkerThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - IdleWorkerThread started
2016-11-30 10:37:38,266 [12] DEBUG SOCKET - sent: "PASS oauth:myoauth"
2016-11-30 10:37:38,266 [12] DEBUG SOCKET - sent: "NICK mytwitchacc"
2016-11-30 10:37:38,267 [12] DEBUG SOCKET - sent: "USER mytwitchacc 0 * :mytwitchacc"
2016-11-30 10:37:38,267 [14] DEBUG TwitchLib.TwitchClient - Starting listening or irc connection
2016-11-30 10:37:38,268 [12] INFO CONNECTION - reconnecting...
2016-11-30 10:37:38,273 [12] INFO CONNECTION - disconnecting...
2016-11-30 10:37:38,276 [IdleWorkerThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - IdleWorkerThread aborted
2016-11-30 10:37:38,286 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop()
2016-11-30 10:37:38,286 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): aborting thread...
2016-11-30 10:37:38,286 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): joining thread...
2016-11-30 10:37:38,773 [ReadThread (irc.chat.twitch.tv:6667)] WARN SOCKET - connection lost
2016-11-30 10:37:38,777 [ReadThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - ReadThread aborted
2016-11-30 10:37:38,777 [12] DEBUG Meebey.SmartIrc4net.IrcConnection+ReadThread - Stop(): closing reader...
2016-11-30 10:37:38,777 [12] DEBUG CONNECTION - Stopping WriteThread...
2016-11-30 10:37:38,777 [WriteThread (irc.chat.twitch.tv:6667)] WARN SOCKET - connection lost
2016-11-30 10:37:38,780 [WriteThread (irc.chat.twitch.tv:6667)] DEBUG SOCKET - WriteThread aborted
2016-11-30 10:37:38,787 [14] DEBUG TwitchLib.TwitchClient - Finished listening or irc connection

According to the full log, at 2016-11-30 10:35:35, we still have only one (initial) Listen method running so it should not be spoiled by another instances in parallel.

meebey commented 7 years ago

Listen() basically is supposed to only return if it cannot recover from an error condition. If AutoReconnect is enabled, Listen() will not return as the library will try to re-vive the connection. If all reconnect attempts failed, it will return.

Even more important is that any exception that Listen() will see is likely to be an exception of the event handlers registered by the user of the library.

The reason the user needs to call Listen() is that you can control which thread is raising all the callbacks back into your code. From the log above you can see it is a thread named "12" and later "14".

meebey commented 7 years ago

Here you can see how Smuxi is using the Listen() method: https://github.com/meebey/smuxi/blob/master/src/Engine-IRC/Protocols/Irc/IrcProtocolManager.cs#L2433

swiftyspiffy commented 7 years ago

Thanks for the heads up on this, sorry it's gone kind of quiet. The issue seems to be largely fixed, just trying to fix a slightly unrelated issue. I will go ahead and close this though.

Thanks again for your help!

meebey commented 7 years ago

@swiftyspiffy you are welcome. if you have any issues or questions, don't hesitant to ask.