zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET
Other
2.95k stars 744 forks source link

Client Crash when server goes down and up again #778

Closed ciottied closed 4 years ago

ciottied commented 5 years ago

Environment

NetMQ Version:    4.0.0.1
Operating System: Win Server 2012
.NET Version:   Net framework 4.7.2

Expected behaviour

Crashing app is a client connected to a (stock market) Feedserver by the means of a dealer socket (for async subscribe/unsubscribe and other requests) and a sub socket (prices streams. If Server, for whatever reason, goes down and then up again app should reconnect and re-subscribe prices (through dealer socket).

Actual behaviour

App Crashes, even in VS debug mode, with this exception:

System.ArgumentOutOfRangeException was unhandled
Message: An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
Additional information: Index was out of range. Must be non-negative and less than the size of the collection. 

Unexpectedly, ex does not have a stack trace. Event logger logs this net framework error :

Application: SunFeedHistServer.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.ArgumentOutOfRangeException
   at System.ThrowHelper.ThrowArgumentOutOfRangeException(System.ExceptionArgument, System.ExceptionResource)
   at NetMQ.Core.Patterns.Utils.ArrayExtensions.Swap[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.List`1<System.__Canon>, Int32, Int32)
   at NetMQ.Core.Patterns.Utils.Distribution.Activated(NetMQ.Core.Pipe)
   at NetMQ.Core.Patterns.XSub.XWriteActivated(NetMQ.Core.Pipe)
   at NetMQ.Core.SocketBase.WriteActivated(NetMQ.Core.Pipe)
   at NetMQ.Core.Pipe.ProcessActivateWrite(Int64)
   at NetMQ.Core.ZObject.ProcessCommand(NetMQ.Core.Command)
   at NetMQ.Core.SocketBase.ProcessCommands(Int32, Boolean)
   at NetMQ.Core.SocketBase.GetSocketOption(NetMQ.Core.ZmqSocketOption)
   at NetMQ.NetMQSelector.Select(Item[], Int32, Int64)
   at NetMQ.NetMQPoller.Run()
   at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()

Steps to reproduce the behaviour

Can send code snippet

mrnr1 commented 5 years ago

I would need to see your project to determine exactly why the exception is being thrown, however I presume it's because you're making an assumption on an operation that you're still connected to the host when your not. In my application, I have an EventHandler 'DisconnectedEvent' that I invoke when bIsConnected is true, but the receiveTimer - lastReceived (elapsed milliseconds since the last message from the server) is greater than my threshold for a disconnection event (10 seconds). This is controlled via a sequence of message objectts sent to and received from the host (ClientPingMsg/ServerPingMsg), so when I receive a message from the server I update my lastReceieved variable. Then, in my handler for the disconnection event, I attempt to reinitialize the connection.

For example, not in any kind of order:


private readonly Stopwatch _receiveTimer;
private long _lastReceived = -1;
public event EventHandler DisconnectEvent;
private const int
            DisconnectionTimeout = 10 * 1000; // in seconds until we're considered disconnected due to timeout

// In your CTOR / after connection is established:
_receiveTimer = Stopwatch.StartNew();

private void OnDisconnectionEvent()
{
    DisconnectEvent?.Invoke(this, new EventArgs());
}

// In your SendQueue Message processor (i.e Send())
if (_isConnected
                && _receiveTimer.ElapsedMilliseconds - _lastReceived > DisconnectionTimeout)
            {
                Disconnected();    // Set bIsConnected to false, clear send queue, re-initialize, re-subscribe, etc.
            }

// Called when we have received a ping message from the host, update your last received variable.
private void UpdateLastReceived()
        {
            _lastReceived = _receiveTimer.ElapsedMilliseconds;
        }
ciottied commented 5 years ago

I included NetMQ source in my project and I gathered more info. The culprit is Sub Socket reconnecting : it runs new synchronization context NetMQ\NetMQPoller.cs: 417 and down to the stack trace NetMQ\Core\Patterns\Utils\ArrayExtensions.cs: 52 tries to swap the elements of the items list. Items list has just one element items[0]=NetMQ.Core.Pipe[NetMQ.Core.Patterns.Sub[1]] , but swap is requested between index1=0 and index2=1 .

So the app crashes calling T item2 = items[index2];, because items[1] is out of range. This is the routine crashing:

public static void Swap<T>([NotNull] this List<T> items, int index1, int index2) where T : class
        {
            if (index1 == index2) 
                return;

            T item1 = items[index1];
            T item2 = items[index2];
            if (item1 != null)
                items[index2] = item1;
            if (item2 != null)
                items[index1] = item2;
        }

...and this is the complete stacktrace:

in System.ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument argument, ExceptionResource resource)
   in System.Collections.Generic.List`1.get_Item(Int32 index)
   in NetMQ.Core.Patterns.Utils.ArrayExtensions.Swap[T](List`1 items, Int32 index1, Int32 index2) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\Patterns\Utils\ArrayExtensions.cs:riga 52
   in NetMQ.Core.Patterns.Utils.Distribution.Activated(Pipe pipe) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\Patterns\Utils\Distribution.cs:riga 147
   in NetMQ.Core.Patterns.XSub.XWriteActivated(Pipe pipe) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\Patterns\XSub.cs:riga 150
   in NetMQ.Core.SocketBase.WriteActivated(Pipe pipe) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\SocketBase.cs:riga 1284
   in NetMQ.Core.Pipe.ProcessActivateWrite(Int64 msgsRead) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\Pipe.cs:riga 388
   in NetMQ.Core.ZObject.ProcessCommand(Command cmd) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\ZObject.cs:riga 265
   in NetMQ.Core.SocketBase.ProcessCommands(Int32 timeout, Boolean throttle) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\SocketBase.cs:riga 1061
   in NetMQ.Core.SocketBase.GetSocketOption(ZmqSocketOption option) in C:\Users\edoar\source\repos\CURRENT\NetMQ\Core\SocketBase.cs:riga 332
   in NetMQ.NetMQSocket.GetSocketOption(ZmqSocketOption option) in C:\Users\edoar\source\repos\CURRENT\NetMQ\NetMQSocket.cs:riga 432
   in NetMQ.NetMQSelector.Select(Item[] items, Int32 itemsCount, Int64 timeout) in C:\Users\edoar\source\repos\CURRENT\NetMQ\NetMQSelector.cs:riga 146
   in NetMQ.NetMQPoller.RunPoller() in C:\Users\edoar\source\repos\CURRENT\NetMQ\NetMQPoller.cs:riga 483
   in NetMQ.NetMQPoller.Run(SynchronizationContext syncContext) in C:\Users\edoar\source\repos\CURRENT\NetMQ\NetMQPoller.cs:riga 417
   in NetMQ.NetMQPoller.Run() in C:\Users\edoar\source\repos\CURRENT\NetMQ\NetMQPoller.cs:riga 392
   in System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   in System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   in System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   in System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   in System.Threading.ThreadHelper.ThreadStart()
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

obiben commented 2 years ago

This happens to me exactly as described. @ciottied did you find a solution?