zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET

NetMQ Poller Crash #1025

Open SteveHarveyUK opened 2 years ago

SteveHarveyUK commented 2 years ago

Environment

NetMQ Version:    4.0.1.8
Operating System: Windows Server 2019 (Version 1809 (OS Build 17763.2028))
.NET Version:     netstandard2.0 library in use by net472 application

Additional information on usage:

NetMQ is being used for comms between a client/server setup for the distribution of market data. The following sockets are used:

Expected behaviour

NetMQPoller running without exception.

Actual behaviour

NetMQPoller crashing with an internal exception:

Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.ArgumentOutOfRangeException
   at System.ThrowHelper.ThrowArgumentOutOfRangeException(System.ExceptionArgument, System.ExceptionResource)
   at NetMQ.Core.Patterns.Utils.ArrayExtensions.Swap[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.List`1<System.__Canon>, Int32, Int32)
   at NetMQ.Core.Patterns.Utils.Distribution.Activated(NetMQ.Core.Pipe)
   at NetMQ.Core.ZObject.ProcessCommand(NetMQ.Core.Command)
   at NetMQ.Core.SocketBase.ProcessCommands(Int32, Boolean, System.Threading.CancellationToken)
   at NetMQ.Core.SocketBase.GetSocketOption(NetMQ.Core.ZmqSocketOption)
   at NetMQ.NetMQSelector.Select(Item[], Int32, Int64)
   at NetMQ.NetMQPoller.RunPoller()
   at NetMQ.NetMQPoller.Run(System.Threading.SynchronizationContext)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()

Steps to reproduce the behaviour

Unfortunately, this happens at random intervals, so I don't have a reliable reproduction. I'm hoping that someone will have an idea of how I can acquire further information about this exception. We have 16 instances of the server talking to 6 instances of the client, but only one of the six clients appears to be seeing this problem. We recently migrated these systems to AWS as hosted VMs, and the issue started after that. My guess is that it is related to disconnection or connection recovery.
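One way to test the disconnection theory might be to attach a NetMQMonitor to the affected socket and log connection events with timestamps, so they can be lined up against the crash time in the event log. This is only a sketch: the endpoint addresses, the monitor's inproc name, and the `Log` helper are placeholders that are not taken from our application.

```csharp
using System;
using NetMQ;
using NetMQ.Monitoring;
using NetMQ.Sockets;

static class ConnectionLogging
{
    // Placeholder logger; swap in whatever logging the application already uses.
    static void Log(string what, string address) =>
        Console.WriteLine($"{DateTime.UtcNow:O} {what} {address}");

    static void Main()
    {
        using (var subscriber = new SubscriberSocket())
        using (var poller = new NetMQPoller { subscriber })
        // The inproc address is arbitrary; it only has to be unique per monitor.
        using (var monitor = new NetMQMonitor(subscriber, "inproc://sub-monitor", SocketEvents.All))
        {
            monitor.Connected += (s, e) => Log("connected", e.Address);
            monitor.Disconnected += (s, e) => Log("disconnected", e.Address);
            monitor.ConnectRetried += (s, e) => Log("connect retried", e.Address);

            // Let the existing poller service the monitor as well as the socket.
            monitor.AttachToPoller(poller);

            subscriber.Connect("tcp://localhost:5556"); // placeholder endpoint
            subscriber.SubscribeToAnyTopic();

            poller.Run();
        }
    }
}
```

If disconnect or retry events cluster just before each crash, that would support the connection-recovery guess; if not, it at least rules one thing out.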

Any ideas/comments gratefully received.

SteveHarveyUK commented 1 year ago

Further investigation indicates that this is an issue with the SubscriberSocket. If I've followed the flow correctly, it fails when Select queries ZmqSocketOption.Events and the socket then processes a CommandType.ActivateWrite command.

Somehow this leads to an attempt to Swap using an invalid pipe index.
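To make the failure mode concrete, here is a self-contained sketch of how a list-based Swap can throw ArgumentOutOfRangeException. It is not NetMQ's source: `Swap`, `pipes`, and `eligible` are simplified stand-ins, and the idea that the pipe is missing from the list by the time the command is processed is an assumption on my part, not something I have confirmed.

```csharp
using System;
using System.Collections.Generic;

static class SwapSketch
{
    // Hypothetical stand-in for a list Swap helper.
    static void Swap<T>(List<T> items, int index1, int index2)
    {
        if (index1 == index2) return;
        T temp = items[index1]; // the List<T> indexer throws for an out-of-range index
        items[index1] = items[index2];
        items[index2] = temp;
    }

    static void Main()
    {
        var pipes = new List<string> { "pipeA", "pipeB" };
        int eligible = 0;

        // "pipeC" stands for a pipe that is no longer in the list,
        // for example because it was removed during a disconnect.
        int index = pipes.IndexOf("pipeC"); // IndexOf returns -1 when the item is absent

        try
        {
            Swap(pipes, index, eligible);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine("Swap failed with " + ex.GetType().Name);
        }
    }
}
```

The same exception type would also result from an index that is past the end of the list, so the stack trace alone doesn't say which of the two indices is bad.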

SteveHarveyUK commented 1 year ago

It appears that this issue is mitigated by setting the high-water mark to 0 (no limit), so that the SubscriberSocket never needs to invoke the Swap. I'd still argue that this is a bug.
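For anyone who wants to try the same workaround, this is roughly what the configuration looks like. It is a minimal sketch rather than our production code: the endpoint is a placeholder, and setting both the send and receive high-water marks is an assumption made for the example, since either one alone may be enough.

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

static class SubscriberSetup
{
    static void Main()
    {
        using (var subscriber = new SubscriberSocket())
        using (var poller = new NetMQPoller { subscriber })
        {
            // A high-water mark of 0 means "no limit".
            // Set the options before Connect so they apply to the new connection.
            subscriber.Options.SendHighWatermark = 0;
            subscriber.Options.ReceiveHighWatermark = 0;

            subscriber.Connect("tcp://localhost:5556"); // placeholder endpoint
            subscriber.SubscribeToAnyTopic();

            subscriber.ReceiveReady += (s, e) =>
            {
                NetMQMessage message = e.Socket.ReceiveMultipartMessage();
                Console.WriteLine("Received " + message.FrameCount + " frame(s)");
            };

            poller.Run();
        }
    }
}
```

The trade-off is that with no limit, queued messages can grow without bound if the subscriber falls behind, so memory usage is worth watching.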