zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET

Exceptions from NetMQ causing application crashes #200

Closed exzachtly1 closed 9 years ago

exzachtly1 commented 9 years ago

Hello, we are using NetMQ (latest NuGet package, v3.3.0.11) in an ASP.NET WebAPI application to send messages using the Push/Pull pattern. This ran successfully for some time, but over the past week or so one of our environments has begun crashing at fairly regular intervals. When these exceptions occur, the application pool crashes and generates a dump file.

The exceptions originate from the NetMQ assembly. The first (and more common) one appears in the Windows event log as:

An unhandled exception occurred and the process was terminated.

Application ID: DefaultDomain

Process ID: 9896

Exception: System.Runtime.Serialization.SerializationException

Message: Type 'NetMQ.NetMQException' in Assembly 'NetMQ, Version=3.3.0.11, Culture=neutral, PublicKeyToken=a6decef4ddc58b3a' is not marked as serializable.

StackTrace:    at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(Object obj, ISurrogateSelector surrogateSelector, StreamingContext context, SerObjectInfoInit serObjectInfoInit, IFormatterConverter converter, ObjectWriter objectWriter, SerializationBinder binder)
   at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.Serialize(Object obj, ISurrogateSelector surrogateSelector, StreamingContext context, SerObjectInfoInit serObjectInfoInit, IFormatterConverter converter, ObjectWriter objectWriter, SerializationBinder binder)
   at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(Object graph, Header[] inHeaders, __BinaryWriter serWriter, Boolean fCheck)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph, Header[] headers, Boolean fCheck)
   at System.Runtime.Remoting.Channels.CrossAppDomainSerializer.SerializeObject(Object obj, MemoryStream stm)
   at System.AppDomain.Serialize(Object o)
   at System.AppDomain.MarshalObject(Object o)

And the second one I have seen is:

An unhandled exception occurred and the process was terminated.

Application ID: /LM/W3SVC/7/ROOT/publishing

Process ID: 12456

Exception: System.ArgumentException

Message: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.

StackTrace:    at System.Buffer.BlockCopy(Array src, Int32 srcOffset, Array dst, Int32 dstOffset, Int32 count)
   at NetMQ.zmq.EncoderBase.GetData(ByteArraySegment& data, Int32& size, Int32& offset)
   at NetMQ.zmq.StreamEngine.OutEvent()
   at NetMQ.zmq.IOThread.InEvent()
   at NetMQ.zmq.Poller.Loop()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

I have tried to catch these exceptions in the calling code, but the application crashes before they can be caught. I have not yet found a reliable way to reproduce this bug, and it happens in only one environment. That environment sends messages more frequently, so it opens and closes connections much more often. I suspect that has something to do with it.
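A note on why a try/catch in the calling code does not help here: the second stack trace ends in System.Threading.ThreadHelper.ThreadStart, which suggests the exception is raised on a thread started inside NetMQ rather than on the thread that calls Send. In .NET, an exception is visible only to handlers on the thread that threw it. The sketch below illustrates this with plain threads. It is not NetMQ code, and the class and method names are made up for the example.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; nothing here comes from NetMQ.
public static class BackgroundThreadExceptionDemo
{
    // Returns a short description of which thread observed the exception.
    public static string Run()
    {
        Exception seenOnWorker = null;
        bool caughtByCaller = false;

        var worker = new Thread(() =>
        {
            try
            {
                // Stand-in for a failure inside a library-owned I/O thread.
                throw new ArgumentException("simulated failure on worker thread");
            }
            catch (Exception ex)
            {
                // Only a handler on the worker thread itself sees the exception.
                // Without this catch, the runtime would terminate the process.
                seenOnWorker = ex;
            }
        });

        try
        {
            worker.Start();
            worker.Join();
        }
        catch (Exception)
        {
            // Never reached: exceptions do not cross thread boundaries.
            caughtByCaller = true;
        }

        return string.Format(
            "worker saw: {0}; caller caught: {1}",
            seenOnWorker != null ? seenOnWorker.GetType().Name : "nothing",
            caughtByCaller);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

If the inner catch is removed, the caller's catch block still never runs and the process is terminated instead, which matches the behaviour described above.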

Here is the code we use to connect and send messages (reduced to just the important bits):

//_zmqContext is a private class scoped variable
_zmqContext = NetMQContext.Create();
_zmqPushSocket = _zmqContext.CreatePushSocket();
_zmqPushSocket.Options.Linger = TimeSpan.FromSeconds(1);
_zmqPushSocket.Connect(socketEndpoint);

//payload is a JSON string
_zmqPushSocket.Send(payload);

_zmqPushSocket.Dispose();
_zmqContext.Dispose();

Important notes about usage: in generic terms, clients invoke a process via the WebAPI and the process executes in steps. Each step sends a message. We have a logger class that wraps all of the ZMQ connection state and messaging code. It works as follows:

I'm not sure whether this is a legitimate bug or whether we are misusing the library, but because I cannot get the inner exception from NetMQ (due to the serialization exception), it is hard for me to trace where it is happening in order to troubleshoot.

Any thoughts are appreciated! Thanks!

somdoron commented 9 years ago

Most of this has already been solved; try compiling from https://github.com/somdoron/netmq.

Hopefully I will release a new RC version over the weekend.

exzachtly1 commented 9 years ago

Thank you somdoron, I appreciate the reply! We have temporarily switched to HTTP based messaging but would like to continue using NetMQ. I'll check the RC version next Monday and see how it goes.

From what I have shown you, do you see any misuse in our code? Are we handling connection state correctly? It seems to happen when many connections are opened and closed in a short time frame. My thought was that the original error might be caused by us mishandling the connections. Or is this simply a known bug?

somdoron commented 9 years ago

It seems fine. The exception not being serializable was fixed some time ago, and I think the argument exception was fixed as well.

somdoron commented 9 years ago

@exzachtly1 NetMQ v3.3.0.12 RC1 was released to NuGet: https://www.nuget.org/packages/NetMQ/3.3.0.12-rc1

Let me know if you still have issues.

drewnoakes commented 9 years ago

#305 fixes problems with the serialisation of NetMQException and its subclasses.
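For context, "not marked as serializable" refers to the standard .NET pattern for exceptions that may cross AppDomain boundaries (the first stack trace in this issue passes through AppDomain.MarshalObject). A minimal sketch of that pattern follows. It is an illustration only, not the actual NetMQ change; DemoMessagingException and its ErrorCode property are hypothetical names.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical exception type showing the usual serializable-exception pattern.
[Serializable]
public class DemoMessagingException : Exception
{
    // Extra state that must survive serialization.
    public int ErrorCode { get; private set; }

    public DemoMessagingException(string message, int errorCode)
        : base(message)
    {
        ErrorCode = errorCode;
    }

    // Deserialization constructor: restores the extra state.
    protected DemoMessagingException(SerializationInfo info, StreamingContext context)
        : base(info, context)
    {
        ErrorCode = info.GetInt32("ErrorCode");
    }

    // Writes the extra state alongside the base exception data.
    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        base.GetObjectData(info, context);
        info.AddValue("ErrorCode", ErrorCode);
    }
}

public static class DemoMessagingExceptionProgram
{
    public static void Main()
    {
        var ex = new DemoMessagingException("simulated failure", 42);
        Console.WriteLine(ex.Message + " (code " + ex.ErrorCode + ")");
    }
}
```

Subclasses need the attribute and the deserialization constructor as well, since [Serializable] is not inherited.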

iloginov commented 9 years ago

Well, should this issue be closed?