zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET

Unhandled exception in Socket.Send when resuming a Xamarin.iOS app #1032

Open follesoe opened 2 years ago

follesoe commented 2 years ago

Environment

NetMQ Version: 4.0.1.9
Operating System: iOS 15.6.1
.NET Version: Xamarin.iOS 15.12.0.2

Expected behaviour

The background thread should not crash and take down the whole app if the underlying socket has been closed, for example when an iOS app is resumed to the foreground.

Actual behaviour

Our app crashes, typically after being suspended for a couple of minutes and then brought back to the foreground. The crash reports indicate SIGABRT with the message "The socket has been shut down", caused by an unhandled exception in the Signaler.Send method (which has no exception handling).

Steps to reproduce the behaviour

It's hard to create an exact reproduction, but I have a stack trace that might shed some light on the issue:

Socket.Send (System.Byte[] buffer, System.Int32 offset, System.Int32 size, System.Net.Sockets.SocketFlags socketFlags)
Socket.Send (System.Byte[] buffer)
Signaler.Send ()
Mailbox.Send (NetMQ.Core.Command cmd)
Ctx.SendCommand (System.Int32 threadId, NetMQ.Core.Command command)
ZObject.SendCommand (NetMQ.Core.Command cmd)
ZObject.SendHiccup (NetMQ.Core.Pipe destination, System.Object pipe)
Pipe.Hiccup ()
SessionBase.Detached ()
SessionBase.Detach ()
StreamEngine.Error ()
StreamEngine.ProcessInput ()
StreamEngine.Handle (NetMQ.Core.Transports.StreamEngine+Action action, System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred)
StreamEngine.FeedAction (NetMQ.Core.Transports.StreamEngine+Action action, System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred)
StreamEngine.InCompleted (System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred)
IOObject.InCompleted (System.Net.Sockets.SocketError socketError, System.Int32 bytesTransferred)
Proactor.Loop ()
ThreadHelper.ThreadStart_Context (System.Object state)
ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx)
ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx)
ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state)
ThreadHelper.ThreadStart ()

I don't understand the NetMQ code base well enough, but it looks like the Pipe.Hiccup method sends a command, which in turn ends up in the Signaler.Send method, which has no exception handling (probably by design): https://github.com/zeromq/netmq/blob/master/src/NetMQ/Core/Utils/Signaler.cs#L103

Do you have any guidance on how to prevent this from happening, or what could be the root cause?

The application has three sockets in use (a PUB, a SUB, and a REQ/REP socket).

follesoe commented 2 years ago

Some further details as to where the error is occurring: https://github.com/zeromq/netmq/blob/master/src/NetMQ/Core/SessionBase.cs#L527

// For subscriber sockets we hiccup the inbound pipe, which will cause
// the socket object to resend all the subscriptions.
if (m_pipe != null && (m_options.SocketType == ZmqSocketType.Sub || m_options.SocketType == ZmqSocketType.Xsub))
    m_pipe.Hiccup();

Our app uses a Sub socket, so this is probably the code path leading to the Hiccup method call.

follesoe commented 2 years ago

After some further debugging, I believe the cause is related to application lifecycle events on iOS: an app can be put in the background and then suspended after ~5 minutes. At that point the application's state (the .NET objects, etc.) is preserved in memory, but the process is no longer running and the underlying iOS socket is closed.

When the app is later reactivated from memory, the NetMQ objects are restored as they were, but the underlying socket has been closed, which results in the exception. For now, I think I can work around this by handling the app's lifecycle events and explicitly closing the ZeroMQ sockets when the app is about to be deactivated.
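The lifecycle workaround can be sketched roughly as follows. This is a minimal, hypothetical helper (`SocketLifecycle` is not part of NetMQ), assuming the app can route its "about to be deactivated" and "returning to the foreground" notifications to two methods. The socket is disposed before suspension and a fresh one is built on reactivation, so nothing created before the suspension is reused. The socket is abstracted as an `IDisposable` factory so the sketch compiles without NetMQ.

```csharp
using System;

// Hypothetical helper (not a NetMQ API): owns one socket-like resource and
// closes/rebuilds it around app lifecycle transitions.
public sealed class SocketLifecycle
{
    private readonly Func<IDisposable> _factory;
    private IDisposable _current;

    public SocketLifecycle(Func<IDisposable> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // True while a socket created by the factory is open.
    public bool IsActive => _current != null;

    // Call when the app returns to the foreground: creates a fresh socket
    // if none is open. Calling it twice does not create a second socket.
    public void OnForeground()
    {
        if (_current == null)
            _current = _factory();
    }

    // Call when the app is about to be deactivated: closes the socket
    // explicitly so no pre-suspension socket survives into the next session.
    // Safe to call when nothing is open.
    public void OnBackground()
    {
        var socket = _current;
        _current = null;
        socket?.Dispose();
    }
}
```

In a Xamarin.iOS app, these two methods could plausibly be called from the `AppDelegate` overrides `DidEnterBackground` and `WillEnterForeground`, with one helper per socket (PUB, SUB, REQ). The factory could construct and connect a NetMQ `SubscriberSocket` (`new SubscriberSocket()`, then `Connect(...)` and `Subscribe(...)`); since a new SUB socket starts with no subscriptions, the factory is a natural place to re-issue them.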