zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET

Rare hanging on network blips #383

Closed ashic closed 9 years ago

ashic commented 9 years ago

I've got some services running on Windows. Usually they're all fine. In case of sustained multiple consecutive failures, the services are configured to shut down (so an external supervisor can kick them). However, disposing contexts seems to be... interesting. The common pattern for my services is:

public class Something : IDisposable
{
    // The constructor initializes the NetMQContext (context).
    public Something() { /* ... */ }

    public async Task Run()
    {
        // creates a CancellationTokenSource (_cancel)
        // Task.Factory.StartNew(...). In the task:
        //   creates the socket
        //   spins until cancellation is requested, doing socket receive/send
    }

    bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _cancel.Cancel();
        context.Dispose();
    }
}

Now, depending on which thread disposal happens on, the context dispose can sometimes hang. During normal ops this doesn't seem to happen often, but in case of network blips and the service shutting down, I get a "Service stuck in a stopping state" issue. I've pinned it down to context.Dispose() hanging.

Is there some way to tell NetMQ - "hey yo - I'm dying now. Can you please just go away"? Or is there some other pattern to use?

iiwaasnet commented 9 years ago

How do you dispose the sockets? Maybe, at the time you try to dispose the context, not all sockets have been disposed yet. Also, make sure you have Linger set to 0.

ashic commented 9 years ago

Hmm...will try setting linger on socket to zero in Dispose. Will report back.

ashic commented 9 years ago

It seems that simply disposing the socket (which I wasn't doing) works. I'll try it in a bigger app and see how it goes. Setting linger to 0 too.

ashic commented 9 years ago

Hmm...seems I'm already doing that. This is the class in question:

https://github.com/heartysoft/res/blob/master/src/res/Res.Client/Internal/SingleThreadedZeroMqGateway.cs

Any ideas why it might be causing the service to stall when shutting down?

iiwaasnet commented 9 years ago

Are you calling ProcessResponse() and Shutdown() from different threads?

ashic commented 9 years ago

Possibly...Shutdown is called from the containing class's Dispose(). I'm really looking for a way around the multiple threads problem - any way to safely do that? Or at least, some way to allow "kill right now" for when the app is terminating? During normal operations, Shutdown won't ever get called, and ProcessResponse access will never be concurrent.
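
One way to sidestep the cross-thread question, sketched under stated assumptions: let the worker thread be the only thread that ever touches the socket, including disposing it, and have Dispose() merely signal cancellation, wait for the worker to exit, and only then dispose the context. StubSocket, StubContext and Gateway below are hypothetical stand-ins (not NetMQ types) so the sketch runs without the library; they only record the order of disposal. With the real library, the Linger setting mentioned above would presumably be applied to the socket right after it is created, and the receive call would need a timeout so the loop can notice cancellation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins for a NetMQ socket and context, so the sketch runs
// without the library. They only record the order in which they are disposed.
sealed class StubSocket : IDisposable
{
    private readonly List<string> _log;
    public StubSocket(List<string> log) { _log = log; }

    // Stands in for a receive call with a short timeout: returns false when
    // nothing arrived, so the loop can re-check the cancellation token.
    public bool TryReceive(TimeSpan timeout)
    {
        Thread.Sleep(timeout);
        return false;
    }

    public void Dispose() { lock (_log) _log.Add("socket disposed"); }
}

sealed class StubContext : IDisposable
{
    private readonly List<string> _log;
    public StubContext(List<string> log) { _log = log; }
    public StubSocket CreateSocket() { return new StubSocket(_log); }
    public void Dispose() { lock (_log) _log.Add("context disposed"); }
}

public sealed class Gateway : IDisposable
{
    public readonly List<string> Log = new List<string>();
    private readonly StubContext _context;
    private readonly CancellationTokenSource _cancel = new CancellationTokenSource();
    private Thread _worker;
    private int _disposed;

    public Gateway() { _context = new StubContext(Log); }

    public void Run()
    {
        _worker = new Thread(() =>
        {
            // The socket is created, used and disposed on this one thread.
            using (StubSocket socket = _context.CreateSocket())
            {
                while (!_cancel.IsCancellationRequested)
                {
                    socket.TryReceive(TimeSpan.FromMilliseconds(50));
                }
            } // socket is disposed here, before the worker exits
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    public void Dispose()
    {
        // Interlocked guard: safe even if Dispose is called from two threads.
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return;

        _cancel.Cancel();                    // ask the worker loop to stop
        if (_worker != null) _worker.Join(); // wait until the socket is gone
        _context.Dispose();                  // no socket is open at this point
        _cancel.Dispose();
    }
}
```

Calling Run() and then Dispose() on a Gateway leaves "socket disposed" followed by "context disposed" in Log, whichever thread calls Dispose(). The trade-off is that Dispose() now blocks for up to one receive timeout (50 ms in this sketch) while the worker notices the cancellation.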

iiwaasnet commented 9 years ago

If you debug the service, you should be able to see which line of code it blocks on when you try to stop it. Otherwise, I don't see any other way...

ashic commented 9 years ago

I've already identified that the line in question is context.Dispose(). There's some socket with outstanding work, it seems. But when it hangs, it's always on context.Dispose(). It reaches that line and everything stalls.

ashic commented 9 years ago

Closing this and opening https://github.com/zeromq/netmq/issues/384 .