zeromq / netmq

A 100% native C# implementation of ZeroMQ for .NET
Other
2.95k stars 744 forks source link

Poller will throw ObjectDisposed exception if Socket.Dispose() called #834

Open jasells opened 4 years ago

jasells commented 4 years ago

Environment

NetMQ Version:    4.0.0.2
Operating System:  Win 10
.NET Version:     Standard 2.0 / Core

Expected behaviour

poller.Remove(socket);
socket.Dispose();

should remove the socket from the poller's list, and then dispose the socket, poller should continue to run servicing other sockets.

Actual behaviour

poller throws ObjectDisposedException after socket.Dispose() is called

Steps to reproduce the behaviour

This code in a unit test:

poller.Remove(socket);
socket.Dispose();

Causes an ObjectDisposedException error to throw on the poller's thread from SocketBase.CheckDisposed(), and kill my unit test runtime. There needs to be some better syncing on Poller.Remove() so that it blocks the the poller's thread and/or the caller of Remove() such that the Remove does not return until that socket is actually removed from the poller's internal list. This is the issue as it is currently written.

I would argue that CheckDisposed() should return a bool so that poller can not try to access that socket, and avoid throwing the exception.

I can't find a path to calling Socket.Dispose() after it has been passed to Poller.Add(Socket) and Poller.RunAsync() is called that doesn't throw.

jasells commented 4 years ago

I created a fork to repro by adding a new unit test NetMQPollerTest.RemoveAndDisposeSocket().

This test as-is will never complete, as the exception kills the test execution, and there is no way to catch it to handle it gracefully from test code (or app code).

Debug the test to see the exception. Set exception Settings for System.ArgumentException to "break when thrown"

jasells commented 4 years ago

I made branch here with a potential fix.

I refactored NetMQPoller.Remove(ISocketPollable) to return a Task instead of void.

This allows for proper thread syncing of the async ops needed to properly remove a socket from NetMQPoller internally before the removed socket can be potentially disposed.

If .NET4.0 support could be removed, the fix can be made entirely internal and the change would be a non-breaking change to any code using it. (As it is, it is mostly non-breaking, unless app code is using the ISocketPollableCollection interface explicitly)

dropping .NET4.0 support would allow the method to be defined as async void Remove(ISocketPollable) and the await can be applied internally, maintaining the outward API as a syncronous-API.

Maybe there is some other workaround for .NET40?

BTW, it appears to me that there are lots of places in this code base that may benefit from async-await-task pattern... even if only internally.

toonetown commented 4 years ago

Yes - a million times YES! This issue has been hitting us for years...we just haven't had much success in tracking it down. Is there a corresponding pull request?

jasells commented 4 years ago

Will get one in tomorrow.  I have a better fix that doesn’t break API. Sent from Mail for Windows 10 From: Nathan TooneSent: Friday, December 6, 2019 10:22 AMTo: zeromq/netmqCc: jasells; AuthorSubject: Re: [zeromq/netmq] Poller will throw ObjectDisposed exception if Socket.Dispose() called (#834) Yes - a million times YES! This issue has been hitting us for years...we just haven't had much success in tracking it down. Is there a corresponding pull request?—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or unsubscribe. 

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

jasells commented 3 years ago

So, there is a PR, but it is a little old, and needs updates to resolve conflicts with master.

I have been side-tracked and responses were slow so I lost track when it was fresh in my mind. I will try to get this PR resolved.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.