real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
Apache License 2.0
7.39k stars 888 forks source link

Multi-Destination Subscriptions don't clean up properly. #709

Closed Tomius closed 5 years ago

Tomius commented 5 years ago

If you create an manual control mode subscription, add destinations to it, then close it without removing the destinations, then the media driver will keep the subscription ports open indefinitely (until you restart the media driver).

This means that if you have an app with MDC subscription that crashes, then it's impossible to restart only the crashed app because the ports it wants to subscribe on are already in use ~and the subscription will just silently fail: the .poll() will never receive a message but will never print an error either (see also #587), and the publisher will always return NOT_CONNECTED.~

tmontgomery commented 5 years ago

Can you write a test case that demonstrates this?

When a channel endpoint is closed, it does close any rcvDestinations. But the endpoint is only closed once all subscriptions using it are closed. Could you be using that endpoint by another subscription?

tmontgomery commented 5 years ago

Checked MultiDestinationSubscriptionTests... which most do not remove destinations explicitly. Does seem to close the underlying transports when the channel endpoint is closed as part of shutdown by closing the Subscription.

So, if you have a test case that demonstrates it, feel free to send a PR. But not able to see it currently.

Tomius commented 5 years ago

See #710 for the smallest possible repro I could find that leaves a port open.

It does throw an exception though. When I saw the silent failure I was using aeron v1.15.0, with multiple media drivers and from C#.

mjpt777 commented 5 years ago

@Tomius Can you try master and see if it now works for you? I added your test case and fixed it.

tmontgomery commented 5 years ago

Your change, @mjpt777 , should address it.

tmontgomery commented 5 years ago

While a single selectNow() should handle multiple socket closes. Some past experience makes this somewhat of a crapshoot. It might be an option to call it in the multiRcvDestination close as each is closed.

Edit: I'll make slight change to handle this.

Tomius commented 5 years ago

@mjpt777 Yes, I confirm that e1e30714e020e51a430ff4c93961beea5e8e5921 fixes the issue for me.

P.S: Just wanted to let you know that this was the fastest solved issue I've ever raised on a public github repo. Well done guys!