rebus-org / Rebus.SqlServer

:bus: Microsoft SQL Server transport and persistence for Rebus
https://mookid.dk/category/rebus

Stale subscription management #29

Closed matt-psaltis closed 5 years ago

matt-psaltis commented 6 years ago

Hi,

We're using the SQL server subscription storage. When our subscriber instances failover, the old subscriptions are not cleaned up. New subscriptions are created on the newly created instances with different subscriber names. I'm interested in recommendations or existing approaches for managing subscribers in a "bad actor" scenario, when the original subscriber does not get the opportunity to unsubscribe. Is there anything already available in this space?

mookid8000 commented 6 years ago

I don't know of any way of cleaning up dangling subscriptions, other than manually removing those that have become irrelevant.

Although.... I have never had this problem, which I suspect is because my subscribers do not subscribe/unsubscribe – they stay subscribed as long as they exist, because then they also receive events published when they're down.

That's actually how the pub/sub mechanism in Rebus was intended to be used – subscriptions are considered persistent and not tied to whether anything is running.

So... would it be possible for you to simply "reuse" the subscriber's input queue, i.e. just never unsubscribe unless the subscriber explicitly wants to no longer receive a particular event, or the subscriber is retired?
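A minimal sketch of that setup, assuming the usual Rebus configuration API for this transport and subscription storage (the connection string, queue name `orders-subscriber`, and event type `OrderPlaced` are made up for illustration):

```csharp
using Rebus.Activation;
using Rebus.Config;

using (var activator = new BuiltinHandlerActivator())
{
    var bus = Configure.With(activator)
        // Same fixed queue name on every start – never a generated,
        // per-instance name – so the subscription row stays valid
        // across restarts and failovers.
        .Transport(t => t.UseSqlServer(connectionString, "orders-subscriber"))
        .Subscriptions(s => s.StoreInSqlServer(connectionString, "Subscriptions", isCentralized: true))
        .Start();

    // Subscribing is idempotent: calling it on every start simply
    // re-asserts the same (topic, queue) registration, so nothing
    // is left dangling when the process dies.
    await bus.Subscribe<OrderPlaced>();
}
```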

matt-psaltis commented 6 years ago

Thanks @mookid8000, really appreciate the feedback! Regarding "as long as they exist": I'm doing elastic scale-out, which means that I don't necessarily want all of the subscribers to be long-lived. I'm happy to handle this myself; I'm just trying to validate whether this is actually a bad technology fit or just a 1% edge-case scenario. I'll definitely take a look at the reuse strategy you've suggested, but with the elastic scale I could go weeks where Subscriber3 is never spun up, so I think I'll still need some detection/orchestration process around the subscriptions.

Cheers.

mookid8000 commented 6 years ago

(...) I'm doing elastic scale-out which means that I don't necessarily want all of the subscribers to be long lived (...)

Which transport are you using? Can't you use the competing consumers pattern? This way, there would be a single queue subscribed to whichever topics, and then the messages would be distributed among the pool of subscribers by virtue of competing for the messages....
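To sketch the idea (queue name is illustrative): with competing consumers, every scaled-out instance is configured with the *same* input queue name, so a single subscription covers the whole pool and each message is handled by exactly one instance.

```csharp
// Run this identical configuration on every instance in the pool.
// Because they all share the queue "workerpool.input", Rebus
// distributes the messages among them automatically.
var bus = Configure.With(activator)
    .Transport(t => t.UseSqlServer(connectionString, "workerpool.input"))
    .Start();
```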

matt-psaltis commented 6 years ago

It's the standard SqlServer transport (combined with TransactionScope for atomic commit); however, for this particular use case all subscribers need to consume the messages.

mookid8000 commented 6 years ago

It's the standard SqlServer transport

The SQL transport is cool with competing consumers 😄 no problems there – besides, of course, that it's not a real message queue, so it does have its limitations regarding throughput. But for many scenarios it'll probably work fine.

for this particular use case all subscribers need to consume the messages

Ah, ok – it doesn't sound like the usual way of scaling out.... 😁 how does adding a subscriber instance help you reduce the amount of work each instance needs to do, then?

matt-psaltis commented 6 years ago

These are the control messages, so the instances in the scale-out pool know what they're meant to be doing. The idea is that I can stop-drain instances, tune worker thread counts etc., but also send notifications to all instances of global config changes and the like. :)

MrMDavidson commented 6 years ago

Depending on the nature of your control messages, you might be able to leave the subscriptions persistent but set a relatively low TTL (eg. 5 minutes, depending on the responsiveness of your workers), so those messages for Subscriber 3 get cleaned up before Subscriber 3 comes back online in 3 weeks' time.
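For illustration, assuming Rebus's `Headers.TimeToBeReceived` header (the `ConfigChanged` event type is made up), a per-message TTL could look like this:

```csharp
using System.Collections.Generic;
using Rebus.Messages;

// Publish a control message with a 5-minute time-to-be-received,
// so copies queued for an offline subscriber are discarded by the
// transport instead of piling up until it returns.
await bus.Publish(new ConfigChanged(), new Dictionary<string, string>
{
    [Headers.TimeToBeReceived] = "00:05:00",
});
```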

You might also benefit from splitting your queues into two sets: one queue for work (eg. workerpool.input), which your pool of scaled-out workers (of a given type) uses with a competing-consumer pattern to achieve scale-out, and another for your control messages, which is targeted (eg. workerpool.subscriber3.control), has a low TTL, and is used to perform your drains etc.

matt-psaltis commented 6 years ago

Thanks all, I've had a go at a few different methods for this. It seems like it should be a really simple problem to solve, but when I throw Azure App Services into the mix, it's proving very difficult to come up with a consistent naming strategy that isn't susceptible to Azure's normal instance migrations between virtual hosts as part of fabric maintenance etc. There's just no good way (that I've found) to guarantee that existing subscriber ids will actually be reused.

I'm toying with the idea of adding a subscription timestamp to the subscribers table. Two new features would be necessary.

  1. A heartbeat for all subscribers. Each subscriber is responsible for updating its subscription timestamp (a new SQL column) against its own subscription record.

  2. Similar to the expired-message cleanup, a background process that identifies and deletes expired subscribers.

Just a thought at this stage, haven't put fingers to keys yet.
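Roughly, the two pieces could look something like this. This is not an existing Rebus feature – the `LastHeartbeat` column, table name, and the `address` column it keys on are assumptions for the sake of the sketch:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading.Tasks;

// 1) Heartbeat: each subscriber periodically touches its own rows.
async Task HeartbeatAsync(SqlConnection conn, string subscriberAddress)
{
    using (var cmd = new SqlCommand(
        "UPDATE [Subscriptions] SET [LastHeartbeat] = SYSUTCDATETIME() " +
        "WHERE [address] = @address", conn))
    {
        cmd.Parameters.AddWithValue("@address", subscriberAddress);
        await cmd.ExecuteNonQueryAsync();
    }
}

// 2) Cleanup: a background job deletes subscribers that have gone quiet
//    for longer than some configured maximum age.
async Task RemoveExpiredSubscribersAsync(SqlConnection conn, TimeSpan maxAge)
{
    using (var cmd = new SqlCommand(
        "DELETE FROM [Subscriptions] " +
        "WHERE [LastHeartbeat] < DATEADD(SECOND, -@maxAgeSeconds, SYSUTCDATETIME())", conn))
    {
        cmd.Parameters.AddWithValue("@maxAgeSeconds", (int)maxAge.TotalSeconds);
        await cmd.ExecuteNonQueryAsync();
    }
}
```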

mookid8000 commented 5 years ago

Closing this one for now. Feel free to resume the discussion if it hasn't been resolved somehow.