rebus-org / Rebus.AzureServiceBus

:bus: Azure Service Bus transport for Rebus
https://mookid.dk/category/rebus
Other
33 stars 20 forks source link

Stop the RenewPeekLock task after `Could not renew lock` message #40

Closed jr01 closed 4 years ago

jr01 commented 4 years ago

We use 7.0.0-a14 since early May in production and it's been running very well for us. One issue though is that we notice an Error Could not renew lock for message with ID keeps repeating every 5 minutes or so for the same message ID. Those messages will go on for days, until the Rebus host process gets restarted.

We still need AutomaticallyRenewPeekLock on a few endpoints because of some long running legacy processes (15 minutes or so).

Looking at the AzureServiceBusTransport.cs probably none of these guys got executed:

context.OnCommitted(async () => renewalTask.Dispose());
context.OnAborted(() => renewalTask.Dispose());
context.OnDisposed(() => renewalTask.Dispose());

Question is why. I analyzed process dumps with windbg and Visual Studio, but haven't been able to track down the root cause (yet).

Anyway, root cause aside.

Rebus currently swallows the MessageLockLostException, so the RenewPeekLock task keeps running and failing again and again, until forever or one of the context events occur.

According to MessageLockLostException there is not much use in trying to RenewLockAsync:

//     The exception that is thrown when the lock on the message is lost. Callers should
//     call Receive and process the message again.
public sealed class MessageLockLostException : ServiceBusException

So I suggest to stop the RenewPeekLock task after a MessageLockLostException. Question is how? What do you think?

mookid8000 commented 4 years ago

So I suggest to stop the RenewPeekLock task after a MessageLockLostException

That's a good idea 🙂

Question is how?

I know how, and I can fix it in 5 minutes.

What do you think?

I think "great idea". I'll have an update out in a few minutes.

mookid8000 commented 4 years ago

ok, I was too optimistic there.... I'll get back to this task tomorrow 😐

mookid8000 commented 4 years ago

I've reduced the delay between peek lock renewals from 70% to 50% of the time until the lock expires – hopefully this will reduce the likelihood that a peek lock renewal fails.

Also, IF a peek lock renewal should fail, the renewal task will now cancel itself, thus avoiding trying to renew, when it doesn't make sense.

It's out in Rebus.AzureServiceBus 7.0.0-a15

jr01 commented 4 years ago

Nice solution/hack BTW - I wouldn't have thought about using a CancellationTokenSource and calling the Renewaltask.Dispose() in this way. I was more thinking along the lines of extending the _asyncTaskFactory to support self cancellation.

mookid8000 commented 4 years ago

that could of course also have been done 🙂 I opted for this approach to avoid having to update Rebus, because that's where the async task implementation comes from