Unable to update heartbeat - still happening in .NET 6.0

raisedapp / Hangfire.Storage.SQLite

An Alternative SQLite Storage for Hangfire

https://www.nuget.org/packages/Hangfire.Storage.SQLite

MIT License


Unable to update heartbeat - still happening in .NET 6.0 #69

Closed: fnajera-rac-de closed this 7 months ago

fnajera-rac-de commented 8 months ago

My ASP.NET Core 6 app shows this error very often:

Unable to update heartbeat on the resource 'HangFire:xxx'. The resource is not locked or is locked by another owner.

I believe this has to do with #68, and may be gone if that error is fixed.

But regardless of #68, if SQLiteDistributedLock cannot update the heartbeat because of that error, what's the point of retrying? I think the timer should be stopped in that case, or, at a minimum, the error should be muted so that it doesn't keep showing up in the logs indefinitely.
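Here is a minimal sketch of that suggestion. It uses Python's built-in sqlite3 module rather than C#. The heartbeat extends the lock row only `WHERE resource = ? AND owner = ?`. If that update touches no rows, the resource is either not locked or locked by another owner, and no retry can fix that. So the sketch logs once and stops its timer. The `lock` table, its columns, `create_schema`, and the `HeartbeatLock` class are hypothetical. They are not Hangfire.Storage.SQLite's actual schema or API.

```python
import logging
import sqlite3
import threading
import time
import uuid

log = logging.getLogger("heartbeat-sketch")


def create_schema(conn):
    # Hypothetical lock table (assumption): one row per locked resource.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lock("
        "resource TEXT PRIMARY KEY, owner TEXT NOT NULL, expire_at REAL NOT NULL)"
    )


class HeartbeatLock:
    """Hypothetical lock whose heartbeat stops once ownership is lost."""

    def __init__(self, conn, resource, ttl=30.0, interval=10.0):
        self.conn = conn
        self.resource = resource
        self.owner = str(uuid.uuid4())  # unique id for this lock holder
        self.ttl = ttl
        self.interval = interval
        self.stopped = False
        self._timer = None

    def acquire(self):
        now = time.time()
        with self.conn:  # commits on success, rolls back on exception
            # Clear an expired row so a crashed owner cannot block forever.
            self.conn.execute(
                "DELETE FROM lock WHERE resource = ? AND expire_at < ?",
                (self.resource, now),
            )
            try:
                self.conn.execute(
                    "INSERT INTO lock(resource, owner, expire_at) VALUES (?, ?, ?)",
                    (self.resource, self.owner, now + self.ttl),
                )
            except sqlite3.IntegrityError:
                return False  # someone else holds a live lock
        self._schedule()
        return True

    def heartbeat(self):
        if self.stopped:
            return False
        with self.conn:
            cur = self.conn.execute(
                "UPDATE lock SET expire_at = ? WHERE resource = ? AND owner = ?",
                (time.time() + self.ttl, self.resource, self.owner),
            )
        if cur.rowcount == 0:
            # Not locked, or locked by another owner: retrying cannot fix
            # that, so log once and stop the timer instead of looping.
            log.warning("Lost lock on %r; stopping heartbeat.", self.resource)
            self.stop()
            return False
        self._schedule()
        return True

    def stop(self):
        self.stopped = True
        if self._timer is not None:
            self._timer.cancel()

    def release(self):
        self.stop()
        with self.conn:
            self.conn.execute(
                "DELETE FROM lock WHERE resource = ? AND owner = ?",
                (self.resource, self.owner),
            )

    def _schedule(self):
        # Replace any pending timer with a fresh one, unless stopped.
        if self._timer is not None:
            self._timer.cancel()
        if not self.stopped:
            self._timer = threading.Timer(self.interval, self.heartbeat)
            self._timer.daemon = True
            self._timer.start()
```

Suppose another owner has taken over the row and `heartbeat()` is then called. It returns False, logs a single warning, and cancels the timer, so the message cannot repeat. The connection must be opened with `check_same_thread=False`, because the timer calls `heartbeat()` from its own thread.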

TXRock commented 8 months ago

In my case, it happens when the app is published to a Linux distribution, but not on Windows.

TXRock commented 8 months ago

From what I've read in the past, multi-threaded, multi-process access to the database works quite differently in LiteDB than in SQLite. Maybe that is the reason LiteDB might not have the same issues with the distributed lock as SQLite does.

fnajera-rac-de commented 8 months ago

@TXRock are you using AcquireDistributedLock in async methods?

TXRock commented 8 months ago

> @TXRock are you using AcquireDistributedLock in async methods?

I am not using AcquireDistributedLock, but my jobs do execute async methods.

However, I'm not sure how that relates to the heartbeat check.

It starts with:

(Hangfire.Storage.SQLite.SQLiteDistributedLock) Unable to update heartbeat on the resource 'HangFire:job:xxx:state-lock'. SQLite.SQLiteException: database is locked

and later on:

(Hangfire.Storage.SQLite.SQLiteDistributedLock) Unable to update heartbeat on the resource 'HangFire:job:xxx:state-lock'. The resource is not locked or is locked by another owner.

after which it could not recover.

fnajera-rac-de commented 8 months ago

See #68 for an internal usage of ThreadLocal which seems incompatible with async.
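Here is a loose Python analogy for why thread-local state breaks in async code. It needs Python 3.9+ for `asyncio.to_thread`. `threading.local` stands in for .NET `ThreadLocal<T>`, and `contextvars.ContextVar` roughly for `AsyncLocal<T>`. The work hops to another thread, much as an awaited .NET continuation may resume on a different pool thread. After the hop, only the context variable still sees the owner. This illustrates the mechanism referred to in #68; it is not the library's code.

```python
import asyncio
import contextvars
import threading

# Per-thread slot, standing in for .NET ThreadLocal<T>.
_thread_owner = threading.local()
# Per-logical-flow slot, roughly like .NET AsyncLocal<T>.
_flow_owner = contextvars.ContextVar("lock_owner", default=None)


def read_owner():
    """Report what each kind of storage sees on the current thread."""
    return getattr(_thread_owner, "value", None), _flow_owner.get()


async def job():
    # "Acquire the lock": remember the owner in both kinds of storage.
    _thread_owner.value = "owner-1"
    _flow_owner.set("owner-1")
    # Continue on a worker thread; asyncio.to_thread copies the current
    # context, but threading.local state stays behind on the original thread.
    return await asyncio.to_thread(read_owner)


thread_seen, flow_seen = asyncio.run(job())
print(thread_seen, flow_seen)  # None owner-1
```

A lock that records its owner in the thread-local slot would, after such a hop, conclude that the current code does not hold it. That is the same symptom as "The resource is not locked or is locked by another owner".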

The "database is locked" error is probably a transaction failing and not being retried (I haven't investigated that one).
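If that guess is right, a common mitigation is to retry the whole transaction with backoff when SQLite reports the lock. Below is a minimal Python sketch. It assumes `op` opens and commits its own transaction, for example inside `with conn:`, which rolls back on an exception. The attempt count and delays are illustrative, not values taken from Hangfire.Storage.SQLite. Python's busy timeout, set with `sqlite3.connect(path, timeout=...)`, is a complementary first line of defense.

```python
import sqlite3
import time


def run_with_retry(op, attempts=5, base_delay=0.05):
    """Call op() and retry with exponential backoff on 'database is locked'.

    op should run one complete transaction (e.g. inside `with conn:`),
    so a failed attempt is rolled back before the next one starts.
    attempts/base_delay are illustrative defaults (assumption).
    """
    for attempt in range(attempts):
        try:
            return op()
        except sqlite3.OperationalError as exc:
            # Only the transient lock error is retried; anything else, or
            # the final failed attempt, is re-raised to the caller.
            if "database is locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A call might look like `run_with_retry(lambda: save_state(conn))`, where `save_state` is a hypothetical function that performs one transaction. Retrying only the lock error keeps real bugs, such as a missing table, from being hidden behind retries.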

But if you look at the code that produces the message "The resource is not locked or is locked by another owner", I think you'll find the situation described in the other ticket. I assume SQLiteDistributedLock is also used internally by the library, even if you don't call it explicitly.

I'll see if I can find some time to add a unit test for this problem (at least for the scenario I found).