raisedapp / Hangfire.Storage.SQLite

An Alternative SQLite Storage for Hangfire
https://www.nuget.org/packages/Hangfire.Storage.SQLite
MIT License
155 stars 31 forks source link

Distributed lock exception - 0.3.0 #38

Closed todorovicg closed 2 years ago

todorovicg commented 3 years ago

Still randomly getting distributed lock exception. Hangfire.Storage.SQLite version: 0.3.0 Hangfire.Core version: 1.7.22

Timeout expired. The timeout elapsed prior to obtaining a distributed lock on the 'Could not place a lock on the resource 'HangFire:extension:job-mutex:lock:StatusUpdateJob': The lock request timed out.' resource. at Hangfire.Storage.SQLite.SQLiteDistributedLock.Acquire(TimeSpan timeout) at Hangfire.Storage.SQLite.SQLiteDistributedLock..ctor(String resource, TimeSpan timeout, HangfireDbContext database, SQLiteStorageOptions storageOptions) at Hangfire.Storage.SQLite.HangfireSQLiteConnection.AcquireDistributedLock(String resource, TimeSpan timeout)

fnajera-rac-de commented 2 years ago

The following unit test reproduces the bug:

        // see https://github.com/raisedapp/Hangfire.Storage.SQLite/issues/38
        [Fact, CleanDatabase]
        public void Ctor_SetLockExpireAtWorks_WhenResourceIsLockedAndExpired()
        {
            UseConnection(database =>
            {
                // add a lock (taken by another process who is now killed) which will expire in 3 seconds from now
                var distributedLock = new DistributedLock();
                distributedLock.Id = Guid.NewGuid().ToString();
                distributedLock.Resource = "resource1";
                distributedLock.ExpireAt = DateTime.UtcNow.Add(TimeSpan.FromSeconds(3));
                database.Database.Insert(distributedLock);

                // try to get the lock in the next 10 seconds
                // ideally, after ~3 seconds, the constructor should succeed
                using (new SQLiteDistributedLock("resource1", TimeSpan.FromSeconds(10), database, new SQLiteStorageOptions() { DistributedLockLifetime = TimeSpan.FromSeconds(3) }))
                {
                    DistributedLock lockEntry = database.DistributedLockRepository.FirstOrDefault(_ => _.Resource == "resource1");
                    Assert.NotNull(lockEntry);
                }
            });
        }
fnajera-rac-de commented 2 years ago

@felixclase Could you please have a look at this bug? You can copy/paste the code above into SQLiteDistributedLockFacts.cs to reproduce it. The real-life scenario would be that the process using this library is killed while the lock was hold.

Edit: I think the problem is that the Cleanup() is called only once before the waiting loop - I think it should be called while in the loop as well, since the expiration might happen while we wait. But I'm not sure of what is the best implementation.

fnajera-rac-de commented 2 years ago

After some testing, I think the Acquire() logic is flawed. Why would you extend an existing lock (which it's not yours) each time you try to Acquire() it (because of the call to .Update())? This makes it impossible to acquire a lock which is already taken but expiring - we are extending its expiration date artificially each time. And there is already a mechanism in place to extend the lock by its owner (the heartbeat).

Also: the lock in the database has no concept of 'owner'. If it did, the problem above (extending a lock you don't own) would have never happened in the first place.

I'll try to work on this and make a PR.

fnajera-rac-de commented 2 years ago

See #41

todorovicg commented 2 years ago

Nice find @fnajera-rac-de! I've checked your PR as well, looks good.

@felixclase On an unrelated note I've provided a PR #42 which should resolve an issues that I've ran into once. To not bother with opening a new issue, I'll just leave this comment here.

felixclase commented 2 years ago

Thanks guys, I will be testing your changes this weekend

felixclase commented 2 years ago

Fix published in version 0.3.1