jabrown85 closed this 6 years ago
I hope this is OK. I was trying to understand all the pieces and I think I got it. The problem, as I saw it, was that the expiry of the KEY used for locking was set from the same timeout parameter as the DisableConcurrentExecution attribute. That doesn't seem right. The attribute's parameter is the maximum time a subsequent job execution will wait for the lock (which RedisLock implements correctly). Reusing it as the key's TTL means the key can expire right as a waiting job re-checks for the lock; if that job then obtains the lock, you have 2 jobs running at the same time. The version in master works sometimes, but I think that's mostly by chance and depends heavily on the timeout/intervals between Redis checks.
I haven't had any time to look at the tests, but I can take a stab at writing some if you want.
Also, I think issue #65 is the same issue as this one.
I think we need to do something better here (perhaps more like https://github.com/frankhommers/Hangfire.PostgreSql/pull/44/).
The problem with my quick fix is that, since the lock never expires, a hard crash means nothing will ever run until the lock is manually removed. I think exposing a lock timeout in the options would let application builders specify the longest timespan they intend to let a lock live. A different option would be having the lock renew itself periodically, but I don't know whether a ServerFilter is the right place for that.
In the meantime, I worked around this issue by removing the attribute entirely and using a second queue with a single worker (https://discuss.hangfire.io/t/different-queues-having-different-worker-counts/114). It's less clean, but it guarantees that only one runs at a time.
@jabrown85 please check out #69, I think it should resolve all the problems here.
That looks good! That’s what I figured needed to happen based on other implementations. I hope that gets merged soon!
It worked fine for a day on my Linux server but still needs some more testing.
The key being used for the lock was given the same expiry as the timeout used for subsequent lock waits. This meant a lock would expire at essentially the same moment the job waiting for it stopped waiting. More often than not, the waiting job would actually acquire the lock while the first job that obtained it was still executing.