Thanks for the report.
There were some changes made around lock extension between 2.2.0 and 2.3.1 to avoid a race condition during unlocking - so potentially there's something not quite right there.
Are you able to reproduce the problem with RedLock logging enabled at Debug level (pass in an ILoggerFactory when creating your RedLockFactory) and post the logs here?
So... I don't have much experience with ILoggerFactory, so I'm not sure I did it correctly:
RedLockFactory.Create(endPoints, LoggerFactory.Create(builder => builder.AddConsole()))
And in the appSettings, I put:
"LogLevel": {
"RedLockNet": "Debug"
}
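In case the appSettings filter isn't being picked up (a standalone LoggerFactory.Create doesn't read the "Logging" configuration section by itself), here is a minimal sketch of setting the level directly on the builder instead; the loggerFactory variable name and the AddFilter call are just one way to do it:

using Microsoft.Extensions.Logging;

// Sketch: force the RedLockNet category down to Debug on the builder itself,
// rather than relying on appSettings being picked up.
var loggerFactory = LoggerFactory.Create(builder => builder
    .AddConsole()
    .AddFilter("RedLockNet", LogLevel.Debug));

var redLockFactory = RedLockFactory.Create(endPoints, loggerFactory);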
You'll find the logs at the end of this message. Also, while doing these tests, I was able to narrow down the scenario in which it occurs a bit. Let's say that every second, the code tries to acquire a lock for a job with this line:
var rlock = _redLockFactory.CreateLock(GetType().FullName, TimeSpan.FromSeconds(30))
When it starts, it gets a lock, so an entry is created in Redis with an expiry of 30s. The job takes exactly 30 seconds to execute (as it's using Task.Delay(30000) for the tests). After 15 seconds, the lock is extended. After 15 more seconds, the line above is executed before the job terminates, so the lock is extended. However, right after that, the job terminates and the lock gets deleted. Less than a second later, the code tries to re-execute the job by running the line above and acquires a new lock. And, for whatever reason, it is THIS lock that does not get extended correctly: its expiry just decreases to 0 and the lock then gets recreated. When I check the logs (below), I don't really see anything weird, but I suspect I didn't set the logging up correctly, as I only see info logs and no debug ones.
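For context, a minimal sketch of the once-per-second attempt described above (the scheduler loop, TryRunJobAsync and RunJobAsync are illustrative names, not the actual job code):

// Sketch of the scheduling side: fire a new acquisition attempt every second,
// even while a previous job is still running.
async Task RunSchedulerAsync()
{
    while (true)
    {
        _ = TryRunJobAsync();
        await Task.Delay(1000);
    }
}

async Task TryRunJobAsync()
{
    using (var rlock = _redLockFactory.CreateLock(GetType().FullName, TimeSpan.FromSeconds(30)))
    {
        if (rlock.IsAcquired)
        {
            await RunJobAsync(); // ~30s of work (Task.Delay(30000) in this test)
        }
        // an attempt that loses shows up as "Conflicted" in the logs and runs nothing
    }
}

The stream of Conflicted entries in the logs below would then just be the extra attempts firing while a lock is already held.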
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (35df3957-3ea8-4adb-8309-ed4b82571d4d)
info: RedLockNet.SERedis.RedLock[0] Lock status: Acquired (Acquired: 1, Conflicted: 0, Error: 0), SFPD.KmNet.Jobs.ImportJob (a4fa28c2-ada7-40ae-a6d6-f9e22769dce2)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (8982a3af-904d-469c-8002-a17355d9237f)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (0e847140-f2de-4f20-b333-20430b5a947f)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (f5dabb5d-8bd2-4c4a-acf3-0c2e18ae8c9f)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (096b7272-9b67-44c8-8d34-9e8179684513)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (df4021a9-d4f3-4f36-b3af-2a8cc5e29f49)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (090d8541-03fe-4aec-962f-b9998cdfee22)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (5fdc41d8-0fd9-4ea4-b16a-924396976349)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (9d353384-a26d-4980-80d2-e31740871540)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (d1b0d6da-c4db-4eaf-b046-7749b0aaea77)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (918e4ea2-339b-41ba-a081-32a24938f97d)
info: RedLockNet.SERedis.RedLock[0] Lock status: Acquired (Acquired: 1, Conflicted: 0, Error: 0), SFPD.KmNet.Jobs.CleaningJob (c4d48634-d530-40f4-a89e-5fcb00e0e5cf)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (6ab26856-17d6-4ca9-8eba-eb60f1c8a1ad)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (86900e3c-075a-4f74-8054-3fab06eded8b)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (d64672e3-2f60-4441-9881-6bddde573e00)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (029268ce-3b2a-42f3-836f-f8082ef63df3)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (544da76f-681d-4e4c-86d9-a4ee2376442a)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (6f7850a9-7b6a-40e4-af24-e88ad1582591)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (a573cb20-4b63-428f-92d6-3b700bb2aff4)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (24add582-eb3a-42e9-b496-9d9d3d1da484)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (440c03b4-8af7-4c27-aba2-9c8089f7f28c)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (36e8855d-6c08-4245-b891-85a2831fbe03)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (424e5ca7-23cf-4aa9-a885-c54fdaefc200)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (740be884-49b8-460e-b75b-daf6de766acc)
info: RedLockNet.SERedis.RedLock[0] Lock status: Conflicted (Acquired: 0, Conflicted: 1, Error: 0), SFPD.KmNet.Jobs.ImportJob (40d826af-1700-4975-91b1-c9c6f8688a41)
I think you can pass in a LogLevel to AddConsole() if you can't get it to pick it up from the app settings.
I tried, but there is no overload accepting a log level. I tried the syntax in your test project, but I'm using .NET 6 and the way to create the factory has changed a bit. I'll dig deeper tomorrow if I have some time.
Yeah I was trying to repro in a test case myself.
The syntax for the newer logging looks something like this:
LoggerFactory.Create(b => b.AddSimpleConsole(opts =>
{
    opts.TimestampFormat = "HH:mm:ss.ffff ";
    opts.SingleLine = true;
})
.SetMinimumLevel(LogLevel.Trace));
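The factory returned by that Create call is then passed as the second argument to RedLockFactory.Create(endPoints, ...), as in the earlier snippet.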
Also, does the situation where the lock doesn't extend the second time still occur if there isn't any contention for the lock by other threads/processes?
We are seeing a similar issue in our production environment that we cannot reproduce in our test environment, so it must be related to load. When we roll back to 2.2.0 from 2.3.1, the issue goes away.
I've managed to reproduce the issue and identify the problem. There was a bug introduced in 2.3.1 where, if the lock wasn't acquired on the first attempt (e.g. due to it already being held) but was subsequently acquired during a retry, the extend timer would be cancelled and the lock would not auto renew.
I've pushed a pre-release version 2.3.2-pre1 if you'd like to give that a try and verify that it solves the issue for you.
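For anyone wanting to verify this locally, here is a rough repro sketch (the resource name and timings are arbitrary, redLockFactory is the factory from the earlier snippets, and the key's TTL is watched externally, e.g. with redis-cli):

using RedLockNet;

var resource = "extend-after-retry-repro";   // arbitrary resource name
var expiry = TimeSpan.FromSeconds(30);

Task<IRedLock> secondLockTask;

using (var first = redLockFactory.CreateLock(resource, expiry))
{
    // The second caller waits up to 10s, retrying every second, so its first
    // attempts conflict and it only acquires after the first lock is released.
    secondLockTask = redLockFactory.CreateLockAsync(
        resource, expiry, TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(1));

    await Task.Delay(3000);
} // first lock released here

using (var second = await secondLockTask)
{
    // With 2.3.1 this lock's TTL just counts down to 0 instead of being extended
    // around the 15s mark; with the fix it should keep renewing.
    await Task.Delay(60000);
}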
Ok I'll try it soon.
Sorry for the delay, I'm currently on holiday so... not much time
I will put it in our "test" environment now and, if there are no issues, try it in "production". Thanks.
It ran for a couple of hours in "test" with no issues and overnight in "production" with no issues. Last time we put it in "production", issues appeared within minutes, so I feel like your fix has resolved this.
Thanks for testing, I've pushed out version 2.3.2 (which is the same as 2.3.2-pre1) which contains the fix for this issue.
Hello,
The application runs scheduled jobs and uses RedLock to synchronize them (to avoid multiple executions at the same time, executions on different servers, etc.).
Basically, the code is the following:
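A minimal sketch of the pattern in question (the exact snippet isn't shown in this thread; the IsAcquired check and DoJobAsync are assumptions):

using (var rlock = _redLockFactory.CreateLock(GetType().FullName, TimeSpan.FromSeconds(30)))
{
    if (rlock.IsAcquired)
    {
        await DoJobAsync(); // the scheduled job's actual work
    }
}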
This is contained in a using block, so I'm sure that the lock is disposed correctly. However, there is some faulty behavior with this. Indeed, I tried to create a job that simply does:
await Task.Delay(30000);
Then I launched the app and watched the state of the lock in Redis. At first, it has an expiry of 30s, which then decreases every second. Once it reaches 15s, it gets updated back to an expiry of 30s.
So this is the correct behavior.
However, the behavior is not always consistent. For some reason, at some point, the lock does not get extended: its expiry reaches 0 and the lock then gets recreated.
This should not happen, because either the job is running and the code above is supposed to keep extending the lock, or the job terminates and disposes the lock (removing it from Redis). Either way, the lock should never have an expiry below 15s.
Moreover, I downgraded to 2.2.0 to see if I had the same behavior and... I didn't... So this seems to be a critical regression in 2.3.1.
Indeed, I'm using this system in multiple applications, and each of them has shown some weird behavior from time to time; this bug would explain all of it.
Cheers