Closed AndreiMaz closed 7 years ago
I've been experimenting with solutions for this lately (for older versions, before I noticed you added the Lease columns) and thought it might be helpful to add what I've tried. I did something similar to the Lease columns and added a "IsRunning" column, but I ran into the same issue you are describing.
My hacky solution so far has been to call Thread.Sleep() for a few seconds if the machine name equals one of our two machine names behind the load balancer. Maybe you could sleep for a random time on every instance to hopefully stagger the tasks? (I realize this might not always work so it's not ideal).
Let's invitigate
race_condition_repro.patch.txt Hello all, I looked through the source of the new 3.70 release (very excited about the new Azure scaling support!). I believe there is a race condition in the implementation of the Task.cs Execute() method. Specifically, there is no lock preventing the LeasedUntilUtc value from being modified in between the time it is read and written back to the redis cache. Therefore, if multiple machines read an empty lease value, they will all simultaneously enter the lock and execute the scheduled task simultaneously.
Redis specifically has the "NX option" which sets a lock value ONLY if the value is presently empty to handle this situation (see http://redis.io/topics/distlock), so I believe that is the proper solution instead of a transaction lock.
I have attached a patch to the raw 3.70 source with a new WebFarmTaskRaceCondition unit test which will reproduce this behavior. This test explicitly forces Tasks running on two different threads to enter the lock simultaneously. You can see both of them run concurrently and print
Executing TaskRace on Thread 10.... Executing TaskRace on Thread 11....
in the output.
Patch at https://gist.github.com/anonymous/f783a4793f039f46ef3b/raw/489779eeb07d0d1efd88a57de80ed72f9af593e8/race_condition_repro.patch
To apply the patch
Thanks, Matt
Source: http://www.nopcommerce.com/boards/t/39997/taskscheduler-race-condition-in-370-includes-repro-unit-test.aspx