nopSolutions / nopCommerce

ASP.NET Core eCommerce software. nopCommerce is a free and open-source shopping cart.
https://www.nopcommerce.com
Other
9.19k stars 5.28k forks source link

TaskScheduler race condition (includes repro unit test) #171

Closed AndreiMaz closed 7 years ago

AndreiMaz commented 8 years ago

Let's invitigate

race_condition_repro.patch.txt Hello all, I looked through the source of the new 3.70 release (very excited about the new Azure scaling support!). I believe there is a race condition in the implementation of the Task.cs Execute() method. Specifically, there is no lock preventing the LeasedUntilUtc value from being modified in between the time it is read and written back to the redis cache. Therefore, if multiple machines read an empty lease value, they will all simultaneously enter the lock and execute the scheduled task simultaneously.

Redis specifically has the "NX option" which sets a lock value ONLY if the value is presently empty to handle this situation (see http://redis.io/topics/distlock), so I believe that is the proper solution instead of a transaction lock.

I have attached a patch to the raw 3.70 source with a new WebFarmTaskRaceCondition unit test which will reproduce this behavior. This test explicitly forces Tasks running on two different threads to enter the lock simultaneously. You can see both of them run concurrently and print

Executing TaskRace on Thread 10.... Executing TaskRace on Thread 11....

in the output.

Patch at https://gist.github.com/anonymous/f783a4793f039f46ef3b/raw/489779eeb07d0d1efd88a57de80ed72f9af593e8/race_condition_repro.patch

To apply the patch

  1. From the downloaded 3.70 source: ~/Downloads/nopcommerce370/nopCommerce_3.70_Source$ patch -p2 -l < race-condition-repro.patch
  2. From git: git checkout f6b7de7 # checkout the 3.70 release git checkout -b race-condition-repro git apply < race-condition-repro.patch

Thanks, Matt

Source: http://www.nopcommerce.com/boards/t/39997/taskscheduler-race-condition-in-370-includes-repro-unit-test.aspx

rossiter10 commented 8 years ago

I've been experimenting with solutions for this lately (for older versions, before I noticed you added the Lease columns) and thought it might be helpful to add what I've tried. I did something similar to the Lease columns and added a "IsRunning" column, but I ran into the same issue you are describing.

My hacky solution so far has been to call Thread.Sleep() for a few seconds if the machine name equals one of our two machine names behind the load balancer. Maybe you could sleep for a random time on every instance to hopefully stagger the tasks? (I realize this might not always work so it's not ideal).