plow-technologies / inferno

A statically-typed functional scripting language
MIT License
4 stars 1 forks source link

Quick succession of saving script leads to broken history state #19

Open shulhi opened 1 year ago

shulhi commented 1 year ago

If you save the script quickly and rapidly, you will encounter TryingToAppendToNonHead exception.

image

The script history is also broken (see screenshot above). It thinks that the HEAD is the second latest one. If you pick the actual latest version, you won't be able to save the script as it throws TryingToAppendToNonHead exception.

I think we need some sort of lock while saving the script.

siddharth-krishna commented 1 year ago

I've been looking into the code of storeVCObject to see if there could be any concurrency bugs due to two racing stores that could cause issues like the above, and I think there shouldn't be. https://github.com/plow-technologies/inferno/blob/552fd3fa0e36b3ada5d54df54d78d3fa7c1649f0/inferno-vc/src/Inferno/VersionControl/Operations.hs#L145-L166

Let's say that object o1 is currently the head of a script history, and there are two calls: store(o2) and store(o3), both with pred = o1. If these calls race and one of them (say o3) goes first, then since it will rename the heads/o1 file to heads/o3 on line 156, store(o2) will either fail at the check to see if file heads/o1 exists on line 150 or when it tries to rename the file on line 156. Both store operations cannot succeed because the rename on line 156 acts as an atomic compare-and-swap (CAS), a common primitive used in lock-free concurrent data structures. This means the append on line 158 will only be executed by the successful store (in our example, o3).

There are still a few potential issues with the current code, however:

We don't catch the IO error thrown by renameFile if it fails due to a race. We should catch that, return a TryingToAppendToNonHead error and clean up the object file that was written in line 153.

Let's say we have two successive but concurrent calls to store, store(o2, pred=o1) and store(o3, pred=o2) but within a very short duration so that their updating of the to_head mappings on lines 161-163 overlap. This could lead to a situation where store(o3) finishes first, and then store(o2) comes along and incorrectly updates a bunch of to_head mappings to point to o2 instead of o3. One way to fix this might be to check at each loop iteration that the head hasn't changed, and simply break out of the loop if it has. The operation that changed the head can be responsible for updating the rest of the to_head pointers.

(cc @goodlyrottenapple @smurphy8 )