graysky2 / anything-sync-daemon

Symlinks and syncs user-specified directories to RAM, thus reducing HDD/SSD calls and speeding up the system.
https://wiki.archlinux.org/index.php/Anything-sync-daemon
MIT License
345 stars, 45 forks

Losing the race to create target creates a confusing state #37

Closed: xorian closed this issue 2 years ago

xorian commented 8 years ago

I've been trying out asd in a VM before using it on a production system. I put /var in WHATTOSYNC, which maybe wasn't the best choice to start with. After a couple of unclean reboots, things wound up in a bad state, with a lot of things in /var apparently missing. It turns out they were in /var/.var-backup_asd, which was a bit odd.

After some head scratching and reading the asd code, I believe what happened is that, in ungraceful_state_check or possibly do_unsync, some other process must have created /var while asd was trying to move the real backing directory back into place. If the destination of the mv command already exists as a directory, the source will be moved inside it. At least that's my best guess as to how this happened.
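A minimal sketch of that guess, run in a scratch directory rather than against a real /var. The directory names only mimic the ones mentioned above; this is not asd's code, just the plain `mv` behaviour it would run into.

```shell
#!/bin/sh
# Illustration only: reproduce the nesting in a throwaway directory.
work=$(mktemp -d)

mkdir -p "$work/.var-backup_asd/lib"   # stands in for the real backing directory
mkdir "$work/var"                      # some other process recreates the target first

# The destination already exists as a directory, so mv moves the source INSIDE it.
mv "$work/.var-backup_asd" "$work/var"

nested=$(ls -A "$work/var")            # what now sits directly under the target
echo "contents of target: $nested"

rm -rf "$work"
```

The backing directory ends up one level too deep, which matches the /var/.var-backup_asd layout described above.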

One quick fix to avoid the confusing state my system was left in is to use --no-target-directory with mv. That doesn't fix the problem of the race condition, but it avoids leaving the sync target in a weird state. I'll create a pull request with that change.
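As a rough illustration of what that flag changes (GNU coreutils `mv` assumed; scratch paths are made up for the demo): when the target has been recreated and already has content, `mv --no-target-directory` refuses to move instead of nesting the source inside it.

```shell
#!/bin/sh
# Illustration only: same setup as the failure case, but with --no-target-directory (-T).
work=$(mktemp -d)

mkdir -p "$work/.var-backup_asd/lib"   # stands in for the real backing directory
mkdir -p "$work/var/cache"             # target recreated and populated by another process

# With -T the destination is not treated as a directory to move into,
# so the rename onto a non-empty directory fails.
if mv --no-target-directory "$work/.var-backup_asd" "$work/var" 2>/dev/null; then
    result=moved
else
    result=refused
fi

# The backing directory is left where it was, so nothing is hidden or nested.
if [ -d "$work/.var-backup_asd/lib" ]; then backing=intact; else backing=missing; fi

echo "mv: $result, backing directory: $backing"
rm -rf "$work"
```

The race is still lost in this scenario, but the failure is visible and the backing directory stays where it can be found.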

The only way I can see to avoid the potential race condition is to use bind mounts rather than renaming and symlinking. Unmounting a bind mount should atomically restore the original directory without any possibility of intervening operations creating the target. Of course, I'm not sure that would actually work, but this article suggests that asd could be written to work that way.

graysky2 commented 8 years ago

Implemented in https://github.com/graysky2/anything-sync-daemon/commit/6dbef1e3bbf5f08ff57f5689e824c378a05b8d1d

xorian commented 8 years ago

The change in 6dbef1e should change it from being a silent corruption to an obvious failure. However, it won't prevent the problem I described. The race with anything else creating the synced directory between when the symlink is removed and when the backing directory is renamed back into place is still there.

graysky2 commented 8 years ago

Ah... do you have a recommendation to address it?

xorian commented 8 years ago

I think the only way to avoid it would be to avoid the renaming and symlinking entirely, and make the overlay/ramdisk filesystem appear in the place of the original directory by using a bind mount. (Actually, you would need at least two: one to make the original directory accessible at a different location and another to take the place of the real path.) Unmounting the bind mount would be an atomic return to the original directory, with no opportunity for any other changes to sneak in from a parallel process. This article gives an example of how this can be done.
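A hedged sketch of how the two bind mounts might be arranged. This is not asd's implementation: the paths, the `run` helper, and the `sync_up`/`sync_down` function names are all hypothetical, and the use of rsync is only an assumption carried over from how asd copies data. Real mounts need root, so the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch of the two-bind-mount layout; every path below is an assumption.
DRY_RUN=1                      # set to 0 (as root) to actually execute the commands
target=/var                    # directory that should be served from RAM
backing=/run/asd-backing/var   # hypothetical spot where the on-disk original stays reachable
ramdir=/dev/shm/asd-var        # hypothetical tmpfs-backed working copy

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

sync_up() {
    run mkdir -p "$backing" "$ramdir"
    run mount --bind "$target" "$backing"   # mount 1: expose the original elsewhere
    run rsync -a "$backing/" "$ramdir/"     # fill the RAM copy from the original
    run mount --bind "$ramdir" "$target"    # mount 2: RAM copy takes over the real path
}

sync_down() {
    run rsync -a --delete "$ramdir/" "$backing/"   # write changes back to disk
    run umount "$target"     # original directory reappears at its real path
    run umount "$backing"    # drop the extra view of the original
}

plan=$(sync_up; sync_down)
echo "$plan"
```

Because the original directory is never renamed or removed, there is no moment at which the real path is missing and could be recreated by another process. One caveat: `umount` can fail with "target is busy" while processes still hold files open under the mount, so a real implementation would have to handle that case.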

I've done a few experiments with this by hand to figure out the details. I would like to make some changes to asd in a branch to implement this, but I haven't found time to do that yet.

ThibaultLemaire commented 8 years ago

I'm actually implementing this right now. I was originally working on handling target directories that are mountpoints, and I ended up finding it easier to just use bind mounts. You can take a look at my early code on the branch handle-mountpoints.