rydrman opened this issue 11 months ago
Perhaps we should give spfs-enter the responsibility of "tickling" a magic file in the FUSE filesystem to let spfs-fuse know that everything was set up properly. Without that signal, spfs-fuse would shut itself down after a short grace period.
You can try to handle every error case and shut down spfs-fuse if something goes wrong, but it is always possible for the process responsible for that cleanup to crash or be killed before it gets the chance.
I'm thinking of something inconspicuous, like reading or setting an extended attribute on the root of the mount.
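A minimal sketch of the grace-period idea, assuming spfs-fuse can turn the "tickle" (for example, the xattr access on the mount root) into a message on an in-process channel. The names wait_for_ready and ReadyOutcome, and the durations used, are hypothetical and not part of spfs; a std mpsc channel stands in for however the filesystem handler would notify the main loop.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Outcome of waiting for spfs-enter to confirm setup (hypothetical type).
#[derive(Debug, PartialEq)]
enum ReadyOutcome {
    /// The magic file / xattr was touched within the grace period.
    Confirmed,
    /// Nothing arrived in time; spfs-fuse should shut itself down.
    TimedOut,
}

/// Block until a "ready" tickle arrives or the grace period expires.
/// A disconnected sender is treated the same as a timeout.
fn wait_for_ready(ready_rx: &Receiver<()>, grace: Duration) -> ReadyOutcome {
    match ready_rx.recv_timeout(grace) {
        Ok(()) => ReadyOutcome::Confirmed,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
            ReadyOutcome::TimedOut
        }
    }
}

fn main() {
    // Case 1: setup succeeds and the root of the mount gets "tickled".
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        // Hypothetically, this send would happen inside the xattr handler.
        let _ = tx.send(());
    });
    assert_eq!(wait_for_ready(&rx, Duration::from_secs(2)), ReadyOutcome::Confirmed);

    // Case 2: spfs-enter dies before tickling; the channel stays silent.
    let (_silent_tx, rx) = mpsc::channel::<()>();
    assert_eq!(wait_for_ready(&rx, Duration::from_millis(50)), ReadyOutcome::TimedOut);
}
```

The key design point is that the default outcome is shutdown: spfs-fuse only stays alive if something positively confirms setup, so a crashed or killed spfs-enter cannot leave it running.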
FWIW, I discovered in the documentation for abort that it doesn't work on join handles returned from spawn_blocking. Despite the comment attached here, with abort being a no-op we really weren't doing any kind of cleanup here. In my testing, using fuser's unmount method does nothing, and there's no way to signal fuser::Session::run's loop to terminate.
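Since abort has no effect on a spawn_blocking task, stopping a blocking task generally has to be cooperative: the blocking loop itself must check for a stop signal. The std-only sketch below illustrates that pattern; run_blocking_loop is a hypothetical stand-in, and a plain std thread stands in for tokio's blocking pool. It also shows why the situation above is awkward: per the testing described, there is no equivalent hook into fuser::Session::run's loop, so this pattern can't simply be applied to it.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Stand-in for a blocking event loop (hypothetical; not fuser's API).
/// It does one unit of "work" per iteration and exits only once it
/// observes the shared stop flag.
fn run_blocking_loop(stop: Arc<AtomicBool>, iterations: Arc<AtomicUsize>) {
    while !stop.load(Ordering::Acquire) {
        iterations.fetch_add(1, Ordering::Relaxed);
        thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let iterations = Arc::new(AtomicUsize::new(0));

    let handle = {
        let (stop, iterations) = (Arc::clone(&stop), Arc::clone(&iterations));
        thread::spawn(move || run_blocking_loop(stop, iterations))
    };

    thread::sleep(Duration::from_millis(20));
    // Request shutdown. This works only because the loop polls the flag;
    // nothing outside the thread can forcibly stop it.
    stop.store(true, Ordering::Release);
    handle.join().expect("loop thread panicked");

    println!(
        "loop ran {} iterations before stopping",
        iterations.load(Ordering::Relaxed)
    );
}
```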
I've implemented a heartbeat connection between spfs-monitor and spfs-fuse, as suggested above (and recently in Slack). Before adding this heartbeat, it was easy to reproduce an spfs-fuse process hanging around forever by kill -9'ing the related spfs-monitor process. With the heartbeat in place, spfs-fuse eventually times out and exits.
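A minimal sketch of the spfs-fuse side of such a heartbeat, assuming each message from spfs-monitor simply refreshes a timestamp. HeartbeatWatch and its methods are hypothetical, the transport between the two processes is not shown, and a thread inside one process stands in for spfs-monitor.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Tracks when spfs-fuse last heard from spfs-monitor (hypothetical type).
struct HeartbeatWatch {
    last_beat: Mutex<Instant>,
    timeout: Duration,
}

impl HeartbeatWatch {
    fn new(timeout: Duration) -> Self {
        Self { last_beat: Mutex::new(Instant::now()), timeout }
    }

    /// Called whenever a heartbeat message arrives from the monitor.
    fn beat(&self) {
        *self.last_beat.lock().unwrap() = Instant::now();
    }

    /// True once no heartbeat has arrived for longer than `timeout`,
    /// e.g. because the monitor was killed with kill -9.
    fn expired(&self) -> bool {
        self.last_beat.lock().unwrap().elapsed() > self.timeout
    }
}

fn main() {
    let watch = Arc::new(HeartbeatWatch::new(Duration::from_millis(50)));

    // Simulated monitor: sends a few heartbeats, then "dies" (stops sending).
    let monitor = {
        let watch = Arc::clone(&watch);
        thread::spawn(move || {
            for _ in 0..3 {
                watch.beat();
                thread::sleep(Duration::from_millis(10));
            }
        })
    };
    monitor.join().unwrap();

    // spfs-fuse side: poll until the heartbeat goes stale, then shut down.
    while !watch.expired() {
        thread::sleep(Duration::from_millis(5));
    }
    println!("no heartbeat for {:?}; shutting down", watch.timeout);
}
```

Because expiry is decided on the spfs-fuse side, it does not matter how the monitor disappeared: a clean exit, a crash, and SIGKILL all look the same once the heartbeats stop.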
This is a bit of an edge case but wanted it documented nonetheless.
We had a host which was failing to start an spfs shell upon login via ssh. In this case the user would see this error (overlayfs + fuse):
A look at the system journal would show this for spfs-fuse:
In this case, spfs-fuse was running but spfs-enter failed because overlayfs couldn't be mounted. This meant that the monitor was never started and the spfs-fuse process would stick around forever.
I was not able to identify the underlying FUSE issue; rebooting the machine resolved the mount error, so we moved on.
This issue is to track that failure state and to find a way for these partial mounts to still be properly torn down when setup fails.
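One way to handle the in-process part of this is a drop guard: spfs-enter would arm it right after starting spfs-fuse and disarm it only once overlayfs (and the monitor) are fully set up, so any early error return tears spfs-fuse back down. The sketch below assumes that flow; TeardownGuard and enter are hypothetical, and the "actions" are just log entries. Note the limitation already raised above: Drop does not run if the process is killed outright, so this complements the heartbeat/timeout on the spfs-fuse side rather than replacing it.

```rust
use std::cell::RefCell;

/// Runs a cleanup action on drop unless `disarm` is called first
/// (hypothetical helper, not part of spfs).
struct TeardownGuard<F: FnOnce()> {
    cleanup: Option<F>,
}

impl<F: FnOnce()> TeardownGuard<F> {
    fn new(cleanup: F) -> Self {
        Self { cleanup: Some(cleanup) }
    }

    /// Setup succeeded: keep the mount around.
    fn disarm(mut self) {
        self.cleanup = None;
    }
}

impl<F: FnOnce()> Drop for TeardownGuard<F> {
    fn drop(&mut self) {
        if let Some(cleanup) = self.cleanup.take() {
            cleanup();
        }
    }
}

/// Simulated spfs-enter flow mirroring the failure above: spfs-fuse starts,
/// then the overlayfs mount either succeeds or fails.
fn enter(log: &RefCell<Vec<&'static str>>, overlay_ok: bool) -> Result<(), String> {
    log.borrow_mut().push("start spfs-fuse");
    let guard = TeardownGuard::new(|| log.borrow_mut().push("stop spfs-fuse"));
    if !overlay_ok {
        // Returning early drops `guard`, which tears spfs-fuse back down.
        return Err("failed to mount overlayfs".to_string());
    }
    log.borrow_mut().push("mount overlayfs");
    guard.disarm();
    Ok(())
}

fn main() {
    let log = RefCell::new(Vec::new());
    assert!(enter(&log, false).is_err());
    assert_eq!(*log.borrow(), vec!["start spfs-fuse", "stop spfs-fuse"]);

    let log = RefCell::new(Vec::new());
    assert!(enter(&log, true).is_ok());
    assert_eq!(*log.borrow(), vec!["start spfs-fuse", "mount overlayfs"]);
}
```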