Open richardxia opened 10 months ago
Long term solution proposal:
From there you're close to virtualization; you just also need a blob store that you read from and write to when constructing these sandboxes. The blob store can be cleaned up at the end of the operation. Non-sandboxed jobs will then have to acquire a lock to run, and you can then run multiple wakes in parallel.
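As a rough illustration of the locking idea, here is a minimal Python sketch. It assumes an advisory `flock` on a lock file in the workspace; the names `workspace_lock`, `try_workspace_lock`, `run_job`, and the lock path are hypothetical and not part of wake.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def workspace_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def try_workspace_lock(lock_path):
    """Non-blocking probe: return True if the lock is free right now."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)


def run_job(job, sandboxed, lock_path):
    """Run job() directly if it is sandboxed, otherwise under the workspace lock."""
    if sandboxed:
        # Assumption: a sandboxed job only touches its own private view,
        # so it does not need to serialize against other jobs.
        return job()
    with workspace_lock(lock_path):
        return job()
```

With something like this, several wake processes could run sandboxed jobs concurrently, while any job that writes directly to the shared workspace would wait its turn.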
I assume that OverlayFS doesn't have a way to track reads? If so, then one thing to note is that this will remove a feature of the FUSE runner, which is that you can throw a ton of visible files at a job, but wake will keep track of only the ones that were read by the job in order to determine when it will need to be rerun in the future. If we're OK with this tradeoff, then we probably at least need to do an audit of our internal jobs to see where we should be more specific about which files are sent as visible files. I think this mostly just affects source files (e.g. ones committed to Git), since it's a common pattern to source everything in a directory tree and then pass it to the compiler to go figure out what the dependency chains were.
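To make the tradeoff concrete, here is a small Python sketch of the two rerun criteria. It is not wake's actual job-matching code; `input_key` and `needs_rerun` are hypothetical helpers that hash whichever set of files is being tracked.

```python
import hashlib
import os


def input_key(root, paths):
    """Hash the names and contents of the given files into one hex digest."""
    digest = hashlib.sha256()
    for rel in sorted(paths):
        digest.update(rel.encode())
        digest.update(b"\0")
        with open(os.path.join(root, rel), "rb") as f:
            digest.update(hashlib.sha256(f.read()).digest())
    return digest.hexdigest()


def needs_rerun(root, tracked_paths, recorded_key):
    """A job must rerun when the tracked inputs no longer match the recorded key."""
    return input_key(root, tracked_paths) != recorded_key
```

If a job is given `a.c`, `b.c`, and `unused.c` as visible files but only reads `a.c`, a key over the read set is unaffected by an edit to `unused.c`, while a key over the visible set changes and forces a rerun. That is why dropping read tracking makes it more important to keep visible lists narrow.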
I guess another option could be to continue using FUSE but to have it be backed by a temporary OverlayFS volume? This would give us better encapsulation while also allowing us to track reads.
Yeah, it has no way to track reads, but I think that's a good thing because it was never implemented correctly, and implementing it correctly buys us very little. It also greatly simplifies job match criteria.
I can't remember, but did we not already make the change that jobs rerun based on visible list and not read list? I know we talked about it before but I don't remember what the resolution was
Yeah, @JakeSiFive let me know offline that we did that. We apparently did that in the runner of our internal codebase rather than in the wake repo itself.
cc @ngraybeal, who did the bulk of the work in getting a repro and narrowing it down
It looks like there is a file system sandbox safety issue in scenarios where a file exists in the host FS but is not made visible to the sandboxed process and the sandboxed process attempts to create a file at the same relative path. Although this was intentional in some scenarios, it can lead to very unexpected results and even safety issues.
Reproduction steps
Here's the repro that @ngraybeal came up with:
The 1st job creates a `foo` directory with a couple of files. The 2nd job creates the same `foo` directory, writes to one of the same files as the 1st job, and proceeds to `cp` the contents of `foo` to a new dir `foo2`. It then `mv`s `foo` to `foo3` and removes `foo3`.
Here's what I expect the output in my file system to be.
Here's what actually gets written
Alternative Scenario
There's a different variation of this that affects a single job but multiple runs. One easy way to hit this is when creating a Python virtual environment and upgrading the version of the `pip` package manager. Here is a rough repro, but it will depend on an external resource for providing a version of Python:
Running this twice causes a strange `~ip` directory to appear in the `venv/lib/python3.10/site-packages/` directory, even though it doesn't appear the first time around.
The reason for this requires a bit of extra explanation of what pip is doing under the hood. In the above example, we are actually installing pip twice: once when creating the virtual environment and once with the explicit `pip install -U pip` command. When pip upgrades a package, it moves the older version of the package to a temporary directory that starts with a tilde (the `ip` in `~ip` comes from `pip`). If the upgrade is successful, it deletes the temporary directory, but if it fails, then it restores the temporary directory.
The way this interacts with the wake FUSE file system is the following:
1. […] `pip` to `~ip`
2. […] `~ip`
3. […] `venv/` directory through to the host FS. The host FS now has a bit of a frankenstein of files from both versions of pip installed (the newer version from the old run and the older version from the new run)
4. […] `pip` to `~ip`
5. […] `mv` on the host file system, but this moves files that were in the `pip` directory that were invisible to sandboxed process into the `~ip` directory as well
6. […] `~ip`
7. […] `~ip` that was visible to the sandboxed process from the host FS. This leaves around files that the sandboxed process did not know about but that fuse-waked had moved anyway a few steps above. The end result is that the `~ip` directory still exists in the host FS
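The interaction described above can be modeled with a toy Python sketch. This is not fuse-waked's code: `SandboxView` and `simulate_second_run` are hypothetical, the file names are made up, and the model assumes only that renames are forwarded to the host as a plain `rename` while deletes are limited to paths the sandbox can see.

```python
import os


class SandboxView:
    """Toy sandbox that forwards operations to a host directory but only
    knows about the paths recorded in `visible`."""

    def __init__(self, host_root, visible=()):
        self.host_root = host_root
        self.visible = set(visible)

    def _host(self, rel):
        return os.path.join(self.host_root, rel)

    def write(self, rel, data):
        os.makedirs(os.path.dirname(self._host(rel)), exist_ok=True)
        with open(self._host(rel), "w") as f:
            f.write(data)
        self.visible.add(rel)

    def rename_dir(self, src, dst):
        # Forwarded as a plain host rename: invisible files move along too.
        os.rename(self._host(src), self._host(dst))
        self.visible = {
            dst + p[len(src):] if p.startswith(src + "/") else p
            for p in self.visible
        }

    def remove_dir(self, rel):
        # The sandbox can only delete the files it knows about.
        for p in sorted(self.visible):
            if p.startswith(rel + "/"):
                os.remove(self._host(p))
                self.visible.discard(p)
        try:
            os.rmdir(self._host(rel))
        except OSError:
            pass  # host directory is not empty: invisible files remain


def simulate_second_run(host_root):
    # Leftover from an earlier run; the new sandbox does not see it.
    os.makedirs(os.path.join(host_root, "pip"))
    with open(os.path.join(host_root, "pip", "new_version.py"), "w") as f:
        f.write("from the earlier run\n")

    box = SandboxView(host_root)
    box.write("pip/old_version.py", "installed while creating the venv\n")
    box.rename_dir("pip", "~ip")  # upgrade starts: stash the old package
    box.write("pip/new_version.py", "upgraded\n")
    box.remove_dir("~ip")  # upgrade succeeded: drop the stash
    return sorted(os.listdir(os.path.join(host_root, "~ip")))
```

In this model, `simulate_second_run` on an empty directory leaves `~ip` behind on the host with only the invisible leftover file in it, which mirrors the stray `~ip` directory in the report.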