spkenv / spk

A Package Manager for high velocity software environments, built on spfs.
https://spkenv.dev
Apache License 2.0
34 stars 5 forks source link

Improper modification of overlayfs upperdir while mounted #468

Open jrray opened 2 years ago

jrray commented 2 years ago

https://github.com/imageworks/spk/blob/ff42a0169bbef572a2b376e6cf91a88ec3cf63e9/crates/spfs/src/runtime/storage.rs#L345-L369

I saw that code comment about "a bug in spfs reset" and was trying to figure out what that bug was. I notice that this code is walking the "upperdir" and removing things. I also note this quote from the overlayfs docs: "Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock."

https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems

This reset code may walk the "upperdir" to discover changes, but it should apply those changes to the overlayfs mount (/spfs) to not run afoul of the requirement to not modify the underlying filesystem. Unless this can arrange to unmount /spfs while it is doing its work.

Originally posted by @jrray in https://github.com/imageworks/spk/issues/465#issuecomment-1223379151

rydrman commented 2 years ago

It's worth noting here that one of the main reasons the upper dir is modified is because the upper dir is what's used by the commit process to actually identify and save changes. It's been some time, but IIRC if you simply change the files back to the way they were via the main /spfs directory then you still end up with files in the upper dir...

I think the best way to do this is to remount /spfs as read-only before making the changes. The docs mention that this can be done safely once it's been unmounted, but I wonder if read-only would be sufficient if we remount after the changes are made. If not, then unmounting during the changes is the next best option, I'd say.

jrray commented 2 years ago

I was trying out some things to see if I could figure out what the reset bug mentioned was. First a chmod'ed a file, which created an entry in upperdir. Then, I deleted that file in upperdir. It didn't affect the mode of the file when viewed through the overlay mount. I then chmod'ed the file again (through the overlay mount) and it didn't bring back the entry in upperdir. At that point I put my tools down and slowly backed away.

IMO the right play is to do what the manual says is legal, and unmount the overlayfs while making those changes. Unfortunately I think that means having to exec a new privileged process to do that work.

jrray commented 2 years ago

On a side note, I don't want to lose sight of just how slow the "Validating package contents" step has become, this step was near instant in python spk. We have massive environments with tens of thousands of files in them, and walking all of /spfs takes a long time.

What it's doing is looking for anything that was changed, and now seeing how this "upperdir" is being used by overlayfs, it seems like it would be a huge optimization to walk the "upperdir" instead to find what has been changed.

rydrman commented 2 years ago

I think that's fair - IIRC I went with the most simple solution, reusing the code that we already had to compute manifests rather than needing new logic to reason about the new state of the manifest based on the upper dir. FWIW the python one did not have the additional validation processes that we've added and so was able to collect much less complete information about the state of the environment after a build.