cyphar closed this 1 month ago.
After speaking to @brauner, I think that just returning errors in the case of overmounts is what all users would practically expect. If you want to support lxcfs, you would need to resolve things through /proc anyway, and in that case you don't care about doing safe procfs operations.
And yeah, the fact that we need this internally indicates the kinds of things people will need this for.
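As a rough illustration of "return an error on overmounts", here is a minimal sketch that rejects a path whose st_dev differs from that of the procfs root. It is only an approximation of RESOLVE_NO_XDEV, which the kernel applies to every mount crossing during resolution: a device comparison catches mounts of a different filesystem (such as the FUSE files lxcfs places over /proc/meminfo) but misses bind mounts from the same filesystem. The helper name reject_cross_device is hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: fail if `path` is on a different device than `root`.
/// This only approximates RESOLVE_NO_XDEV: an lxcfs FUSE file mounted over
/// /proc/meminfo has a different st_dev, but a bind mount from the same
/// filesystem would go unnoticed.
fn reject_cross_device(root: &Path, path: &Path) -> io::Result<()> {
    let root_dev = fs::metadata(root)?.dev();
    // symlink_metadata inspects the final component itself rather than
    // following it, so a symlink such as /proc/self is checked in place.
    let path_dev = fs::symlink_metadata(path)?.dev();
    if root_dev != path_dev {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!(
                "{} is on a different device than {} (possible overmount)",
                path.display(),
                root.display()
            ),
        ));
    }
    Ok(())
}

fn main() {
    // Without lxcfs this normally prints Ok(()); with an lxcfs overmount on
    // /proc/meminfo it prints an error instead of reading the fake file.
    println!(
        "{:?}",
        reject_cross_device(Path::new("/proc"), Path::new("/proc/meminfo"))
    );
}
```

Returning an error rather than silently following the overmount matches the behaviour described above: callers who want lxcfs's view can still read through the regular /proc path.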
A workaround I have for the second issue is that we only cache the subset=pid handle, and if an operation fails on it, we create a temporary handle without subset=pid, which is closed after the operation finishes. This is more expensive than a regular open but avoids leaking unmasked procfs mounts into containers. Of course, the open("/proc") handle is just as unsafe as before for this.
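A minimal sketch of that fallback, assuming a hypothetical MockProcfs type in place of a real procfs handle: with subset_pid set, lookups outside the per-process entries fail with NotFound, and the hypothetical with_fallback then opens a temporary unrestricted handle, retries once, and drops it as soon as the retry finishes. Only the cached handle lives beyond a single call.

```rust
use std::cell::Cell;
use std::io;

/// Hypothetical stand-in for a procfs handle. With `subset_pid` set it only
/// exposes per-process entries, mimicking a subset=pid mount.
struct MockProcfs {
    subset_pid: bool,
}

impl MockProcfs {
    fn lookup(&self, name: &str) -> io::Result<String> {
        let per_process = name == "self"
            || name == "thread-self"
            || (!name.is_empty() && name.bytes().all(|b| b.is_ascii_digit()));
        if self.subset_pid && !per_process {
            return Err(io::Error::new(io::ErrorKind::NotFound, name.to_string()));
        }
        Ok(format!("contents of /proc/{}", name))
    }
}

/// Try `op` on the cached (subset=pid) handle; if it fails, open a temporary
/// unrestricted handle, retry once, and drop it before returning.
fn with_fallback<H, T>(
    cached: &H,
    open_full: impl FnOnce() -> io::Result<H>,
    op: impl Fn(&H) -> io::Result<T>,
) -> io::Result<T> {
    match op(cached) {
        Ok(value) => Ok(value),
        Err(_) => {
            let temporary = open_full()?;
            op(&temporary)
            // `temporary` goes out of scope here, closing the full handle.
        }
    }
}

fn main() {
    let cached = MockProcfs { subset_pid: true };
    let opens = Cell::new(0u32);
    for name in ["self", "cpuinfo"] {
        let result = with_fallback(
            &cached,
            || {
                opens.set(opens.get() + 1);
                Ok(MockProcfs { subset_pid: false })
            },
            |h: &MockProcfs| h.lookup(name),
        );
        println!("{}: {:?} (temporary handles opened: {})", name, result, opens.get());
    }
}
```

Retrying on any error mirrors "if an operation fails on it"; a stricter variant could retry only on NotFound so that unrelated errors are not masked by a second attempt.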
While it is fairly trivial to just add a ProcfsBase::Root enum variant that maps to "/proc/.", there are a few minor issues to consider:
1. While it is generally considered invalid to mount on top of anything in /proc/$pid/... (see #45), tools like https://github.com/lxc/lxcfs intentionally mount fake files on top of files in the /proc root (such as /proc/cpuinfo and /proc/meminfo). RESOLVE_NO_XDEV will block this, which is what we want from a security perspective, but it could lead to errors that confuse users when running programs inside containers.
2. We can no longer use subset=pid for our internal procfs mount created with fsopen. In principle subset=pid isn't necessary, but it seems like a bad idea to hold a copy of files that might be masked inside a container. (Using PR_SET_DUMPABLE you can block most of the ways a container process could access the file descriptor, but it's not a given that every program would know to do that, and if the file descriptor gets leaked for some reason then there is no protection.) We could work around this by reconfiguring the mount (if we created it with fsopen) so that the unmasked view isn't enabled by default, but we could never reconfigure it back to subset=pid because of possible races (we could add locking, but it would be nice not to have to).
We need this for #58 to check the value of /proc/sys/fs/privileged_symlinks (unless we do the lookup on PROCFS_HANDLE directly without using the official ProcfsHandle API).
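For the ProcfsBase::Root proposal itself, here is a minimal sketch of how such a variant might map onto a path prefix and flag that it cannot be served by the cached subset=pid handle. The enum, its prefix/join/needs_full_procfs methods, and the ProcSelf/ProcThreadSelf variants shown alongside are assumptions for illustration, not the real ProcfsBase definition; an actual implementation would resolve relative to a procfs file descriptor rather than building string paths.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of a ProcfsBase-style enum with the proposed Root
/// variant; not the real definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcfsBase {
    ProcSelf,
    ProcThreadSelf,
    /// Proposed: resolve relative to the procfs root itself.
    Root,
}

impl ProcfsBase {
    fn prefix(self) -> &'static str {
        match self {
            ProcfsBase::ProcSelf => "/proc/self",
            ProcfsBase::ProcThreadSelf => "/proc/thread-self",
            ProcfsBase::Root => "/proc/.",
        }
    }

    fn join(self, subpath: &str) -> PathBuf {
        PathBuf::from(self.prefix()).join(subpath)
    }

    /// Conservatively, any lookup under Root may name a file outside
    /// /proc/$pid, so it cannot be served by a subset=pid handle.
    fn needs_full_procfs(self) -> bool {
        matches!(self, ProcfsBase::Root)
    }
}

fn main() {
    let path = ProcfsBase::Root.join("sys/fs/privileged_symlinks");
    println!(
        "{} (needs full procfs: {})",
        path.display(),
        ProcfsBase::Root.needs_full_procfs()
    );
}
```

Keeping ProcSelf and ProcThreadSelf on the subset=pid handle and routing only Root lookups to an unrestricted mount is what makes the temporary-handle workaround possible without reconfiguring the cached mount.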