Open hifi opened 6 years ago
Questions:
I'm thinking about a simple and slow emulation that does just enough to make package managers happy, so we can trivially build images as unprivileged users, albeit the files will have the wrong owners unless chown emulation is in place, like you said.
ptrace() did not feel unusably slow for running plain apt; it was visibly fairly fast for me.
Actually running apt or other package management tools inside a rootless container doesn't necessarily require that much effort to implement so I thought I'd give it a shot so I wrote whatever to see how much it really requires. Apparently not much if you cheat a lot.
We already have projects that do this -- @AkihiroSuda's PRoot fork is probably the most mature one (I had a project that did this a while ago but it's basically bitrotted) and it will be put into https://github.com/rootless-containers. IMO user.rootlesscontainers chown emulation is actually pretty important if you want to use the container images for anything "important" -- because otherwise you might end up shipping potentially dangerous (think a suid binary with the wrong owner) images.
With regards to cheating, it doesn't take a lot to make most things work, but if you want to make things work consistently you need to do things like completely emulate the POSIX privilege model (including faking who the current user is, what groups are currently in use, and so on). I did most of this work in https://github.com/cyphar/remainroot -- but as I said it bitrotted and PRoot also already does this stuff.
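To illustrate what "faking the current user and groups" involves, here is a minimal sketch of the state a tracer would have to keep per process. The `FakeCreds` class and its methods are hypothetical names for this example, not code from remainroot or PRoot, and only the basic setuid(2)/setgroups(2) rules are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCreds:
    """Credentials the traced process believes it has (hypothetical model)."""
    ruid: int = 0
    euid: int = 0
    suid: int = 0
    groups: list = field(default_factory=lambda: [0])

    def setuid(self, uid: int) -> int:
        """Emulate setuid(2): return 0 on success, -1 where EPERM would be raised."""
        if self.euid == 0:
            # A "privileged" caller changes the real, effective and saved IDs.
            self.ruid = self.euid = self.suid = uid
            return 0
        if uid in (self.ruid, self.suid):
            # An unprivileged caller may only switch its effective ID.
            self.euid = uid
            return 0
        return -1

    def setgroups(self, groups: list) -> int:
        """Emulate setgroups(2): only allowed while the effective UID is 0."""
        if self.euid != 0:
            return -1
        self.groups = list(groups)
        return 0
```

A program that drops privileges and then double-checks that it cannot regain them would see consistent answers from this state: after `setuid(100)`, a later `setuid(0)` is refused.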
As for embedding this into runc -- unfortunately this is non-trivial. One of the main design aspects of runc is that there is no long-running runc process that sticks around when you create containers. If you run runc run -d then the runc code will exit as soon as the pid1 user code has been started. Therefore, we couldn't have a ptrace daemon without breaking that (quite important and useful) feature of runc.
And all of that is before we get into questions about whether or not --fakeroot is a good idea in terms of the spec or container runtime adoption (because it's an out-of-spec extension that will lock more people into using runc). If it's trivial to do in an implementation-independent way (without modifying runc) then I would prefer that over modifying runc so that other runtimes are forced to implement the same feature when it would otherwise be unnecessary.
But most importantly of all, there is nothing stopping you from adding your ptracer into your container and modifying your config.json to use it. If you compile it statically, it would be fairly simple -- you just bind-mount the binary and execute it. This is what Docker does for docker run --init and is the simplest way of doing this. One of the goals of https://rootlesscontaine.rs/ and https://github.com/rootless-containers is to provide tooling that will make it easier for people to do these sorts of things automatically.
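As a rough sketch of the config.json change described above: the `process.args` and `mounts` fields are the ones defined by the OCI runtime spec, while the function name, the tracer paths, and the assumption that the tracer runs whatever command follows it on its command line are all hypothetical.

```python
import copy


def wrap_with_tracer(config: dict, host_path: str, container_path: str) -> dict:
    """Return a copy of an OCI config that runs the original command under a tracer."""
    new = copy.deepcopy(config)
    # Bind-mount the statically linked tracer binary from the host, read-only.
    new.setdefault("mounts", []).append({
        "destination": container_path,
        "type": "bind",
        "source": host_path,
        "options": ["bind", "ro"],
    })
    # Prepend the tracer so it becomes pid1 and launches the original command.
    # Assumption: the tracer treats its arguments as the command to run.
    process = new.setdefault("process", {})
    process["args"] = [container_path] + process.get("args", [])
    return new
```

Loading config.json with `json.load`, passing it through `wrap_with_tracer(config, "/path/on/host/tracer", "/tracer")`, and writing the result back would leave runc itself unmodified.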
What you both missed was the simplicity of doing enough for package managers (notably just apt for now) versus the bigger emulation layers that try to do things correctly.
I did not intend to create a project around my one-night hack, just to start a discussion by showing that 200 lines of badly written C do indeed allow you to run programs like apt, which would be enough to enable a lot of flexibility with the least amount of added code.
What you both missed was the simplicity of doing enough for package managers (notably just apt for now) versus the bigger emulation layers that try to do things correctly.
I didn't miss it, I just disagree that a majority of people would be happy with it for production uses (because it could cause programs to start freaking out if they double-check that privileged operations did what they expect -- apt used to do this when I was working on remainroot inside their "sandbox check" code, I'm not sure why it works now). I already know of quite a few enterprises using rootless containers in staging or production environments, and that's a pretty important consideration.
But my general concerns on --fakeroot apply to the idea of integrating things that require long-running daemons as part of runc -- these apply no matter which project you want to integrate into runc. Even the new seccomp-bpf syscall emulation code being proposed by Tycho Andersen still won't allow it to be done without a daemon.
As I said, I agree that it's a pain to deal with this at the moment, and making it easier through wrapping tools and documentation is a goal of https://rootlesscontaine.rs/. There are some other things (like unprivileged networking) that also are difficult to get working, and also require documentation and so on. We cannot just integrate all of these things into runc because of the concerns I outlined above (no-daemon requirement, bloat, adding extensions when things could be done within the specification, etc).
Apt still does double-check it, but I keep the state; badly, but it is there. Everything could be done better; it's just a minimal proof of concept, as I was not happy with the bigger alternatives for such a simple thing that does not track much or even try to pass as a real environment.
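For readers wondering what "keeping the state" can look like in its crudest form, here is a sketch that remembers requested owners per path and reports them back on a later emulated stat(). The `FakeOwnership` class is a hypothetical illustration, not the code from the proof of concept, and keying by path ignores renames and hard links.

```python
class FakeOwnership:
    """Remembers chown() requests that the real kernel would refuse."""

    def __init__(self):
        self._owners = {}  # path -> (uid, gid) the traced process asked for

    def chown(self, path: str, uid: int, gid: int) -> int:
        """Record the requested owner; -1 leaves that ID unchanged, as in chown(2)."""
        old_uid, old_gid = self.owner(path)
        self._owners[path] = (
            old_uid if uid == -1 else uid,
            old_gid if gid == -1 else gid,
        )
        return 0

    def owner(self, path: str) -> tuple:
        """Owner to report from an emulated stat(); assumed to default to root:root."""
        return self._owners.get(path, (0, 0))
```

With this in place, a tool that calls chown() and then stat()s the file to verify the result gets the answer it expects, even though nothing changed on disk.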
Technically, wouldn't runc be able to ptrace the binary it launches as pid 1?
I understand the frustration with simple stuff not working (that's the feeling I had when I first got rootless containers working and realised there was even more work to do :wink:). But in my view, the better way of making these sorts of things easier is by having wrapping tools or documentation so people can get this stuff to work.
You can use PRoot or whatever as the pid1 inside the container and it will also work. And then you don't have to modify runc.
Technically, wouldn't runc be able to ptrace the binary it launches as pid 1?
Yes, but runc exits after pid1 launches if you use it in detached mode (which is the usual mode of operation inside of things like cri-o). You can try this for yourself if you do runc run -d.
At the moment the only reason why we have a runc process sticking around in non-detached mode is to do I/O copying. Making it a tracer would require having a long-running process for each container to function correctly (something that runc does not require at the moment, and is a major advantage of runc over other container runtimes).
Just a thought: it wouldn't need to run as a tracer for anything other than rootless containers.
Anyway, do as you see best, I just wanted to share my findings. If you don't see this as being possible even in the long run, please close this issue for future reference.
I have gone through some old and current issues about rootless containers and found a few attempts to make rootless containers look to their child processes like they actually have root, but they looked extremely complicated.
Actually running apt or other package management tools inside a rootless container doesn't necessarily require that much effort to implement so I thought I'd give it a shot so I wrote whatever to see how much it really requires. Apparently not much if you cheat a lot.
Could we have an option like --fakeroot to do this similar crude syscall diversion for package managers built into runc?
https://github.com/hifi/whatever