opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.89k stars 2.11k forks

Add option to do ptrace() root emulation for rootless containers #1778

Open hifi opened 6 years ago

hifi commented 6 years ago

I have gone through some old and current issues about rootless containers and found a few attempts to make rootless containers look to their child processes as if they actually have root, but they looked extremely complicated.

Actually running apt or other package management tools inside a rootless container doesn't necessarily require that much effort to implement, so I thought I'd give it a shot and wrote whatever (linked below) to see how much it really requires. Apparently not much if you cheat a lot.

Could we have an option like --fakeroot built into runc that does a similarly crude syscall diversion for package managers?

https://github.com/hifi/whatever

AkihiroSuda commented 6 years ago

Questions:

hifi commented 6 years ago

I'm thinking of a simple, slow emulation that does just enough to make package managers happy, so we can trivially build images as unprivileged users, albeit with the wrong file owners unless chown emulation is in place, like you said.

ptrace() did not feel unusably slow for running plain apt; it seemed fairly fast to me.

cyphar commented 6 years ago

> Actually running apt or other package management tools inside a rootless container doesn't necessarily require that much effort to implement so I thought I'd give it a shot so I wrote whatever to see how much it really requires. Apparently not much if you cheat a lot.

We already have projects that do this -- @AkihiroSuda's PRoot fork is probably the most mature one (I had a project that did this a while ago but it's basically bitrotted) and it will be put into https://github.com/rootless-containers. IMO user.rootlesscontainers chown emulation is actually pretty important if you want to use the container images for anything "important" -- because otherwise you might end up shipping potentially dangerous (think a suid binary with the wrong owner) images.

With regard to cheating, it doesn't take a lot to make most things work, but if you want to make things work consistently you need to do things like completely emulate the POSIX privilege model (including faking who the current user is, what groups are currently in use, and so on). I did most of this work in https://github.com/cyphar/remainroot -- but as I said it bitrotted and PRoot also already does this stuff.


As for embedding this into runc -- unfortunately this is non-trivial. One of the main design aspects of runc is that there is no long-running runc process that sticks around when you create containers. If you run runc run -d then the runc code will exit as soon as pid1 user code has been started. Therefore, we couldn't have a ptrace daemon without breaking that (quite important and useful) feature of runc.

And all of that is before we get into questions about whether or not --fakeroot is a good idea in terms of the spec or container runtime adoption (because it's an out-of-spec extension that will lock more people into using runc). If it's trivial to do in an implementation-independent way (without modifying runc) then I would prefer that over modifying runc so that other runtimes are forced to implement the same feature when it would otherwise be unnecessary.

But most importantly of all, there is nothing stopping you from adding your ptracer into your container and modifying your config.json to use it. If you compile it statically, it would be fairly simple -- you just bind-mount the binary and execute it. This is what Docker does for docker run --init and is the simplest way of doing this. One of the goals of https://rootlesscontaine.rs/ and https://github.com/rootless-containers is to provide tooling that will make it easier for people to do these sorts of things automatically.
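As a rough illustration of the bind-mount approach described above, here is a minimal Python sketch that patches an already-loaded config.json. The `mounts` and `process.args` fields come from the OCI runtime spec; the function name `wrap_with_tracer`, the in-container path `/.fakeroot-tracer`, and the idea that the tracer takes the command to run as its arguments are all assumptions made for this example, not something runc or any existing tracer guarantees.

```python
import copy
import json


def wrap_with_tracer(config, host_tracer, container_path="/.fakeroot-tracer"):
    """Return a copy of an OCI config whose process runs under a tracer.

    `host_tracer` is the host path of a statically linked tracer binary
    (hypothetical); `container_path` is where it appears in the container.
    """
    patched = copy.deepcopy(config)

    # Bind-mount the tracer binary read-only into the container.
    patched.setdefault("mounts", []).append({
        "destination": container_path,
        "type": "bind",
        "source": host_tracer,
        "options": ["bind", "ro"],
    })

    # Make the tracer the container's pid1. This assumes the tracer
    # accepts the real command as its arguments and executes it.
    process = patched.setdefault("process", {})
    process["args"] = [container_path] + list(process.get("args", []))
    return patched


if __name__ == "__main__":
    example = {"process": {"args": ["apt-get", "update"]}, "mounts": []}
    print(json.dumps(wrap_with_tracer(example, "/usr/local/bin/tracer"), indent=2))
```

The patched dictionary would then be written back to config.json before calling runc, so runc itself needs no changes and can still exit in detached mode.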

hifi commented 6 years ago

What you both missed was the simplicity of doing enough for package managers (notably just apt for now) versus the bigger emulation layers that try to do things correctly.

I did not intend to create a project around my one-night hack, just to start a discussion by showing that 200 lines of badly written C do indeed let you run programs like apt, which would be enough to enable a lot of flexibility with the least amount of added code.

cyphar commented 6 years ago

> What you both missed was the simplicity of doing enough for package managers (notably just apt for now) versus the bigger emulation layers that try to do things correctly.

I didn't miss it, I just disagree that a majority of people would be happy with it for production uses (because it could cause programs to start freaking out if they double-check that privileged operations did what they expect -- apt used to do this inside its "sandbox check" code when I was working on remainroot, I'm not sure why it works now). I already know of quite a few enterprises using rootless containers in staging or production environments, and that's a pretty important consideration.

But my general concerns on --fakeroot apply to the idea of integrating things that require long-running daemons as part of runc -- these apply no matter which project you want to integrate into runc. Even the new seccomp-bpf syscall emulation code being proposed by Tycho Andersen still won't allow it to be done without a daemon.

As I said, I agree that it's a pain to deal with this at the moment, and making it easier through wrapping tools and documentation is a goal of https://rootlesscontaine.rs/. There are some other things (like unprivileged networking) that also are difficult to get working, and also require documentation and so on. We cannot just integrate all of these things into runc because of the concerns I outlined above (no-daemon requirement, bloat, adding extensions when things could be done within the specification, etc).

hifi commented 6 years ago

apt still does double-check, but I keep the state; badly, but it is there. Everything could be done better. It's just a minimal proof of concept, as I was not happy with the bigger alternatives for such a simple thing that does not track much or even try to pass as a real environment.
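To make the "keep the state" idea concrete, here is a minimal pure-Python model of the bookkeeping only. It is not how whatever, PRoot, or remainroot are implemented, and it does no syscall interception; the class name `FakeOwnership` and the choice to report unrecorded files as owned by uid 0 / gid 0 are assumptions for illustration.

```python
class FakeOwnership:
    """In-memory record of the ownership a traced process believes it set."""

    def __init__(self, default_uid=0, default_gid=0):
        # Assumption: files with no recorded owner look root-owned.
        self.default = (default_uid, default_gid)
        self.owners = {}

    def chown(self, path, uid, gid):
        """Record a diverted chown instead of performing it."""
        cur_uid, cur_gid = self.stat(path)
        # As with the real chown(2), -1 means "leave this ID unchanged".
        self.owners[path] = (
            cur_uid if uid == -1 else uid,
            cur_gid if gid == -1 else gid,
        )
        return 0  # always report success to the caller

    def stat(self, path):
        """Return the (uid, gid) a later stat should report for path."""
        return self.owners.get(path, self.default)


if __name__ == "__main__":
    state = FakeOwnership()
    state.chown("/var/lib/apt/lists/partial", 100, 65534)
    print(state.stat("/var/lib/apt/lists/partial"))
```

A tracer built around a table like this would answer a program's follow-up stat with the values it previously "set", which is the kind of double-check discussed above. The state lives only in the tracer's memory, so the files on disk still have the wrong owners.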

Technically, wouldn't runc be able to ptrace the binary it launches as pid 1?

cyphar commented 6 years ago

I understand the frustration with simple stuff not working (that's the feeling I had when I first got rootless containers working and realised there was even more work to do :wink:). But in my view, the better way of making these sorts of things easier is by having wrapping tools or documentation so people can get this stuff to work.

You can use PRoot or whatever as the pid1 inside the container and it will also work. And then you don't have to modify runc.

> Technically, wouldn't runc be able to ptrace the binary it launches as pid 1?

Yes, but runc exits after pid1 launches if you use it in detached mode (which is the usual mode of operation inside of things like cri-o). You can try this for yourself if you do runc run -d.

At the moment the only reason why we have a runc process sticking around in non-detached mode is to do I/O copying. Making it a tracer would require having a long-running process for each container to function correctly (something that runc does not require at the moment, and is a major advantage of runc over other container runtimes).

hifi commented 6 years ago

Just a thought: it would only need to run as a tracer for rootless containers.

Anyway, do as you see best, I just wanted to share my findings. If you don't see this as being possible even in the long run, please close this issue for future reference.