oracle / railcar

RailCar: Rust implementation of the Open Containers Initiative oci-runtime

optional caps and cgroup #23

Open aep opened 7 years ago

aep commented 7 years ago

we're trying railcar on a whole range of hardware, and two things came up:

generally, would you be ok to implement these as feature flags?

Are cgroups necessary for cleanup of a pid namespace, or will killing pid1 clean up all the other processes in the pid ns anyway?

vishvananda commented 7 years ago

Caps probably just need the proper syscall numbers in order to work on Android. The init model is a bit broken without pid namespaces, but it should be OK without cgroups. There is one place where cgroups are used to find the actual pid of the great-grandchild so it can be waited on:

https://github.com/oracle/railcar/blob/9df5d1816a8397c1a248ed2514a3b89b9b89d7b4/src/main.rs#L1254

A different method will need to be devised to track the pid of the grandchild. If we had that, disabling cgroups would be fairly easy, and we could add an option for it, --disable-cgroups, that skips the various calls into the cgroup module.
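To make the proposed option concrete, here is a minimal sketch of how a `--disable-cgroups` flag could gate cgroup work. It is not railcar's code: `Options`, `parse_options`, and `maybe_cgroup` are hypothetical names invented for illustration, and the real cgroup module's calls are stood in for by a closure.

```rust
/// Hypothetical runtime options. `disable_cgroups` mirrors the proposed
/// `--disable-cgroups` flag; nothing here is taken from railcar itself.
struct Options {
    disable_cgroups: bool,
}

/// Scan the argument list for the proposed flag (assumed spelling).
fn parse_options<I: IntoIterator<Item = String>>(args: I) -> Options {
    let disable_cgroups = args.into_iter().any(|arg| arg == "--disable-cgroups");
    Options { disable_cgroups }
}

/// Run `cgroup_step` unless cgroups are disabled.
/// Returns Ok(true) if the step ran, Ok(false) if it was skipped,
/// and propagates the step's error otherwise.
fn maybe_cgroup<F>(opts: &Options, cgroup_step: F) -> Result<bool, String>
where
    F: FnOnce() -> Result<(), String>,
{
    if opts.disable_cgroups {
        return Ok(false);
    }
    cgroup_step()?;
    Ok(true)
}
```

Each call into the cgroup module would be wrapped in something like `maybe_cgroup(&opts, || ...)`, so that a single flag skips all of them. This still leaves the pid-tracking problem described above unsolved.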

aep commented 7 years ago

what's the idea behind getting that pid? shouldn't railcar wait for pid1 only? should be easy to implement using trap or ebpf or whatever on fork, but i'm unsure if that's actually necessary. pid1 should not daemonize. Could we implement it so that when there are no cgroups, and pid1 daemonizes, it'll simply not wait and clean up the container?

pid namespaces should exist i think, i was just wondering if they're sufficient, i.e. what cgroups are needed for in addition.

as for caps, i'm still working on evaluating why they're broken. (it's not just missing numbers)

vishvananda commented 7 years ago

The parent process needs the pid of the process as viewed from the outside so that it can either wait on it or write it to disk. The issue is that there are quite a few forks and exits to deal with namespaces, and there are some complicated issues that make it impossible to pass the correct pid back via a socket. Specifically, when daemonizing inside a pid namespace, it is necessary to double fork. The second fork receives the pid of the child process inside the pid namespace, but it doesn't know what the external pid is. The cgroup lookup is to find the outside pid. Regarding whether cgroups are necessary, they do provide quite a bit of useful resource control as well as isolation for things like devices. It depends on your use case, but it might be best to get them working.
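As a rough illustration of the cgroup lookup described above (a sketch only, not the code at the linked line in `main.rs`): a process outside the container can read the `cgroup.procs` file of the container's cgroup, which lists member pids as seen from the reader's own pid namespace. The function names and the idea of passing the cgroup directory as a path are assumptions made for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse the contents of a `cgroup.procs` file: one pid per line.
/// Lines that are not valid integers are ignored.
fn parse_cgroup_procs(contents: &str) -> Vec<i32> {
    contents
        .lines()
        .filter_map(|line| line.trim().parse::<i32>().ok())
        .collect()
}

/// Read the pids of all processes in the cgroup at `cgroup_dir`
/// (a hypothetical path supplied by the caller). Because the kernel
/// reports pids relative to the reader's pid namespace, a parent on
/// the outside gets the external pids it can wait on or write to disk.
fn pids_in_cgroup(cgroup_dir: &Path) -> io::Result<Vec<i32>> {
    let contents = fs::read_to_string(cgroup_dir.join("cgroup.procs"))?;
    Ok(parse_cgroup_procs(&contents))
}
```

If the container's cgroup holds only the final process at the time of the read, the single pid returned is the outside pid that the second fork could not learn from inside the namespace.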