fabiand opened 7 years ago
You should be able to do this with the mount fields in the spec, the same way Docker does when you specify a flag like `--cpu host`.
Have you first tried building a spec that does this? Did you hit any issues? It's worth a try, because I'm pretty sure this is possible without changes to the runtime-spec and without any changes to runc.
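For reference, a minimal sketch of what such a spec fragment could look like in a standard OCI `config.json` (the mount entries here are illustrative, not a verified working setup - as discussed below, plain rbinds alone are not sufficient):

```json
{
  "mounts": [
    {
      "destination": "/dev",
      "type": "bind",
      "source": "/dev",
      "options": ["rbind", "rw"]
    },
    {
      "destination": "/sys",
      "type": "bind",
      "source": "/sys",
      "options": ["rbind", "rw"]
    }
  ]
}
```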
So I must say that I have only tried this with Docker so far, but the conceptual problem should also exist in runc.
I used plain `-v /dev:/dev -v /sys:/sys` to pass `/dev` and `/sys` to the container, but this required the following (and above-referenced) workarounds:
```shell
mkdir /dev.container && {
  mount --rbind /dev /dev.container
  mount --rbind /host-dev /dev
  # Keep some devices from the container's original /dev
  keep() { mount --rbind /dev.container/$1 /dev/$1 ; }
  keep shm
  keep mqueue
  # Keep ptmx/pts for pty creation
  keep pts
  mount --rbind /dev/pts/ptmx /dev/ptmx
  # Use the container /dev/kvm if available
  [[ -e /dev.container/kvm ]] && keep kvm
}
```
This is just one "workaround" of a few.
`/dev` and `/sys` are complicated because of how they work with namespaces. For `/dev` we need to create things like `/dev/pts` ourselves in order to create TTYs inside the container's mount namespace. Not to mention that you don't want to provide access to your raw disk inside a container (to be fair, the default runc setting blocks all `/dev` access, but the default Docker one lets you do some worrying stuff). `/sys` (and `/proc/sys`) change based on what namespaces you're in, and I'm not sure mixing them between namespaces is a great idea either.
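The namespace dependence is directly observable: the kernel exposes each process's namespace memberships as IDs under `/proc/<pid>/ns` (standard Linux procfs paths, nothing hypothetical), and two processes see the same `/proc/sys/kernel/hostname` only if their UTS namespace IDs match:

```shell
# Print the UTS and network namespace IDs of the current shell.
# A container with its own namespaces reports different IDs here
# than the host does, and its view of /proc/sys differs accordingly.
readlink /proc/self/ns/uts
readlink /proc/self/ns/net
```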
Yes, I agree that they are complicated - and, as you mention, `/proc` as well (maybe easier).
It doesn't make sense to mix and match those paths with arbitrary container flags - I'm speaking about a sane subset, i.e. host pid, ipc, and uts. There will still be issues, e.g. with cgroups (which, I guess, differ between host and container). With those flags set, I do see value in such a high-level knob to mount the host's dev, sys, and proc paths.
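For reference, in an OCI `config.json` that subset falls out naturally: a container shares the host's pid, ipc, and uts namespaces simply by omitting those entries from `linux.namespaces` (a sketch of that configuration; the entries listed are the ones still unshared):

```json
{
  "linux": {
    "namespaces": [
      { "type": "mount" },
      { "type": "network" }
    ]
  }
}
```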
The main reason - and user story - for this is to allow shipping low-level tools in containers that operate on the host - https://github.com/kubernetes/community/pull/589 goes in this direction. Also the containerization of components like Cinder, Gluster, or Ceph.
A note on block devices - I hope that there will be a controlled way to get them into containers, and according to https://github.com/kubernetes/community/pull/805 this is also in progress.
So yes, it's difficult, and complex combinations arise - but I still believe we need some standardization around it to make it supportable.
Oh - and yes, Docker might do some crazy stuff; maybe it can be done differently here.
Some containers carry software which needs access to the host's /dev and /sys trees.
Today, mounting /dev and /sys is not sufficient, and a few workarounds are needed (see for example https://github.com/kubevirt/libvirt/blob/master/libvirtd.sh#L5-L42). It would be nice if runc provided a flag to mount these paths correctly inside the container, without the need for hacks - something similar to docker's `-cpu host` et al.