opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

Provide an easy mechanism to mount the host's /dev and /sys paths #1563

Open fabiand opened 7 years ago

fabiand commented 7 years ago

Some containers carry software which needs access to the host's /dev and /sys trees.

Today, simply mounting /dev and /sys is not sufficient, and a few workarounds are needed (see for example https://github.com/kubevirt/libvirt/blob/master/libvirtd.sh#L5-L42). It would be nice if runc provided a flag to mount these paths correctly inside the container, without the need for hacks. Something similar to docker's -cpu host et al.

crosbymichael commented 7 years ago

You should be able to do this with the mount fields in the spec, the same way Docker does when you specify a flag like --cpu host.

Have you first tried building a spec that does this? Did you hit any issues? It's worth a try because I'm pretty sure this is possible without any changes to the runtime-spec or to runc.
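As a rough illustration of the "mount fields in the spec" suggestion, the sketch below prints OCI runtime-spec mount entries that recursively bind-mount a host path at the same location in the container. The helper name host_bind_mount is hypothetical (it is not part of runc), and the sketch assumes the paths contain no characters that need JSON escaping. It only shows the shape of the entries; it does not by itself solve the /dev/pts, /dev/shm and /dev/mqueue issues discussed in this thread.

```shell
#!/bin/sh
# host_bind_mount is a hypothetical helper, not a runc command.
# It prints one entry for the "mounts" array of an OCI config.json that
# recursively bind-mounts the given host path at the same path in the
# container. Assumption: the path needs no JSON escaping.
host_bind_mount() {
  path=$1
  printf '{"destination": "%s", "type": "bind", "source": "%s", "options": ["rbind", "rw"]}\n' \
    "$path" "$path"
}

# Print entries for the two trees discussed in this issue.
for path in /dev /sys; do
  host_bind_mount "$path"
done
```

The default config.json generated by `runc spec` already contains its own /dev and /sys mount entries, so entries like these would presumably have to replace those rather than be appended after them.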

fabiand commented 7 years ago

I must admit that so far I have only tried this with Docker, but the conceptual problem should also exist in runc.

I used plain -v /dev:/dev -v /sys:/sys to pass dev and sys to the container, but this required the following workarounds (referenced above):

mkdir /dev.container && {
  # Save the container's own /dev, then put the host's /dev in its place
  mount --rbind /dev /dev.container
  mount --rbind /host-dev /dev

  # Keep some devices from the container's original /dev
  keep() { mount --rbind "/dev.container/$1" "/dev/$1" ; }
  keep shm
  keep mqueue
  # Keep ptmx/pts for pty creation
  keep pts
  mount --rbind /dev/pts/ptmx /dev/ptmx
  # Use the container /dev/kvm if available
  [[ -e /dev.container/kvm ]] && keep kvm
}

This is just one "workaround" of a few.

cyphar commented 7 years ago

/dev and /sys are complicated because of how they work with namespaces. For /dev we need to create things like /dev/pts ourselves in order to create TTYs inside the container's mount namespace. Not to mention that you don't want to provide access to your raw disk inside a container (to be fair, the default runc setting blocks all /dev access, but the default Docker one lets you do some worrying stuff). /sys (and /proc/sys) change based on what namespaces you're in, and I'm not sure mixing them between namespaces is a great idea either.

fabiand commented 7 years ago

Yes, I agree that they are complicated - and now that you mention it, /proc as well (maybe easier).

It doesn't make sense to mix and match those paths with arbitrary container flags - I'm speaking about a sane subset, i.e. host pid, ipc, and uts. There will still be issues, e.g. with cgroups (which, I guess, differ between host and container). With those flags set, I do see value in a high-level knob to mount the host's dev, sys and proc paths.

The main reason - and user story - for this is to allow shipping low-level tools in containers while having them operate on the host - https://github.com/kubernetes/community/pull/589 goes in this direction. The same applies to the containerization of components like Cinder, gluster, or ceph.

A note on block devices - I hope that there will be a controlled way to get them into containers, and according to https://github.com/kubernetes/community/pull/805 this is also in progress.

So yes, it's difficult and complex combinations arise - but I still believe we need some standardization around it to make it supportable.

fabiand commented 7 years ago

Oh - and yes, Docker might do some crazy stuff; maybe it can be done differently here.