vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

Feature Request: Enable systemd based Containers in VIC #8462

Open tschwaller opened 5 years ago

tschwaller commented 5 years ago

User Statement: As a user, I would like to deploy containers that run systemd inside the container, since this greatly simplifies writing Dockerfiles that use already packaged software. With VIC-1.5 it is possible to use alternate Linux kernels (e.g. from CentOS), which makes this feature even more interesting because it would let users run, for example, the full CentOS stack in container VMs. That is the reason for this feature request, which has been discussed in the past but was never implemented.

Details: The CentOS systemd container should run with VIC. The corresponding Dockerfile looks like this:

FROM centos/systemd

MAINTAINER "Your Name" 
RUN yum -y install httpd; yum clean all; systemctl enable httpd.service
EXPOSE 80

CMD ["/usr/sbin/init"]

and shows how you can avoid reinventing the wheel. It would also allow OVA deployments to be replaced completely with VIC-based container VMs. The command

docker run --privileged --name httpd -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 80:80 -d httpd

should be replaceable with the following command in a VIC context

docker run --name httpd -p 80:80 -d httpd

i.e. you do not need privileged mode or mount hacks.

hickeng commented 5 years ago

Adding some background, and hints for how to add this.

Prior to the custom ISO work, we used systemd to initialize /dev and then switched root into the container filesystem with tether as PID 1. With the custom ISO work we also had to support SysV init systems (no systemd present), so we now have the system-init script.

I cannot give a solid estimate for supporting a systemd based container because we’ve never gone through it in depth, but:

  1. Use systemd for the system init. This may be as simple as a custom ISO configuration that calls "exec /lib/systemd/systemd --system" once any cVM-specific init has occurred.
  2. Launch tether, but not as PID 1.
     a. This could be after starting systemd or via a systemd unit.
     b. Tether unit tests do not run as PID 1, so this should be viable without alteration.
     c. We may need to confirm that child processes are reaped and their exit codes collected correctly. This may require Linux 3.4 or later for PR_SET_CHILD_SUBREAPER support (see lib/tether/tether_linux.go).
  3. If the container directly runs systemd, that may need to be replaced with "/lib/systemd/systemd --user" (I don't recall the exact argument name) if it is not detected automatically.
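
As a rough illustration of steps 1 and 2 above (systemd as the system init, tether started from a unit rather than as PID 1), the tail end of a custom init could look something like the sketch below. Nothing here is taken from the VIC code base: the unit name vic-tether.service, the /.tether/tether binary path, and the choice of multi-user.target are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. The unit name, tether path, and target are assumptions,
# not VIC's actual layout.

# Hypothetical location of the tether binary inside the container VM.
TETHER_BIN="${TETHER_BIN:-/.tether/tether}"

# Print a minimal systemd unit that runs tether as an ordinary service
# instead of as PID 1. $1 is the path to the tether binary.
render_tether_unit() {
    cat <<EOF
[Unit]
Description=VIC tether (hypothetical unit)

[Service]
Type=simple
ExecStart=$1
Restart=no

[Install]
WantedBy=multi-user.target
EOF
}

# Write the unit under the given root directory and enable it by creating
# the "wants" symlink by hand, since systemctl cannot be used before
# systemd is running.
install_tether_unit() {
    root="$1"
    unit_dir="$root/etc/systemd/system"
    mkdir -p "$unit_dir/multi-user.target.wants"
    render_tether_unit "$TETHER_BIN" > "$unit_dir/vic-tether.service"
    ln -sf ../vic-tether.service \
        "$unit_dir/multi-user.target.wants/vic-tether.service"
}

# Final step of a custom init: replace the current process with systemd.
# The binary path varies by distribution (/lib/systemd/systemd or
# /usr/lib/systemd/systemd).
handoff_to_systemd() {
    exec /lib/systemd/systemd --system
}
```

A custom init would call install_tether_unit with the root it is preparing (for example "install_tether_unit /") and then handoff_to_systemd as its last action. Whether tether still collects child exit codes correctly when started this way is exactly the open question raised in step 2.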

Things to consider:

  1. Do you start systemd before or after the switch_root to the container filesystem?
     a. systemd makes use of D-Bus, so I highly recommend using the systemd mechanisms to do the switch_root, as that should ensure systemd functionality moves over smoothly.
  2. Which systemd unit files need to be present in the container image for systemd to function correctly? It may be necessary to copy parts of /etc/systemd, /run/systemd, and /usr/lib/systemd into /.tether and then bind-mount them into the container rootfs. This is where the speculation really starts, as I've never experimented with this part.
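
To make the second consideration more concrete, here is a speculative sketch of bind-mounting staged systemd directories into the container rootfs and then letting systemd perform the root switch. The staging layout under /.tether, the directory list, and the /usr/sbin/init target (taken from the Dockerfile's CMD) are assumptions. "systemctl switch-root" is systemd's own mechanism for this, but it has not been tried in a cVM here.

```shell
#!/bin/sh
# Sketch only. The staging path and directory list are assumptions based
# on the discussion above, not tested VIC behaviour.

STAGING="${STAGING:-/.tether}"
SYSTEMD_DIRS="etc/systemd run/systemd usr/lib/systemd"

# Print one "source target" pair per line for the given container rootfs.
plan_systemd_mounts() {
    rootfs="$1"
    for d in $SYSTEMD_DIRS; do
        printf '%s %s\n' "$STAGING/$d" "$rootfs/$d"
    done
}

# Bind-mount each staged directory that exists into the rootfs.
# Requires root privileges.
bind_systemd_dirs() {
    plan_systemd_mounts "$1" | while read -r src dst; do
        [ -d "$src" ] || continue
        mkdir -p "$dst"
        mount --bind "$src" "$dst"
    done
}

# Ask the running systemd to switch to the container rootfs and execute
# the image's init there.
switch_to_container_root() {
    systemctl switch-root "$1" /usr/sbin/init
}
```

Note that a bind mount hides whatever the image already ships at the target path, so mounting over all of usr/lib/systemd would mask unit files installed in the image (such as httpd.service). A working version would likely need finer-grained paths than the three directories listed here.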

I do think this would be extremely useful work, and it is a necessary prerequisite for running kubelet in a cVM if you want to support Kubernetes-cluster-in-a-VCH, which I think is also extremely useful work.

tschwaller commented 5 years ago

Thanks for the input. I will also talk to a few customers about it.