RunCVM (Run Container Virtual Machine) is an experimental open-source Docker container runtime for Linux, created by Struan Bartlett at NewsNow Labs, that makes launching standard containerised workloads and system workloads (e.g. Systemd, Docker, even OpenWrt) in VMs as easy as launching a container.
Install:
curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo sh
Now launch an nginx VM listening on port 8080:
docker run --runtime=runcvm --name nginx1 --rm -p 8080:80 nginx
Launch a MariaDB VM, with 2 cpus and 2G memory, listening on port 3306:
docker run --runtime=runcvm --name mariadb1 --rm -p 3306:3306 --cpus 2 --memory 2G --env=MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 mariadb
Launch a vanilla ubuntu VM, with interactive terminal:
docker run --runtime=runcvm --name ubuntu1 --rm -it ubuntu
Gain another interactive console on ubuntu1:
docker exec -it ubuntu1 bash
Launch a VM with 1G memory and a 1G ext4-formatted backing file mounted at /var/lib/docker
and stored in the underlying container's filesystem:
docker run -it --runtime=runcvm --memory=1G --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,1G <docker-image>
Launch a VM with 2G memory and a 5G ext4-formatted backing file mounted at /var/lib/docker
and stored in a dedicated Docker volume on the host:
docker run -it --runtime=runcvm --memory=2G --mount=type=volume,src=runcvm-disks,dst=/disks --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,5G <docker-image>
Launch a 3-node Docker Swarm on a network with 9000 MTU and, on the swarm, an http global service:
git clone https://github.com/newsnowlabs/runcvm.git && \
cd runcvm/tests/00-http-docker-swarm && \
NODES=3 MTU=9000 ./test
Docker+Sysbox runtime demo - Launch Ubuntu running Systemd and Docker with Sysbox runtime; then within it run an Alpine Sysbox container; and, within that install dockerd and run a container from the 'hello-world' image:
cat <<EOF | docker build --tag=ubuntu-docker-sysbox -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
ca-certificates curl gnupg udev dbus && \
curl -fsSL https://get.docker.com | bash
RUN wget -O /tmp/sysbox.deb \
https://downloads.nestybox.com/sysbox/releases/v0.6.2/sysbox-ce_0.6.2-0.linux_amd64.deb && \
apt -y install /tmp/sysbox.deb
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G;/disks/sysbox,/var/lib/sysbox,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-sysbox ubuntu-docker-sysbox
docker exec ubuntu-docker-sysbox bash -c "docker run --rm --runtime=sysbox-runc alpine ash -x -c 'apk add docker; dockerd &>/dev/null & sleep 5; docker run --rm hello-world'"
docker rm -fv ubuntu-docker-sysbox
Nested RunCVM demo - Launch Ubuntu running Systemd and Docker with RunCVM runtime installed; then within it run an Alpine RunCVM Container/VM; and, within that install dockerd and, within that, run a container from the 'hello-world' image:
cat <<EOF | docker build --tag=ubuntu-docker-runcvm -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
ca-certificates curl gnupg udev dbus && \
curl -fsSL https://get.docker.com | bash
COPY --from=newsnowlabs/runcvm:latest /opt /opt/
RUN rm -f /etc/init.d/docker && \
bash /opt/runcvm/scripts/runcvm-install-runtime.sh --no-dockerd && \
echo kvm_intel >>/etc/modules
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-runcvm ubuntu-docker-runcvm
docker exec ubuntu-docker-runcvm bash -c "docker run --rm --runtime=runcvm alpine ash -x -c 'apk add docker; dockerd &>/dev/null & sleep 5; docker run --rm hello-world'"
docker rm -fv ubuntu-docker-runcvm
Docker+GVisor runtime demo - Launch Ubuntu running Systemd and Docker with GVisor runtime; then within it run the 'hello-world' image in a GVisor container:
cat <<EOF | docker build --tag=ubuntu-docker-gvisor -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
ca-certificates curl gnupg udev dbus jq && \
curl -fsSL https://get.docker.com | bash
RUN curl -fsSL https://gvisor.dev/archive.key | gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" >/etc/apt/sources.list.d/gvisor.list && \
apt update && \
apt-get install -y runsc
RUN [ ! -f /etc/docker/daemon.json ] && echo '{}' > /etc/docker/daemon.json; cat /etc/docker/daemon.json | jq '.runtimes.runsc.path="/usr/bin/runsc"' | tee /etc/docker/daemon.json
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runsc -m 2g --name=ubuntu-docker-gvisor ubuntu-docker-gvisor
docker exec ubuntu-docker-gvisor bash -c "docker run --rm --runtime=runsc hello-world"
docker rm -fv ubuntu-docker-gvisor
Launch OpenWrt - with port forward to LuCI web UI on port 10080:
docker import --change='ENTRYPOINT ["/sbin/init"]' https://archive.openwrt.org/releases/23.05.2/targets/x86/generic/openwrt-23.05.2-x86-generic-rootfs.tar.gz openwrt-23.05.2 && \
docker network create --subnet 172.128.0.0/24 runcvm-openwrt && \
echo -e "config interface 'loopback'\n\toption device 'lo'\n\toption proto 'static'\n\toption ipaddr '127.0.0.1'\n\toption netmask '255.0.0.0'\n\nconfig device\n\toption name 'br-lan'\n\toption type 'bridge'\n\tlist ports 'eth0'\n\nconfig interface 'lan'\n\toption device 'br-lan'\n\toption proto 'static'\n\toption ipaddr '172.128.0.5'\n\toption netmask '255.255.255.0'\n\toption gateway '172.128.0.1'\n" >/tmp/runcvm-openwrt-network && \
docker run -it --rm --runtime=runcvm --name=openwrt --network=runcvm-openwrt --ip=172.128.0.5 -v /tmp/runcvm-openwrt-network:/etc/config/network -p 10080:80 openwrt-23.05.2
RunCVM was born out of difficulties experienced using the Docker and Podman CLIs to launch Kata Containers v2, and a belief that launching containerised workloads in VMs using Docker needn't be so complicated.
Motivations included: efforts to re-add OCI CLI commands for docker/podman to Kata v2 to support Docker & Podman; other Kata issues #3358, #1123, #1133, #3038, #5321 and #6861; Podman issues #8579 and #17070; and Kubernetes issue #40114. Please note that, since authoring RunCVM, some of these issues may have been resolved.
Like Kata, RunCVM aims to be a secure container runtime with lightweight virtual machines that feel and perform like containers, but provide stronger workload isolation using hardware virtualisation technology.
However, while Kata aims to launch standard container images inside a restricted-privileges namespace inside a VM running a single fixed, heavily customised kernel and Linux distribution optimised for this purpose, RunCVM intentionally launches container or VM images as the VM's root filesystem using stock or bespoke Linux kernels. The upshot is that RunCVM can run VM workloads that Kata's security and kernel model would explicitly prevent.
RunCVM features:

- compatibility with `docker run` (with experimental support for `podman run`);
- a lightweight 'wrapper' over the standard container runtime `runc` that causes a VM to be launched within the container (making its code footprint and external dependencies extremely small, and its internals extremely simple and easy to understand and tailor for specific purposes).

RunCVM makes some trade-offs in return for this simplicity. See the full list of features and limitations.

RunCVM is free and open-source, licensed under the Apache Licence, Version 2.0. See the LICENSE file for details.

With RunCVM you can:

- launch standard container workloads in VMs using `docker run`, with no need to customise images or the command line (except adding `--runtime=runcvm`);
- run system workloads like `dockerd` and `systemd` that will not run in standard container runtimes;
- use `docker run -it`, `docker start -ai` and `docker attach` (and so on), with generally good support for other `docker container` subcommands;
- rely on minimal host dependencies (just the `kvm` and `tun` kernel modules);
- use Docker networking via `docker run --network=<network>` and `docker network connect` (excluding IPv6).

The main applications for RunCVM are:

- running system workloads (e.g. `systemd`, `dockerd`, Docker Swarm services, Kubernetes) that will not run in standard container runtimes;
- running workloads that benefit from the stronger isolation of hardware virtualisation.
, Docker swarm services, Kubernetes)RunCVM's 'wrapper' runtime, runcvm-runtime
, receives container create commands triggered by docker
run
/create
commands, modifies the configuration of the requested container in such a way that the created container will launch a VM that boots from the container's filesystem, and then passes the request on to the standard container runtime (runc
) to actually create and start the container.
For a deep dive into RunCVM's internals, see the section on Developing RunCVM.
RunCVM should run on any amd64 (x86_64) hardware (or VM) running Linux kernel >= 5.10 with support for KVM and Docker. If your host can already run KVM VMs and Docker, it should run RunCVM.
RunCVM has no other host dependencies apart from Docker (or, experimentally, Podman) and the `kvm` and `tun` kernel modules. RunCVM comes packaged with all the binaries and libraries it needs to run (including its own QEMU binary).
RunCVM is tested on Debian Bullseye and GitHub Codespaces.
For RunCVM to support Docker DNS within Container/VMs, the following condition on `/proc/sys/net/ipv4/conf/` must be met:

- `all/rp_filter` and `<bridge>/rp_filter` should be 0 ('No Source Validation') or 2 ('Loose mode' as defined in RFC3704 Loose Reverse Path), where `<bridge>` is any bridge underpinning a Docker network to which RunCVM Container/VMs will be attached.

This means that:

- if `all/rp_filter` will be set to 0, then `<bridge>/rp_filter` must be set to 0 or 2 (or, if `<bridge>` is not yet or might not yet have been created, then `default/rp_filter` must be set to 0 or 2);
- if `all/rp_filter` will be set to 1, then `<bridge>/rp_filter` must be set to 2 (or, if `<bridge>` is not yet or might not yet have been created, then `default/rp_filter` must be set to 2);
- if `all/rp_filter` will be set to 2, then no further action is needed.

At time of writing, defaults vary from platform to platform: some ship with `rp_filter` set to 0, some to 2, and some to 1; on Google Compute Engine instances, the `rp_filter` settings in `/etc/sysctl.d/60-gce-network-security.conf` must be modified or overridden to support RunCVM.

We recommend `all/rp_filter` be set to 2, as this is the simplest change and provides a good balance of security.
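The above rules can be condensed into a small helper. The following is a hypothetical illustration (the `rp_filter_ok` function is not part of RunCVM); it takes the `all/rp_filter` and `<bridge>/rp_filter` values and reports whether they are compatible:

```shell
#!/bin/sh
# rp_filter_ok ALL BRIDGE
# Succeeds if the given all/rp_filter and <bridge>/rp_filter values satisfy
# the requirements above (0 = off, 1 = strict, 2 = loose per RFC 3704).
rp_filter_ok() {
  all="$1" bridge="$2"
  case "$all" in
    0) [ "$bridge" = 0 ] || [ "$bridge" = 2 ] ;;
    1) [ "$bridge" = 2 ] ;;
    2) true ;;   # no further action needed
    *) false ;;
  esac
}

# On a live host you might feed it real values, e.g.:
#   rp_filter_ok "$(cat /proc/sys/net/ipv4/conf/all/rp_filter)" \
#                "$(cat /proc/sys/net/ipv4/conf/docker0/rp_filter)"
rp_filter_ok 1 2 && echo "compatible" || echo "incompatible"
```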
Run:
curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo sh
This will:

- install RunCVM to `/opt/runcvm` (installation elsewhere is currently unsupported);
- modify `/etc/docker/daemon.json` to add `runcvm` to the `runtimes` property;
- restart `dockerd`, if it can be detected how for your system (e.g. `systemctl restart docker`), and confirm the `runcvm` runtime is registered via `docker info`;
- for Podman, modify `/etc/containers/containers.conf`;
- check your `rp_filter` settings, and amend them if necessary.

Following installation, launch a basic test RunCVM Container/VM:
docker run --runtime=runcvm --rm -it hello-world
Create an image that will allow instances to have VMX capability:
gcloud compute images create debian-12-vmx --source-image-project=debian-cloud --source-image-family=debian-12 --licenses="https://compute.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"
Now launch a VM, install Docker and RunCVM:
cat >/tmp/startup-script.sh <<EOF
#!/bin/bash
apt update && apt -y install apt-utils kmod wget iproute2 systemd \
ca-certificates curl gnupg udev dbus jq && \
mkdir -p /etc/docker && echo '{"userland-proxy": false}' >/etc/docker/daemon.json && \
curl -fsSL https://get.docker.com | bash && \
curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo REPO=newsnowlabs/runcvm:latest sh
EOF
gcloud compute instances create runcvm-vmx-test --zone=us-central1-a --machine-type=n2-highmem-2 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --metadata-from-file=startup-script=/tmp/startup-script.sh --no-restart-on-failure --maintenance-policy=TERMINATE --provisioning-model=SPOT --instance-termination-action=STOP --no-service-account --no-scopes --create-disk=auto-delete=yes,boot=yes,image=debian-12-vmx,mode=rw,size=50,type=pd-ssd --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any
To upgrade, follow this procedure:

- run `/opt/runcvm/scripts/runcvm-install-runtime.sh` (or rerun the installation command, which runs the same script).

In the below summary of RunCVM's current main features and limitations, [+] is used to indicate an area of compatibility with standard container runtimes and [-] is used to indicate a feature of standard container runtimes that is unsupported.

N.B. `docker run` and `docker exec` options not listed below are unsupported and their effect, if used, is unspecified.
docker run

- [+] `--mount` (or `-v`) is supported for volume mounts, tmpfs mounts, and host file and directory bind-mounts (the `dst` mount path `/disks` is reserved)
- [-] `--device` is unsupported
- [+] custom networks specified with `--network` are supported, including Docker DNS resolution of container names and respect for custom network MTU
- [+] additional networks - joined with `docker run --network` or `docker network connect` (but only to a created and not yet running container) - are supported (including `scope=overlay` networks and those with multiple subnets)
- [+] `--publish` (or `-p`) is supported
- [+] `--dns`, `--dns-option` and `--dns-search` are supported
- [+] `--ip` is supported
- [+] `--hostname` (or `-h`) is supported
- [-] `docker network connect` on a running container is not supported
- [-] `--network=host` and `--network=container:name|id` are not supported
- [+] `--user` (or `-u`) is supported
- [+] `--workdir` (or `-w`) is supported
- [+] `--env` (or `-e`) and `--env-file` are supported
- [+] `--entrypoint` is supported
- [+] `--init` is supported (but runs RunCVM's own VM init process rather than Docker's default, `tini`)
- [+] `--detach` (or `-d`) is supported
- [+] `--interactive` (or `-i`) is supported
- [+] `--tty` (or `-t`) is supported (but to enter CTRL-T one must press CTRL-T twice)
- [+] `--attach` (or `-a`) is supported
- [-] stdout and stderr are not separated as they would be in a `runc` container: `docker run --runtime=runcvm debian bash -c 'echo stdout; echo stderr >&2' >/tmp/stdout 2>/tmp/stderr` does not produce the expected result
- [+] `--cpus` is supported to specify the number of VM CPUs
- [+] `--memory` (or `-m`) is supported to specify VM memory
- [-] other resource limits and settings, including CPU (`--cpu-*`), block IO (`--blkio-*`) and kernel memory (`--kernel-memory`), are unsupported or untested
- [-] the entrypoint's exit code is not automatically returned; an application must instead write it to `/.runcvm/exit-code` (supported exit codes 0-255) or call `/opt/runcvm/sbin/qemu-exit <code>` (supported exit codes 0-127). Automatic handling of exit codes from the entrypoint will be provided in a later version.
- [+] virtual disks can be mounted via `RUNCVM_DISKS`, e.g. where needed to run `dockerd` or to improve disk performance
- [-] when running `dockerd`, mileage will vary unless a volume or disk is mounted over `/var/lib/docker`

docker exec

- [+] `--user` (or `-u`), `--workdir` (or `-w`), `--env` (or `-e`), `--env-file`, `--detach` (or `-d`), `--interactive` (or `-i`) and `--tty` (or `-t`) are all supported
- [+] `docker exec <container> bash -c 'echo stdout; echo stderr >&2' >/tmp/stdout 2>/tmp/stderr` does produce the expected result

Other notes

- `/opt/runcvm` is mounted read-only within RunCVM containers. Container applications cannot compromise RunCVM, but they can execute binaries from within the RunCVM package. The set of binaries available to the VM may be reduced to a minimum in a later version.
- By default, the kernel used to boot the VM is selected according to `/etc/os-release` within the image being launched.

This table provides a high-level comparison of RunCVM and Kata across various features like kernels, networking/DNS, memory allocation, namespace handling, method of operation, and performance characteristics:
Feature | RunCVM | Kata |
---|---|---|
Methodology | Boots VM from distribution kernels with the container's filesystem directly mounted as the root filesystem, using virtiofs. VM setup code and kernel modules are bind-mounted into the container. The VM's PID1 runs setup code to reproduce the container's networking environment within the VM before executing the container's original entrypoint. | Boots VM from a custom kernel with a custom root disk image, mounts the virtiofsd-shared host container filesystem to a target folder and executes the container's entrypoint within a restricted namespace having chrooted to that folder. |
Privileges/restrictions | Container code has full root access to the VM and its devices. It may run anything that runs in a VM, mounting filesystems, installing kernel modules, accessing devices. RunCVM helper processes are visible to `ps` etc. | Runs container code inside a VM namespace with restricted privileges. Use of mounts and kernel modules is restricted. Kata helper processes (like kata-agent and chronyd) are invisible to `ps`. |
Kernels | Launches stock Alpine, Debian, Ubuntu kernels. The kernel's `/lib/modules` is automatically mounted within the VM. Install any needed modules without host reconfiguration. | Launches custom kernels. Kernel modules aren't mounted and need host reconfiguration to be installed. |
Networking/DNS | Docker container networking + internal/external DNS out-of-the-box. No support for `docker network connect`/`disconnect`. | DNS issues presented: with a custom network, external ping works, but DNS lookups fail both for internal docker hosts and external hosts.[^1] |
Memory | VM assigned and reports total memory as per `--memory <mem>`. | VM total memory reported by `free` appears unrelated to the `--memory <mem>` specified.[^2] |
CPUs | VM assigned and reports CPUs as per `--cpus <cpus>`. | CPUs must be hardcoded in the Kata host config. |
Performance | Custom kernel optimisations may deliver improved startup (~3.2s) or operational performance (~15%). | |
virtiofsd | Runs `virtiofsd` in the container namespace. | Unknown |
[^1]: `docker network create --scope=local testnet >/dev/null && docker run --name=test --rm --runtime=kata --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null` succeeds on `runc` and `runcvm`, but at time of writing (2023-12-31) the DNS lookups needed fail on `kata`.
$ docker network create --scope=local testnet >/dev/null && docker run --name=test --rm -it --runtime=kata --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null
ping test ...
ping: bad address 'test'
ping google.com ...
ping: bad address 'google.com'
ping 8.8.8.8 ...
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.911/1.716/3.123 ms
$ docker network create --scope=local testnet >/dev/null && docker run --name=test --rm -it --runtime=runcvm --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null
ping test ...
PING test (172.25.8.2): 56 data bytes
--- test ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.033/0.085/0.137 ms
ping google.com ...
PING google.com (172.217.16.238): 56 data bytes
--- google.com ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 8.221/8.398/9.017 ms
ping 8.8.8.8 ...
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 1.074/1.491/1.801 ms
[^2]: docker run --rm -it --runtime=kata --entrypoint=/bin/ash -m 500m alpine -c 'free -h; df -h /dev/shm'
$ docker run --rm --runtime=kata --name=test -m 2g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
total used free shared buff/cache available
Mem: 3.9G 94.4M 3.8G 0 3.7M 3.8G
Swap: 0 0 0
$ docker run --rm --runtime=kata --name=test -m 3g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
total used free shared buff/cache available
Mem: 4.9G 107.0M 4.8G 0 3.9M 4.8G
Swap: 0 0 0
$ docker run --rm --runtime=kata --name=test -m 0g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
total used free shared buff/cache available
Mem: 1.9G 58.8M 1.9G 0 3.4M 1.9G
Swap: 0 0 0
When creating a container, RunCVM will examine the image being launched to try to determine a suitable kernel to boot the VM with. Its process is as follows:

- if `--env=RUNCVM_KERNEL=<dist>[/<version>]` was specified, use the indicated kernel;
- otherwise, identify the image's distribution from its `/etc/os-release`;
- look for an in-image kernel and initramfs, checking in turn `/vmlinuz` and `/initrd.img`, then `/boot/vmlinuz` and `/boot/initrd.img`, then `/boot/vmlinuz-virt` and `/boot/initramfs-virt`.
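That in-image search order can be sketched as follows (a hypothetical illustration; `find_kernel` is not part of RunCVM, and only the kernel paths from the list above are checked):

```shell
#!/bin/sh
# Print the first kernel candidate that exists under an image root directory,
# mirroring the in-image search order described above.
find_kernel() {
  root="$1"
  for k in /vmlinuz /boot/vmlinuz /boot/vmlinuz-virt; do
    if [ -f "$root$k" ]; then
      echo "$k"
      return 0
    fi
  done
  return 1   # no in-image kernel found
}

# Demo against a throwaway Alpine-style image root
root=$(mktemp -d)
mkdir -p "$root/boot"
touch "$root/boot/vmlinuz-virt"
find_kernel "$root"   # prints: /boot/vmlinuz-virt
```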
RunCVM options are specified either via standard docker run
options or via --env=<RUNCVM_KEY>=<VALUE>
options on the docker run
command line. The following env options are user-configurable:
--env=RUNCVM_KERNEL=<dist>[/<version>]
Specify with which RunCVM kernel (from /opt/runcvm/kernels
) to boot the VM. Values must be of the form <dist>/<version>
, where <dist>
is a directory under /opt/runcvm/kernels
and <version>
is a subdirectory (or symlink to a subdirectory) under that. If <version>
is omitted, latest
will be assumed. Here is an example command that will list available values of <dist>/<version>
on your installation.
$ find /opt/runcvm/kernels/ -maxdepth 2 | sed 's!^/opt/runcvm/kernels/!!; /^$/d'
debian
debian/latest
debian/5.10.0-16-amd64
alpine
alpine/latest
alpine/5.15.59-0-virt
ubuntu
ubuntu/latest
ubuntu/5.15.0-43-generic
ol
ol/5.14.0-70.22.1.0.1.el9_0.x86_64
ol/latest
Example:
docker run --rm --runtime=runcvm --env=RUNCVM_KERNEL=ol hello-world
--env=RUNCVM_KERNEL_APPEND='<options>'

Appends custom kernel command line options, e.g. `apparmor=0` or `systemd.unified_cgroup_hierarchy=0`.
--env='RUNCVM_DISKS=<disk1>[;<disk2>;...]'
Automatically create, format, prepopulate and mount backing files as virtual disks on the VM.
Each `<diskN>` should be a comma-separated list of values of the form `<src>,<dst>,<filesystem>[,<size>]`, where:

- `<src>` is the path within the container where the virtual disk backing file should be located. This may be in the container's overlayfs or within a volume (mounted using `--mount=type=volume`).
- `<dst>` is both (a) the path within the VM where the virtual disk should be mounted; and (b) the location of the directory with whose contents the disk should be prepopulated.
- `<filesystem>` is the filesystem with which the backing disk should be formatted when first created.
- `<size>` is the size of the backing file (in `truncate` format), and must be specified if `<src>` does not exist.

When first created, the backing file will be created as a sparse file of the specified `<size>`, formatted with the specified `<filesystem>` using `mke2fs`, and prepopulated with any files preexisting at `<dst>`.
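Because the backing file is created sparse, it consumes host disk space only as blocks are written. This can be demonstrated with plain coreutils, independently of RunCVM (the `mke2fs` formatting step is omitted here):

```shell
#!/bin/sh
# Create a 1G sparse file, as RunCVM does for a new disk backing file.
f=$(mktemp)
truncate -s 1G "$f"

# The apparent size is 1G, but almost no blocks are actually allocated yet.
echo "apparent bytes: $(stat -c %s "$f")"
echo "allocated KiB:  $(du -k "$f" | cut -f1)"
rm -f "$f"
```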
When RunCVM creates a Container/VM, it drafts an fstab entry for each disk; after the VM boots, these entries are mounted. Typically, the first disk will be attached as `/dev/vda`, the second as `/dev/vdb`, and so on.
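The mapping from a `RUNCVM_DISKS` value to device names and mountpoints can be sketched like this (a hypothetical illustration of the format, not RunCVM's actual code; it assumes the typical `/dev/vda`, `/dev/vdb`, ... ordering described above):

```shell
#!/bin/sh
# Parse a RUNCVM_DISKS value (';'-separated disks, each of the form
# '<src>,<dst>,<filesystem>[,<size>]') and print one fstab-style line per disk.
disks_to_fstab() {
  printf '%s' "$1" | awk -v RS=';' -F',' '
    NF >= 3 {
      dev = sprintf("/dev/vd%c", 97 + n); n++   # 97 is ASCII "a"
      printf "%s %s %s defaults 0 0\n", dev, $2, $3
    }'
}

disks_to_fstab '/disks/docker,/var/lib/docker,ext4,1G;/disks/sysbox,/var/lib/sysbox,ext4,1G'
# prints:
#   /dev/vda /var/lib/docker ext4 defaults 0 0
#   /dev/vdb /var/lib/sysbox ext4 defaults 0 0
```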
docker run -it --runtime=runcvm --env=RUNCVM_DISKS=/disk1,/home,ext4,5G <docker-image>
In this example, RunCVM will check for existence of a file at /disk1
within <docker-image>
, and if not found create a 5G backing file (in the container's filesystem, typically overlay2) with an ext4 filesystem prepopulated with any preexisting contents of /home
, then add the disk to /etc/fstab
and mount it within the VM at /home
.
docker run -it --runtime=runcvm --mount=type=volume,src=runcvm-disks,dst=/disks --env='RUNCVM_DISKS=/disks/disk1,/home,ext4,5G;/disks/disk2,/opt,ext4,2G' <docker-image>
This example behaves similarly, except that the runcvm-disks
persistent Docker volume is first mounted at /disks
within the container's filesystem, and therefore the backing files at /disks/disk1
and /disks/disk2
(mounted in the VM at /home
and /opt
respectively) are stored in the persistent volume (typically stored in /var/lib/docker
on the host, bypassing overlay2).
N.B. `/disks` and any paths below it are reserved mountpoints. Unlike other mountpoints, these are NOT mounted into the VM but only into the container, and are therefore suitable for storing VM disk backing files, which cannot be accessed within the VM's filesystem.
--env=RUNCVM_QEMU_DISPLAY=<value>
Select a specific QEMU display. Currently only curses
is supported, but others may trivially be added by customising the build.
--env=RUNCVM_SYS_ADMIN=1
By default, virtiofsd
is not launched with -o modcaps=+sys_admin
(and containers are not granted CAP_SYS_ADMIN
). Use this option if you need to change this.
--env=RUNCVM_KERNEL_MOUNT_LIB_MODULES=1

If a RunCVM kernel (as opposed to an in-image kernel) is chosen to launch a VM, by default that kernel's modules will be mounted at `/lib/modules/<version>` in the VM. If this variable is set, that kernel's modules will instead be mounted over `/lib/modules`.
--env=RUNCVM_KERNEL_DEBUG=1
Enable kernel logging (sets kernel console=ttyS0
).
--env=RUNCVM_BIOS_DEBUG=1
By default BIOS console output is hidden. Enable it with this option.
--env=RUNCVM_RUNTIME_DEBUG=1
Enable debug logging for the runtime (the portion of RunCVM directly invoked by docker run
, docker exec
etc).
Debug logs are written to files in /tmp
.
--env=RUNCVM_BREAK=<values>
Enable breakpoints (falling to bash shell) during the RunCVM Container/VM boot process.
<values>
must be a comma-separated list of: prenet
, postnet
, preqemu
.
--env=RUNCVM_HUGETLB=1
[EXPERIMENTAL] Enable use of preallocated hugetlb memory backend, which can improve performance in some scenarios.
--env=RUNCVM_CGROUPFS=<value>

Configures cgroupfs mountpoints in the VM, which may be needed to run applications like Docker if systemd is not running. Acceptable values are:

- `none`/`systemd` - do nothing; leave to the application or to systemd (if running)
- `1`/`cgroup1` - mount only cgroup v1 filesystems supported by the running kernel to subdirectories of `/sys/fs/cgroup`
- `2`/`cgroup2` - mount only the cgroup v2 filesystem to `/sys/fs/cgroup`
- `hybrid`/`mixed` - mount cgroup v1 filesystems, and mount the cgroup v2 filesystem to `/sys/fs/cgroup/unified`

Please note that if `RUNCVM_CGROUPFS` is left undefined or set to an empty string, then RunCVM selects an appropriate default behaviour according to these rules:

- if the entrypoint matches `/systemd$`, then assume a default value of `none`;
- otherwise, assume a default value of `hybrid`.

These rules work well in the cases of running Docker in (a) stock Alpine/Debian/Ubuntu distributions in which Docker has been installed but Systemd is not running; and (b) distributions in which Systemd is running. Of course you should set `RUNCVM_CGROUPFS` if you need to override the default behaviour.
Please also note that in the case your distribution is running Systemd you may instead set --env=RUNCVM_KERNEL_APPEND='systemd.unified_cgroup_hierarchy=<boolean>'
(where <boolean>
is 0
or 1
) to request Systemd to create either hybrid or cgroup2-only cgroup filesystem(s) itself.
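The default-selection rules described above amount to the following (a hypothetical sketch, not RunCVM's actual code):

```shell
#!/bin/sh
# Pick a default RUNCVM_CGROUPFS value from the VM's entrypoint, per the rules
# above: entrypoints matching /systemd$ get "none", everything else "hybrid".
default_cgroupfs() {
  case "$1" in
    */systemd) echo none ;;
    *) echo hybrid ;;
  esac
}

default_cgroupfs /lib/systemd/systemd   # prints: none
default_cgroupfs /usr/bin/dockerd       # prints: hybrid
```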
Mounting a disk backing file at /var/lib/docker
If running Docker within a VM, it is recommended that you mount a disk backing file at /var/lib/docker
to allow dockerd
to use the preferred overlay filesystem and avoid it opting to use the extremely sub-performant vfs
storage driver.
e.g. To launch a VM with a 1G ext4-formatted backing file, stored in the underlying container's overlay filesystem, and mounted at /var/lib/docker
, run:
docker run -it --runtime=runcvm --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,1G <docker-image>
To launch a VM with a 5G ext4-formatted backing file, stored in a dedicated Docker volume on the host, and mounted at /var/lib/docker
, run:
docker run -it --runtime=runcvm --mount=type=volume,src=runcvm-disks,dst=/disks --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,5G <docker-image>
In both cases, RunCVM will check for existence of a file /disks/docker
and, if not found, will create the disk backing file of the given size and format as an ext4 filesystem. It will add the disk to /etc/fstab
.
For full documentation of RUNCVM_DISKS
, see above.
Mounting a volume over /var/lib/docker (NOT RECOMMENDED)

Doing this is not recommended, but if running Docker within a VM, you can enable `dockerd` to use the overlay filesystem (at the cost of security) by launching with `--env=RUNCVM_SYS_ADMIN=1`, e.g.
docker run --runtime=runcvm --mount=type=volume,src=mydocker1,dst=/var/lib/docker --env=RUNCVM_SYS_ADMIN=1 <docker-image>
N.B. This option adds
CAP_SYS_ADMIN
capabilities to the container and then launchesvirtiofsd
with-o modcaps=+sys_admin
.
The following deep dive should help explain the inner workings of RunCVM, and which files to modify to implement fixes, improvements and extensions.
RunCVM's 'wrapper' runtime, runcvm-runtime
, intercepts container create
and exec
commands and their specifications in JSON format (config.json
and process.json
respectively) that are normally provided (by docker
run
/create
and docker exec
respectively) to a standard container runtime like runc
.
The JSON file is parsed to retrieve properties of the command, and is modified to allow RunCVM to piggyback by overriding the originally intended behaviour with new behaviour.
The modifications to create
are designed to make the created container launch a VM that boots off the container's filesystem, served using virtiofsd
.
The modifications to exec
are designed to run commands within the VM instead of the container.
runcvm-runtime - the create command

In more detail, the RunCVM runtime `create` process:

- modifies the `config.json` file to:
  - prepend `runcvm-ctr-entrypoint` to the container's original entrypoint and, if an `--init` argument was detected, remove any init process and set the container env var `RUNCVM_INIT` to `1`;
  - set `RUNCVM_UIDGID` to the `<uid>:<gid>:<additionalGids>` intended for the container, then reset both the `<uid>` and `<gid>` to `0`;
  - set `RUNCVM_CPUS` to the intended `--cpus` count so it can be passed to the VM;
  - add a mount of `/` to `/vm` that will recursively mount the following preceding mounts:
    - `/opt/runcvm` on the host to `/opt/runcvm` in the container;
    - `/.runcvm` and `/run` in the container only;
    - each mount `<mnt>` to `/vm/<mnt>` (except where `<mnt>` is at or below `/disks`);
  - add a mount at `/vm/lib/modules/<version>` for the kernel's modules;
  - set the env vars `RUNCVM_KERNEL_PATH`, `RUNCVM_KERNEL_INITRAMFS_PATH` and `RUNCVM_KERNEL_ROOT`;
  - add the devices `/dev/kvm` and `/dev/net/tun`;
  - resize `/dev/shm` to the size desired for the VM's memory and set the container env var accordingly;
  - add needed capabilities (`NET_ADMIN`, `NET_RAW`, `MKNOD`, `AUDIT_WRITE`);
  - if launched with `--env=RUNCVM_SYS_ADMIN=1`, add the `SYS_ADMIN` capability;
- execs `runc` with the modified `config.json`.

The `runcvm-ctr-entrypoint`:

- saves the container's configuration to `/.runcvm`;
- launches `virtiofsd` to serve the container's root filesystem;
- reads `/etc/resolv.conf` in the container, launches `dnsmasq`, and modifies `/vm/etc/resolv.conf` to proxy DNS requests from the VM to Docker's DNS;
- execs the `runcvm-init` init process to supervise `runcvm-ctr-qemu`, which launches the VM.

The `runcvm-init` process:

- supervises `runcvm-ctr-qemu` to launch the VM;
- on receipt of a signal, runs `runcvm-ctr-shutdown`, which cycles through a number of methods to try to shut down the VM cleanly;
- on exit, runs `runcvm-ctr-exit` to retrieve any saved exit code (written by the application to `/.runcvm/exit-code`) and exit with this code.

The `runcvm-ctr-qemu` script:

- prepares any disks configured via `--env=RUNCVM_DISKS=<disks>`;
- launches QEMU with `runcvm-vm-init` as the VM's init process.

The `runcvm-vm-init` process:

- reads the configuration saved by `runcvm-ctr-entrypoint` to `/.runcvm`, and reproduces it within the VM;
- if `RUNCVM_INIT` is `1` (i.e. the container was originally intended to be launched with Docker's own init process), configures and execs `busybox init`, which becomes the VM's PID1, to supervise `dropbear`, run `runcvm-vm-start` and `poweroff` the VM if signalled to do so;
- otherwise, launches `dropbear`, then execs (via `runcvm-init`, purely to create a controlling tty) `runcvm-vm-start`, which runs as the VM's PID1.

The `runcvm-vm-start` script:

- restores the intended `<uid>`, `<gid>`, `<additionalGids>` and `<cwd>` for the container's original entrypoint, and execs that entrypoint.

runcvm-runtime - the exec command

The RunCVM runtime `exec` process:

- modifies the `process.json` file to:
  - save the intended `<uid>`, `<gid>`, `<additionalGids>`, `<terminal>` and `<cwd>` for the command, then reset `<uid>` and `<gid>` to `0` and the `<cwd>` to `/`;
  - prepend `runcvm-ctr-exec '<uid>:<gid>:<additionalGids>' '<cwd>' '<hasHome>' '<terminal>'` to the originally intended command;
- execs `runc` with the modified `process.json`.

The `runcvm-ctr-exec` script:

- uses the `dbclient` SSH client to execute the intended command, with the intended arguments, within the VM via the `runcvm-vm-exec` process, propagates the returned stdout and stderr, and returns the command's exit code.

Building RunCVM requires Docker. To build RunCVM, first clone the repo, then run the build script, as follows:
cd runcvm
./build/build.sh
The build script creates a Docker image named newsnowlabs/runcvm:latest
.
Now follow the main installation instructions to install your built RunCVM from the Docker image.
Test RunCVM using nested RunCVM. You can do this using a Docker image capable of installing RunCVM, or an image built with a version of RunCVM preinstalled.
Build a suitable image as follows:
cat <<EOF | docker build --tag=ubuntu-docker-runcvm -
FROM ubuntu:jammy
# Install needed packages and create and configure 'runcvm' user account
RUN apt update && \
apt -y install \
apt-utils kmod wget iproute2 systemd \
ca-certificates curl gnupg udev dbus sudo psmisc && \
curl -fsSL https://get.docker.com | bash && \
echo kvm_intel >>/etc/modules && \
useradd --create-home --shell /bin/bash --groups sudo,docker runcvm && \
echo runcvm:runcvm | chpasswd && \
echo 'runcvm ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/runcvm
WORKDIR /home/runcvm
ENTRYPOINT ["/lib/systemd/systemd"]
VOLUME /disks
# Mount formatted backing files at:
# - /var/lib/docker for speed and overlay2 support
# - /opt/runcvm to avoid nested virtiofs, which works, but can't be great for speed
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,2G;/disks/runcvm,/opt/runcvm,ext4,2G'
# # Uncomment this block to preinstall RunCVM from the specified image
#
# COPY --from=newsnowlabs/runcvm:latest /opt /opt/
# RUN rm -f /etc/init.d/docker && \
# bash /opt/runcvm/scripts/runcvm-install-runtime.sh --no-dockerd
EOF
(Uncomment the final block to build an image with RunCVM preinstalled, or leave the block commented to test RunCVM installation).
To launch, run:
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-runcvm ubuntu-docker-runcvm
Optionally modify this `docker run` command by:

- adding `--rm`, to automatically remove the container after systemd shutdown
- removing `-d` and adding `--env=RUNCVM_KERNEL_DEBUG=1`, to see kernel and systemd boot logs
- removing `-d` and adding `-it`, to provide a console
Then run `docker exec -it -u runcvm ubuntu-docker-runcvm bash` to obtain a command prompt and perform testing.
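Systemd can take a few seconds to boot inside the VM before `docker exec` will succeed. A generic polling sketch follows; the readiness probe is simulated here with a file so the sketch runs anywhere — against the real container you might instead poll `docker exec ubuntu-docker-runcvm systemctl is-system-running`:

```sh
#!/bin/sh
# Sketch: poll until a slow-booting container is ready, with a timeout.
# The probe is simulated with a file; substitute a real check as needed.
ready_check() { [ -e /tmp/runcvm-demo-ready ]; }

touch /tmp/runcvm-demo-ready   # simulate: ready immediately

tries=0
until ready_check; do
  tries=$((tries + 1))
  [ "$tries" -ge 30 ] && { echo "timed out"; exit 1; }
  sleep 1
done
echo "ready after $tries retries"
```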
Run `docker rm -fv ubuntu-docker-runcvm` to clean up after testing.
Support launching images: If you encounter any Docker image that launches in a standard container runtime but does not launch in RunCVM, or launches but with unexpected behaviour, please raise an issue titled "Launch failure for image <image>" or "Unexpected behaviour for image <image>", and include log excerpts and an explanation of the failure, or of the expected and unexpected behaviour.
For all other issues, please still raise an issue.
You can also reach out to us on the NewsNow Labs Slack Workspace.
We are typically available to respond to queries Monday-Friday, 9am-5pm UK time, and will be happy to help.
If you would like to contribute a feature suggestion or code, please raise an issue or submit a pull request.
Shut down any RunCVM containers, then run `sudo rm -rf /opt/runcvm`.
RunCVM and Dockside are designed to work together in two alternative ways:

1. Use Dockside to launch development environments with the RunCVM runtime, suited to workloads needing `dockerd`, Docker swarm, `systemd`, applications that require a running kernel, kernel modules not available on the host, or specific hardware e.g. a graphics display. Follow the instructions for adding a runtime to your Dockside profiles.
2. Use RunCVM to launch Dockside within a VM with its own `dockerd`, to provide increased security and compartmentalisation from the host, e.g.:

docker run --rm -it --runtime=runcvm --memory=2g --name=docksidevm -p 443:443 -p 80:80 --mount=type=volume,src=dockside-data,dst=/data --mount=type=volume,src=dockside-disks,dst=/disks --env=RUNCVM_DISKS=/disks/disk1,/var/lib/docker,ext4,5G newsnowlabs/dockside --run-dockerd --ssl-builtin
This project (known as "RunCVM"), comprising the files in this Git repository (but excluding files containing a conflicting copyright notice and licence), is copyright 2023 NewsNow Publishing Limited, Struan Bartlett, and contributors.
RunCVM is an open-source project licensed under the Apache License, Version 2.0 (the "License"); you may not use RunCVM or its constituent files except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
N.B. In order to run, RunCVM relies upon other third-party open-source software dependencies that are separate to and independent from RunCVM and published under their own independent licences.
RunCVM Docker images made available at https://hub.docker.com/repository/docker/newsnowlabs/runcvm are distributions designed to run RunCVM that comprise: (a) the RunCVM project source and/or object code; and (b) third-party dependencies that RunCVM needs to run; and which are each distributed under the terms of their respective licences.