newsnowlabs / runcvm

RunCVM (Run Container VM) is an experimental open-source Docker container runtime for launching standard container workloads, as well as Systemd, Docker, and even OpenWrt, in VMs using 'docker run'.
Apache License 2.0

RunCVM Container Runtime

Introduction

RunCVM (Run Container Virtual Machine) is an experimental open-source Docker container runtime for Linux, created by Struan Bartlett at NewsNow Labs, that makes launching standard containerised workloads and system workloads (e.g. Systemd, Docker, even OpenWrt) in VMs as easy as launching a container.

Install RunCVM and then launch an Alpine Container/VM
View on Asciinema

Quick start

Install:

curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo sh

Now launch an nginx VM listening on port 8080:

docker run --runtime=runcvm --name nginx1 --rm -p 8080:80 nginx

Launch a MariaDB VM, with 2 cpus and 2G memory, listening on port 3306:

docker run --runtime=runcvm --name mariadb1 --rm -p 3306:3306 --cpus 2 --memory 2G --env=MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 mariadb

Launch a vanilla ubuntu VM, with interactive terminal:

docker run --runtime=runcvm --name ubuntu1 --rm -it ubuntu

Gain another interactive console on ubuntu1:

docker exec -it ubuntu1 bash

Launch a VM with 1G memory and a 1G ext4-formatted backing file mounted at /var/lib/docker and stored in the underlying container's filesystem:

docker run -it --runtime=runcvm --memory=1G --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,1G <docker-image>

Launch a VM with 2G memory and a 5G ext4-formatted backing file mounted at /var/lib/docker and stored in a dedicated Docker volume on the host:

docker run -it --runtime=runcvm --memory=2G --mount=type=volume,src=runcvm-disks,dst=/disks --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,5G <docker-image>

Launch a 3-node Docker Swarm on a network with 9000 MTU and, on the swarm, an http global service:

git clone https://github.com/newsnowlabs/runcvm.git && \
cd runcvm/tests/00-http-docker-swarm && \
NODES=3 MTU=9000 ./test

System workloads

Docker+Sysbox runtime demo - Launch Ubuntu running Systemd and Docker with Sysbox runtime; then within it run an Alpine Sysbox container; and, within that install dockerd and run a container from the 'hello-world' image:

cat <<EOF | docker build --tag=ubuntu-docker-sysbox -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
    ca-certificates curl gnupg udev dbus && \
    curl -fsSL https://get.docker.com | bash
RUN wget -O /tmp/sysbox.deb \
    https://downloads.nestybox.com/sysbox/releases/v0.6.2/sysbox-ce_0.6.2-0.linux_amd64.deb && \
    apt -y install /tmp/sysbox.deb
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G;/disks/sysbox,/var/lib/sysbox,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-sysbox ubuntu-docker-sysbox
docker exec ubuntu-docker-sysbox bash -c "docker run --rm --runtime=sysbox-runc alpine ash -x -c 'apk add docker; dockerd &>/dev/null & sleep 5; docker run --rm hello-world'"
docker rm -fv ubuntu-docker-sysbox

Nested RunCVM demo - Launch Ubuntu running Systemd and Docker with RunCVM runtime installed; then within it run an Alpine RunCVM Container/VM; and, within that install dockerd and, within that, run a container from the 'hello-world' image:

cat <<EOF | docker build --tag=ubuntu-docker-runcvm -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
    ca-certificates curl gnupg udev dbus && \
    curl -fsSL https://get.docker.com | bash
COPY --from=newsnowlabs/runcvm:latest /opt /opt/
RUN rm -f /etc/init.d/docker && \
    bash /opt/runcvm/scripts/runcvm-install-runtime.sh --no-dockerd && \
    echo kvm_intel >>/etc/modules
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-runcvm ubuntu-docker-runcvm
docker exec ubuntu-docker-runcvm bash -c "docker run --rm --runtime=runcvm alpine ash -x -c 'apk add docker; dockerd &>/dev/null & sleep 5; docker run --rm hello-world'"
docker rm -fv ubuntu-docker-runcvm

Docker+GVisor runtime demo - Launch Ubuntu running Systemd and Docker with GVisor runtime; then within it run the 'hello-world' image in a GVisor container:

cat <<EOF | docker build --tag=ubuntu-docker-gvisor -
FROM ubuntu:jammy
RUN apt update && apt -y install apt-utils kmod wget iproute2 systemd \
    ca-certificates curl gnupg udev dbus jq && \
    curl -fsSL https://get.docker.com | bash
RUN curl -fsSL https://gvisor.dev/archive.key | gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" >/etc/apt/sources.list.d/gvisor.list && \
    apt update && \
    apt-get install -y runsc
RUN [ -f /etc/docker/daemon.json ] || echo '{}' > /etc/docker/daemon.json; jq '.runtimes.runsc.path="/usr/bin/runsc"' /etc/docker/daemon.json > /tmp/daemon.json && mv /tmp/daemon.json /etc/docker/daemon.json
ENTRYPOINT ["/lib/systemd/systemd"]
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,1G'
VOLUME /disks
EOF
docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-gvisor ubuntu-docker-gvisor
docker exec ubuntu-docker-gvisor bash -c "docker run --rm --runtime=runsc hello-world"
docker rm -fv ubuntu-docker-gvisor

Launch OpenWrt - with port forward to LuCI web UI on port 10080:

docker import --change='ENTRYPOINT ["/sbin/init"]' https://archive.openwrt.org/releases/23.05.2/targets/x86/generic/openwrt-23.05.2-x86-generic-rootfs.tar.gz openwrt-23.05.2 && \
docker network create --subnet 172.128.0.0/24 runcvm-openwrt && \
echo -e "config interface 'loopback'\n\toption device 'lo'\n\toption proto 'static'\n\toption ipaddr '127.0.0.1'\n\toption netmask '255.0.0.0'\n\nconfig device\n\toption name 'br-lan'\n\toption type 'bridge'\n\tlist ports 'eth0'\n\nconfig interface 'lan'\n\toption device 'br-lan'\n\toption proto 'static'\n\toption ipaddr '172.128.0.5'\n\toption netmask '255.255.255.0'\n\toption gateway '172.128.0.1'\n" >/tmp/runcvm-openwrt-network && \
docker run -it --rm --runtime=runcvm --name=openwrt --network=runcvm-openwrt --ip=172.128.0.5 -v /tmp/runcvm-openwrt-network:/etc/config/network -p 10080:80 openwrt-23.05.2

RunCVM-in-Portainer walk-through

Playing around with RunCVM, a docker runtime plugin

Motivation

RunCVM was born out of difficulties experienced using the Docker and Podman CLIs to launch Kata Containers v2, and a belief that launching containerised workloads in VMs using Docker needn't be so complicated.

Motivations included: efforts to re-add OCI CLI commands for docker/podman to Kata v2 to support Docker & Podman; other Kata issues #3358, #1123, #1133, #3038, #5321 and #6861; Podman issues #8579 and #17070; and Kubernetes issue #40114. Please note that some of these issues may have been resolved since RunCVM was authored.

Like Kata, RunCVM aims to be a secure container runtime with lightweight virtual machines that feel and perform like containers, but provide stronger workload isolation using hardware virtualisation technology.

However, while Kata aims to launch standard container images inside a restricted-privileges namespace inside a VM running a single fixed and heavily customised kernel and Linux distribution optimised for this purpose, RunCVM intentionally aims to launch container or VM images as the VM's root filesystem using stock or bespoke Linux kernels. The upshot is that RunCVM can run VM workloads that Kata's security and kernel model would explicitly prevent.

For example:

RunCVM features:

RunCVM makes some trade-offs in return for this simplicity. See the full list of features and limitations.

Contents

Licence

RunCVM is free and open-source, licensed under the Apache Licence, Version 2.0. See the LICENSE file for details.

Project aims

Project ambitions

Applications for RunCVM

The main applications for RunCVM are:

  1. Running and testing applications that:
    • don't work with (or require enhanced privileges to work with) standard container runtimes (e.g. systemd, dockerd, Docker swarm services, Kubernetes)
    • require a running kernel, or a kernel version or modules not available on the host
    • require specific hardware that can be emulated e.g. disks, graphics displays
  2. Running existing container workloads with increased security
  3. Testing container workloads that are already intended to launch in VM environments, such as on fly.io
  4. Developing any of the above applications in Dockside (see RunCVM and Dockside)

How RunCVM works

RunCVM's 'wrapper' runtime, runcvm-runtime, receives container create commands triggered by docker run/create commands, modifies the configuration of the requested container in such a way that the created container will launch a VM that boots from the container's filesystem, and then passes the request on to the standard container runtime (runc) to actually create and start the container.

For a deep dive into RunCVM's internals, see the section on Developing RunCVM.

System requirements

RunCVM should run on any amd64 (x86_64) hardware (or VM) running Linux Kernel >= 5.10, and that supports KVM and Docker. So if your host can already run KVM VMs and Docker then it should run RunCVM.

RunCVM has no other host dependencies, apart from Docker (or experimentally, Podman) and the kvm and tun kernel modules. RunCVM comes packaged with all binaries and libraries it needs to run (including its own QEMU binary).
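The requirements above can be checked before installing with a short script. The following is a minimal sketch and is not part of RunCVM: the function names version_ge and preflight are hypothetical, and the checks cover only the kernel version, the /dev/kvm and /dev/net/tun device nodes, and the presence of a docker binary.

```shell
# Hypothetical preflight check; not part of RunCVM.

# version_ge A B: succeed if the major.minor of version A is >= that of B.
version_ge() (
  major() { printf '%s' "${1%%.*}"; }
  minor() {
    case $1 in
      *.*) rest=${1#*.}; rest=${rest%%[!0-9]*}; printf '%s' "${rest:-0}" ;;
      *)   printf '0' ;;
    esac
  }
  a_maj=$(major "$1"); a_min=$(minor "$1")
  b_maj=$(major "$2"); b_min=$(minor "$2")
  [ "$a_maj" -gt "$b_maj" ] ||
    { [ "$a_maj" -eq "$b_maj" ] && [ "$a_min" -ge "$b_min" ]; }
)

# preflight: print one ok/FAIL line per requirement; never aborts.
preflight() {
  kernel=$(uname -r)
  if version_ge "$kernel" 5.10; then
    echo "ok:   kernel $kernel"
  else
    echo "FAIL: kernel $kernel is older than 5.10"
  fi
  if [ -c /dev/kvm ]; then
    echo "ok:   /dev/kvm present"
  else
    echo "FAIL: /dev/kvm missing (is the kvm module loaded?)"
  fi
  if [ -c /dev/net/tun ]; then
    echo "ok:   /dev/net/tun present"
  else
    echo "FAIL: /dev/net/tun missing (is the tun module loaded?)"
  fi
  if command -v docker >/dev/null 2>&1; then
    echo "ok:   docker found"
  else
    echo "FAIL: docker not found in PATH"
  fi
}

preflight
```

A FAIL line here is only a hint: a host may still need the kvm or tun module loaded with modprobe before the device nodes appear.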

RunCVM is tested on Debian Bullseye and GitHub Codespaces.

rp_filter sysctl settings

For RunCVM to support Docker DNS within Container/VMs, the following condition on /proc/sys/net/ipv4/conf/ must be met:

This means that:

At time of writing:

We recommend all/rp_filter be set to 2, as this is the simplest change and provides a good balance of security.
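To see the current values before changing anything, the rp_filter files under /proc/sys/net/ipv4/conf/ can be listed directly. This is a small sketch rather than a RunCVM tool: show_rp_filter is a hypothetical helper, RP_FILTER_DIR is a hypothetical override that exists only so the function can be tried against a test directory, and the sysctl.d filename in the comments is an arbitrary choice.

```shell
# show_rp_filter: print "<interface>=<value>" for each rp_filter entry.
# RP_FILTER_DIR is a hypothetical override for trying this on a test directory.
show_rp_filter() (
  dir=${RP_FILTER_DIR:-/proc/sys/net/ipv4/conf}
  for f in "$dir"/*/rp_filter; do
    [ -r "$f" ] || continue
    iface=${f%/rp_filter}
    iface=${iface##*/}
    printf '%s=%s\n' "$iface" "$(cat "$f")"
  done
)

show_rp_filter

# To apply the recommended setting immediately (requires root):
#   sudo sysctl -w net.ipv4.conf.all.rp_filter=2
# To persist it across reboots (the filename is an arbitrary choice):
#   echo 'net.ipv4.conf.all.rp_filter=2' | sudo tee /etc/sysctl.d/99-rp-filter.conf
```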

Installation

Run:

curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo sh

This will:

Following installation, launch a basic test RunCVM Container/VM:

docker run --runtime=runcvm --rm -it hello-world

Install on Google Cloud

Create an image that will allow instances to have VMX capability:

gcloud compute images create debian-12-vmx --source-image-project=debian-cloud --source-image-family=debian-12   --licenses="https://compute.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"

Now launch a VM, install Docker and RunCVM:

cat >/tmp/startup-script.sh <<EOF
#!/bin/bash

apt update && apt -y install apt-utils kmod wget iproute2 systemd \
    ca-certificates curl gnupg udev dbus jq && \
    mkdir -p /etc/docker && echo '{"userland-proxy": false}' >/etc/docker/daemon.json && \
    curl -fsSL https://get.docker.com | bash && \
    curl -s -o - https://raw.githubusercontent.com/newsnowlabs/runcvm/main/runcvm-scripts/runcvm-install-runtime.sh | sudo REPO=newsnowlabs/runcvm:latest sh
EOF

gcloud compute instances create runcvm-vmx-test --zone=us-central1-a --machine-type=n2-highmem-2 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --metadata-from-file=startup-script=/tmp/startup-script.sh --no-restart-on-failure --maintenance-policy=TERMINATE --provisioning-model=SPOT --instance-termination-action=STOP  --no-service-account --no-scopes --create-disk=auto-delete=yes,boot=yes,image=debian-12-vmx,mode=rw,size=50,type=pd-ssd --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any

Upgrading

To upgrade, follow this procedure:

  1. Stop all RunCVM containers.
  2. Run /opt/runcvm/scripts/runcvm-install-runtime.sh (or rerun the installation command - which runs the same script)
  3. Start any RunCVM containers.

Features and limitations

In the summary below of RunCVM's current main features and limitations, [+] indicates an area of compatibility with standard container runtimes and [-] indicates a feature of standard container runtimes that is unsupported.

N.B. docker run and docker exec options not listed below are unsupported and their effect, if used, is unspecified.

RunCVM vs Kata comparison

This table provides a high-level comparison of RunCVM and Kata across various features like kernels, networking/DNS, memory allocation, namespace handling, method of operation, and performance characteristics:

| Feature | RunCVM | Kata |
|---|---|---|
| Methodology | Boots VM from distribution kernels with container's filesystem directly mounted as root filesystem, using virtiofs. VM setup code and kernel modules are bind-mounted into the container. VM's PID1 runs setup code to reproduce the container's networking environment within the VM before executing the container's original entrypoint. | Boots VM from custom kernel with custom root disk image, mounts the virtiofsd-shared host container filesystem to a target folder and executes the container's entrypoint within a restricted namespace having chrooted to that folder. |
| Privileges/restrictions | Container code has full root access to VM and its devices. It may run anything that runs in a VM, mounting filesystems, installing kernel modules, accessing devices. RunCVM helper processes are visible to ps etc. | Runs container code inside a VM namespace with restricted privileges. Use of mounts, kernel modules is restricted. Kata helper processes (like kata-agent and chronyd) are invisible to ps. |
| Kernels | Launches stock Alpine, Debian, Ubuntu kernels. Kernel /lib/modules automatically mounted within VM. Install any needed modules without host reconfiguration. | Launches custom kernels. Kernel modules aren't mounted and need host reconfiguration to be installed. |
| Networking/DNS | Docker container networking + internal/external DNS out-of-the-box. No support for docker network connect/disconnect. | DNS issues presented: with custom network, external ping works, but DNS lookups fail both for internal docker hosts and external hosts.[^1] |
| Memory | VM assigned and reports total memory as per --memory <mem> | VM total memory reported by free appears unrelated to --memory <mem> specified.[^2] |
| CPUs | VM assigned and reports CPUs as per --cpus <cpus> | CPUs must be hardcoded in Kata host config |
| Performance | | Custom kernel optimisations may deliver improved startup (~3.2s) or operational performance (~15%) |
| virtiofsd | Runs virtiofsd in container namespace | Unknown |

[^1]: docker network create --scope=local testnet >/dev/null && docker run --name=test --rm --runtime=kata --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null succeeds on runc and runcvm but at time of writing (2023-12-31) the DNS lookups needed fail on kata.

    $ docker network create --scope=local testnet >/dev/null && docker run --name=test --rm -it --runtime=kata --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null
    ping test ...
    ping: bad address 'test'
    ping google.com ...
    ping: bad address 'google.com'
    ping 8.8.8.8 ...
    PING 8.8.8.8 (8.8.8.8): 56 data bytes

    --- 8.8.8.8 ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max = 0.911/1.716/3.123 ms

    $ docker network create --scope=local testnet >/dev/null && docker run --name=test --rm -it --runtime=runcvm --network=testnet --entrypoint=/bin/ash alpine -c 'for n in test google.com 8.8.8.8; do echo "ping $n ..."; ping -q -c 8 -i 0.5 $n; done'; docker network rm testnet >/dev/null
    ping test ...
    PING test (172.25.8.2): 56 data bytes

    --- test ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max = 0.033/0.085/0.137 ms
    ping google.com ...
    PING google.com (172.217.16.238): 56 data bytes

    --- google.com ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max = 8.221/8.398/9.017 ms
    ping 8.8.8.8 ...
    PING 8.8.8.8 (8.8.8.8): 56 data bytes

    --- 8.8.8.8 ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max = 1.074/1.491/1.801 ms

[^2]: docker run --rm -it --runtime=kata --entrypoint=/bin/ash -m 500m alpine -c 'free -h; df -h /dev/shm'

    $ docker run --rm --runtime=kata --name=test -m 2g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
                total        used        free      shared  buff/cache   available
    Mem:           3.9G       94.4M        3.8G           0        3.7M        3.8G
    Swap:             0           0           0
    $ docker run --rm --runtime=kata --name=test -m 3g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
                total        used        free      shared  buff/cache   available
    Mem:           4.9G      107.0M        4.8G           0        3.9M        4.8G
    Swap:             0           0           0
    $ docker run --rm --runtime=kata --name=test -m 0g --env=RUNCVM_KERNEL_DEBUG=1 -it alpine ash -c 'free -h'
                total        used        free      shared  buff/cache   available
    Mem:           1.9G       58.8M        1.9G           0        3.4M        1.9G
    Swap:             0           0           0

Kernel auto-detection

When creating a container, RunCVM will examine the image being launched to try to determine a suitable kernel to boot the VM with. Its process is as follows:

  1. If --env=RUNCVM_KERNEL=<dist>[/<version>] is specified, use the indicated kernel
  2. Otherwise, identify distro from /etc/os-release
    1. If one is found in the appropriate distro-specific location in the image, select an in-image kernel. The locations are:
      • Debian: /vmlinuz and /initrd.img
      • Ubuntu: /boot/vmlinuz and /boot/initrd.img
      • Alpine: /boot/vmlinuz-virt /boot/initramfs-virt
    2. Otherwise, if found in the RunCVM package, select the latest kernel compatible with the distro
    3. Finally, use the Debian kernel from the RunCVM package
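The selection order above can be expressed as a short shell function. This is a hypothetical sketch for illustration, not RunCVM's actual code: select_kernel, the KERNELS_DIR override and the "in-image:"/"packaged:" output prefixes are all invented here, and step 2.2 is approximated by checking for a <dist>/latest entry under the kernels directory.

```shell
# select_kernel ROOTFS [RUNCVM_KERNEL]: print which kernel would be chosen for
# an image unpacked at ROOTFS. Hypothetical sketch; not RunCVM's actual code.
select_kernel() (
  rootfs=$1
  override=${2:-}
  kernels=${KERNELS_DIR:-/opt/runcvm/kernels}

  # Step 2: identify the distro from the ID= field of /etc/os-release.
  distro=$(sed -n 's/^ID=//p' "$rootfs/etc/os-release" 2>/dev/null | tr -d '"' | head -n 1)

  # Step 2.1: distro-specific in-image kernel locations.
  case $distro in
    debian) vmlinuz=/vmlinuz           initrd=/initrd.img ;;
    ubuntu) vmlinuz=/boot/vmlinuz      initrd=/boot/initrd.img ;;
    alpine) vmlinuz=/boot/vmlinuz-virt initrd=/boot/initramfs-virt ;;
    *)      vmlinuz=                   initrd= ;;
  esac

  if [ -n "$override" ]; then
    # Step 1: an explicit RUNCVM_KERNEL wins; <version> defaults to latest.
    case $override in
      */*) echo "packaged:$override" ;;
      *)   echo "packaged:$override/latest" ;;
    esac
  elif [ -n "$vmlinuz" ] && [ -e "$rootfs$vmlinuz" ] && [ -e "$rootfs$initrd" ]; then
    echo "in-image:$vmlinuz"
  elif [ -n "$distro" ] && [ -e "$kernels/$distro/latest" ]; then
    # Step 2.2: latest packaged kernel for this distro, if one is installed.
    echo "packaged:$distro/latest"
  else
    # Step 2.3: fall back to the packaged Debian kernel.
    echo "packaged:debian/latest"
  fi
)
```

For example, select_kernel /path/to/rootfs ol prints packaged:ol/latest regardless of what the image contains, because the explicit option takes priority.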

Option reference

RunCVM options are specified either via standard docker run options or via --env=<RUNCVM_KEY>=<VALUE> options on the docker run command line. The following env options are user-configurable:

--env=RUNCVM_KERNEL=<dist>[/<version>]

Specify with which RunCVM kernel (from /opt/runcvm/kernels) to boot the VM. Values must be of the form <dist>/<version>, where <dist> is a directory under /opt/runcvm/kernels and <version> is a subdirectory (or symlink to a subdirectory) under that. If <version> is omitted, latest will be assumed. Here is an example command that will list available values of <dist>/<version> on your installation.

$ find /opt/runcvm/kernels/ -maxdepth 2 | sed 's!^/opt/runcvm/kernels/!!; /^$/d'
debian
debian/latest
debian/5.10.0-16-amd64
alpine
alpine/latest
alpine/5.15.59-0-virt
ubuntu
ubuntu/latest
ubuntu/5.15.0-43-generic
ol
ol/5.14.0-70.22.1.0.1.el9_0.x86_64
ol/latest

Example:

docker run --rm --runtime=runcvm --env=RUNCVM_KERNEL=ol hello-world

--env=RUNCVM_KERNEL_APPEND=<options>

Append custom options to the kernel command line, e.g. apparmor=0 or systemd.unified_cgroup_hierarchy=0.

--env='RUNCVM_DISKS=<disk1>[;<disk2>;...]'

Automatically create, format, prepopulate and mount backing files as virtual disks on the VM.

Each <diskN> should be a comma-separated list of values of the form: <src>,<dst>,<filesystem>[,<size>].

When first created, the backing file will be created as a sparse file to the specified <size> and formatted with the specified <filesystem> using mke2fs and prepopulated with any files preexisting at <dst>.

When RunCVM creates a Container/VM, it drafts fstab entries for the disks. After the VM boots, these fstab entries are mounted. Typically, the first disk will appear as /dev/vda, the second as /dev/vdb, and so on.

Example #1

docker run -it --runtime=runcvm --env=RUNCVM_DISKS=/disk1,/home,ext4,5G <docker-image>

In this example, RunCVM will check for existence of a file at /disk1 within <docker-image>, and if not found create a 5G backing file (in the container's filesystem, typically overlay2) with an ext4 filesystem prepopulated with any preexisting contents of /home, then add the disk to /etc/fstab and mount it within the VM at /home.

Example #2

docker run -it --runtime=runcvm --mount=type=volume,src=runcvm-disks,dst=/disks --env='RUNCVM_DISKS=/disks/disk1,/home,ext4,5G;/disks/disk2,/opt,ext4,2G' <docker-image>

This example behaves similarly, except that the runcvm-disks persistent Docker volume is first mounted at /disks within the container's filesystem, and therefore the backing files at /disks/disk1 and /disks/disk2 (mounted in the VM at /home and /opt respectively) are stored in the persistent volume (typically stored in /var/lib/docker on the host, bypassing overlay2).

N.B. /disks and any paths below it are reserved mountpoints. Unlike other mountpoints, these are NOT mounted into the VM but only into the container, and are therefore suitable for mounting VM disks from backing files that cannot be accessed within the VM's filesystem.
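To make the RUNCVM_DISKS syntax concrete, the sketch below splits a value on ';' into disks and each disk on ',' into its <src>, <dst>, <filesystem> and optional <size> fields. parse_disks is a hypothetical helper written for illustration; it is not how RunCVM itself parses the variable, and it prints "-" when <size> is omitted.

```shell
# parse_disks SPEC: print one line per disk as "src dst filesystem size".
# Hypothetical helper for illustration; size is shown as "-" when omitted.
parse_disks() (
  set -f                      # disable globbing while splitting unquoted words
  IFS=';'
  for disk in $1; do          # split the whole value into <diskN> entries
    IFS=','
    set -- $disk              # split one entry into its comma-separated fields
    [ $# -ge 3 ] || { echo "invalid disk spec: $disk" >&2; exit 1; }
    printf '%s %s %s %s\n' "$1" "$2" "$3" "${4:--}"
  done
)

parse_disks '/disks/disk1,/home,ext4,5G;/disks/disk2,/opt,ext4,2G'
```

Because ';' is a shell metacharacter, a value containing more than one disk must be quoted on the docker run command line, as in Example #2.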

--env=RUNCVM_QEMU_DISPLAY=<value>

Select a specific QEMU display. Currently only curses is supported, but others may trivially be added by customising the build.

--env=RUNCVM_SYS_ADMIN=1

By default, virtiofsd is not launched with -o modcaps=+sys_admin (and containers are not granted CAP_SYS_ADMIN). Use this option if you need to change this.

--env=RUNCVM_KERNEL_MOUNT_LIB_MODULES=1

If a RunCVM kernel (as opposed to an in-image kernel) is chosen to launch a VM, by default that kernel's modules will be mounted at /lib/modules/<version> in the VM. If this variable is set, that kernel's modules will instead be mounted over /lib/modules.

--env=RUNCVM_KERNEL_DEBUG=1

Enable kernel logging (sets kernel console=ttyS0).

--env=RUNCVM_BIOS_DEBUG=1

By default BIOS console output is hidden. Enable it with this option.

--env=RUNCVM_RUNTIME_DEBUG=1

Enable debug logging for the runtime (the portion of RunCVM directly invoked by docker run, docker exec etc). Debug logs are written to files in /tmp.

--env=RUNCVM_BREAK=<values>

Enable breakpoints (falling to bash shell) during the RunCVM Container/VM boot process.

<values> must be a comma-separated list of: prenet, postnet, preqemu.

--env=RUNCVM_HUGETLB=1

[EXPERIMENTAL] Enable use of preallocated hugetlb memory backend, which can improve performance in some scenarios.

--env=RUNCVM_CGROUPFS=<value>

Configures cgroupfs mountpoints in the VM, which may be needed to run applications like Docker if systemd is not running. Acceptable values are:

Please note that if RUNCVM_CGROUPFS is left undefined or set to an empty string, then RunCVM selects an appropriate default behaviour according to these rules:

These rules work well in the cases of running Docker in (a) stock Alpine/Debian/Ubuntu distributions in which Docker has been installed but Systemd is not running; and (b) distributions in which Systemd is running. Of course you should set RUNCVM_CGROUPFS if you need to override the default behaviour.

Please also note that in the case your distribution is running Systemd you may instead set --env=RUNCVM_KERNEL_APPEND='systemd.unified_cgroup_hierarchy=<boolean>' (where <boolean> is 0 or 1) to request Systemd to create either hybrid or cgroup2-only cgroup filesystem(s) itself.

Advanced usage

Running Docker in a RunCVM Container/VM

ext4 disk backing file mounted at /var/lib/docker

If running Docker within a VM, it is recommended that you mount a disk backing file at /var/lib/docker to allow dockerd to use the preferred overlay filesystem and avoid it opting to use the extremely sub-performant vfs storage driver.

e.g. To launch a VM with a 1G ext4-formatted backing file, stored in the underlying container's overlay filesystem, and mounted at /var/lib/docker, run:

docker run -it --runtime=runcvm --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,1G <docker-image>

To launch a VM with a 5G ext4-formatted backing file, stored in a dedicated Docker volume on the host, and mounted at /var/lib/docker, run:

docker run -it --runtime=runcvm --mount=type=volume,src=runcvm-disks,dst=/disks --env=RUNCVM_DISKS=/disks/docker,/var/lib/docker,ext4,5G <docker-image>

In both cases, RunCVM will check for existence of a file /disks/docker and, if not found, will create the disk backing file of the given size and format as an ext4 filesystem. It will add the disk to /etc/fstab.
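The check-then-create behaviour described above can be approximated by hand as follows. This is a rough sketch under stated assumptions: ensure_backing_file is a hypothetical helper, RunCVM's own implementation may differ, and the prepopulation of the new filesystem with existing files is deliberately left out. It relies on truncate from GNU coreutils and mke2fs from e2fsprogs.

```shell
# ensure_backing_file PATH SIZE: create a sparse, ext4-formatted file at PATH
# unless something already exists there.
# Hypothetical helper for illustration; RunCVM's own implementation may differ.
ensure_backing_file() {
  if [ -e "$1" ]; then
    echo "reusing existing backing file $1"
    return 0
  fi
  mkdir -p "$(dirname "$1")" &&
  truncate -s "$2" "$1" &&      # sparse: no blocks are allocated until written
  mke2fs -q -F -t ext4 "$1"     # -F: allow formatting a regular file
}
```

Calling ensure_backing_file /disks/docker 1G a second time leaves the existing file untouched, which is why a backing file stored in a Docker volume keeps its contents across container restarts.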

For full documentation of RUNCVM_DISKS, see above.

Docker volume mounted at /var/lib/docker (NOT RECOMMENDED)

Doing this is not recommended, but if running Docker within a VM, you can enable dockerd to use the overlay filesystem (at the cost of security) by launching with --env=RUNCVM_SYS_ADMIN=1. e.g.

docker run --runtime=runcvm --mount=type=volume,src=mydocker1,dst=/var/lib/docker --env=RUNCVM_SYS_ADMIN=1 <docker-image>

N.B. This option adds CAP_SYS_ADMIN capabilities to the container and then launches virtiofsd with -o modcaps=+sys_admin.

Developing

The following deep dive should help explain the inner workings of RunCVM, and which files to modify to implement fixes, improvements and extensions.

runcvm-runtime

RunCVM's 'wrapper' runtime, runcvm-runtime, intercepts container create and exec commands and their specifications in JSON format (config.json and process.json respectively) that are normally provided (by docker run/create and docker exec respectively) to a standard container runtime like runc.

The JSON file is parsed to retrieve properties of the command, and is modified to allow RunCVM to piggyback by overriding the originally intended behaviour with new behaviour.

The modifications to create are designed to make the created container launch a VM that boots off the container's filesystem, served using virtiofsd.

The modifications to exec are designed to run commands within the VM instead of the container.
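As a simplified illustration of this kind of rewriting, the sketch below uses jq to prepend a launcher to the process.args array of an OCI config.json, so that the container would start the launcher and receive the original command line as its arguments. It is an illustration only: wrap_entrypoint and the launcher path are hypothetical, and RunCVM's real modifications are more extensive than this.

```shell
# wrap_entrypoint CONFIG LAUNCHER: print a copy of an OCI config.json in which
# LAUNCHER is prepended to the container's original command line.
# Illustration only: LAUNCHER is a hypothetical path, and RunCVM's real
# modifications are more extensive.
wrap_entrypoint() {
  jq --arg launcher "$2" '.process.args = [$launcher] + .process.args' "$1"
}
```

For instance, given a config.json whose process.args is ["nginx"], wrap_entrypoint config.json /hypothetical/launcher emits the same document with process.args set to ["/hypothetical/launcher", "nginx"]. A wrapper runtime would then write the result back and hand the bundle to runc.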

runcvm-runtime - create command

In more detail, the RunCVM runtime create process:

The runcvm-ctr-entrypoint:

The runcvm-init process:

The runcvm-ctr-qemu script:

The runcvm-vm-init process:

The runcvm-vm-start script:

runcvm-runtime - exec command

The RunCVM runtime exec process:

The runcvm-ctr-exec script:

Building

Building RunCVM requires Docker. To build RunCVM, first clone the repo, then run the build script, as follows:

git clone https://github.com/newsnowlabs/runcvm.git
cd runcvm
./build/build.sh

The build script creates a Docker image named newsnowlabs/runcvm:latest.

Now follow the main installation instructions to install your built RunCVM from the Docker image.

Testing

Test RunCVM using nested RunCVM. You can do this using a Docker image capable of installing RunCVM, or an image built with a version of RunCVM preinstalled.

Build a suitable image as follows:

cat <<EOF | docker build --tag=ubuntu-docker-runcvm -
FROM ubuntu:jammy

# Install needed packages and create and configure 'runcvm' user account
RUN apt update && \
    apt -y install \
        apt-utils kmod wget iproute2 systemd \
        ca-certificates curl gnupg udev dbus sudo psmisc && \
    curl -fsSL https://get.docker.com | bash && \
    echo kvm_intel >>/etc/modules && \
    useradd --create-home --shell /bin/bash --groups sudo,docker runcvm && \
    echo runcvm:runcvm | chpasswd && \
    echo 'runcvm ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/runcvm

WORKDIR /home/runcvm
ENTRYPOINT ["/lib/systemd/systemd"]
VOLUME /disks

# Mount formatted backing files at:
# - /var/lib/docker for speed and overlay2 support
# - /opt/runcvm to avoid nested virtiofs, which works, but can't be great for speed
ENV RUNCVM_DISKS='/disks/docker,/var/lib/docker,ext4,2G;/disks/runcvm,/opt/runcvm,ext4,2G'

# # Uncomment this block to preinstall RunCVM from the specified image
#
# COPY --from=newsnowlabs/runcvm:latest /opt /opt/
# RUN rm -f /etc/init.d/docker && \
#     bash /opt/runcvm/scripts/runcvm-install-runtime.sh --no-dockerd
EOF

(Uncomment the final block to build an image with RunCVM preinstalled, or leave the block commented to test RunCVM installation).

To launch, run:

docker run -d --runtime=runcvm -m 2g --name=ubuntu-docker-runcvm ubuntu-docker-runcvm

Optionally modify this docker run command by:

  • adding --rm - to automatically remove the container after systemd shutdown
  • removing -d and adding --env=RUNCVM_KERNEL_DEBUG=1 - to see kernel and systemd boot logs
  • removing -d and adding -it - to provide a console

Then run docker exec -it -u runcvm ubuntu-docker-runcvm bash to obtain a command prompt and perform testing.

Run docker rm -fv ubuntu-docker-runcvm to clean up after testing.

Support

Support launching images: If you encounter any Docker image that launches in a standard container runtime that does not launch in RunCVM, or launches but with unexpected behaviour, please raise an issue titled Launch failure for image <image> or Unexpected behaviour for image <image> and include log excerpts and an explanation of the failure, or expected and unexpected behaviour.

For all other issues: please still raise an issue.

You can also reach out to us on the NewsNow Labs Slack Workspace.

We are typically available to respond to queries Monday-Friday, 9am-5pm UK time, and will be happy to help.

Contributing

If you would like to contribute a feature suggestion or code, please raise an issue or submit a pull request.

Uninstallation

Shut down any RunCVM containers.

Then run sudo rm -rf /opt/runcvm.

RunCVM and Dockside

RunCVM and Dockside are designed to work together in two alternative ways.

  1. Dockside can be used to launch devtainers (development environments) in RunCVM VMs, allowing you to provision containerised online IDEs for developing applications like dockerd, Docker swarm, systemd, applications that require a running kernel, or kernel modules not available on the host, or specific hardware e.g. a graphics display. Follow the instructions for adding a runtime to your Dockside profiles.
  2. Dockside can itself be launched inside a RunCVM VM with its own dockerd to provide increased security and compartmentalisation from a host. e.g.
docker run --rm -it --runtime=runcvm  --memory=2g --name=docksidevm -p 443:443 -p 80:80 --mount=type=volume,src=dockside-data,dst=/data --mount=type=volume,src=dockside-disks,dst=/disks --env=RUNCVM_DISKS=/disks/disk1,/var/lib/docker,ext4,5G newsnowlabs/dockside --run-dockerd --ssl-builtin

Legals

This project (known as "RunCVM"), comprising the files in this Git repository (but excluding files containing a conflicting copyright notice and licence), is copyright 2023 NewsNow Publishing Limited, Struan Bartlett, and contributors.

RunCVM is an open-source project licensed under the Apache License, Version 2.0 (the "License"); you may not use RunCVM or its constituent files except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

N.B. In order to run, RunCVM relies upon other third-party open-source software dependencies that are separate to and independent from RunCVM and published under their own independent licences.

RunCVM Docker images made available at https://hub.docker.com/repository/docker/newsnowlabs/runcvm are distributions designed to run RunCVM that comprise: (a) the RunCVM project source and/or object code; and (b) third-party dependencies that RunCVM needs to run; and which are each distributed under the terms of their respective licences.