kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

Containerd block snapshotter support #1325

Closed mcastelino closed 5 years ago

mcastelino commented 5 years ago

Kata does not support block based containerd snapshotters

Kata does not support block based containerd snapshotters. This is due to an incorrect assumption in Kata that there will be a directory called "rootfs" within the block device that holds the snapshot.

However, with containerd block based snapshotters this is no longer true.

For example, in the past the built-in devicemapper would report the rootfs at

/var/lib/docker/devicemapper/mnt/999042f4b5dea95ed2b91e58ca5dbd865fe064f8a17cca5415489930f1a34df5/rootfs

The backing device is at

/var/lib/docker/devicemapper/mnt/999042f4b5dea95ed2b91e58ca5dbd865fe064f8a17cca5415489930f1a34df5/

and the rootfs is a directory under the backing device.

The Kata logic always assumed that this was the case.

However, with the current containerd snapshotter implementation the rootfs is reported as follows:

/dev/mapper/vgthin-47 on /run/containerd/io.containerd.runtime.v2.task/default/hello-kata-lvm/rootfs type xfs

Here the rootfs directory itself is the mount point of the backing device.

That is, the volume/block device is mounted directly at rootfs, which breaks the Kata logic.
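The difference between the two layouts can be checked mechanically. The following is a minimal sketch, not Kata's actual code: it parses a `/proc/mounts`-style table and reports whether a rootfs path is itself a mount point or only a directory below one. The names `mountEntry`, `parseMounts`, `findMount` and `rootfsIsMountPoint` are hypothetical, as is the device name `/dev/mapper/docker-example`.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// mountEntry holds the first three fields of a /proc/mounts-style line.
type mountEntry struct {
	source     string
	mountPoint string
	fsType     string
}

// parseMounts parses text in /proc/mounts format:
// "<source> <mount point> <fstype> <options> <dump> <pass>".
// Escapes such as \040 for spaces are not decoded in this sketch.
func parseMounts(table string) []mountEntry {
	var entries []mountEntry
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 {
			continue // skip blank or malformed lines
		}
		entries = append(entries, mountEntry{source: f[0], mountPoint: f[1], fsType: f[2]})
	}
	return entries
}

// findMount returns the entry with the longest mount point that contains path.
func findMount(entries []mountEntry, path string) (mountEntry, bool) {
	path = filepath.Clean(path)
	var best mountEntry
	found := false
	for _, e := range entries {
		mp := filepath.Clean(e.mountPoint)
		if path == mp || mp == "/" || strings.HasPrefix(path, mp+"/") {
			if !found || len(mp) > len(filepath.Clean(best.mountPoint)) {
				best, found = e, true
			}
		}
	}
	return best, found
}

// rootfsIsMountPoint reports whether rootfs is the mount point of its
// backing device, rather than a directory inside the mounted device.
func rootfsIsMountPoint(entries []mountEntry, rootfs string) bool {
	e, ok := findMount(entries, rootfs)
	return ok && filepath.Clean(e.mountPoint) == filepath.Clean(rootfs)
}

func main() {
	table := `/dev/mapper/docker-example /var/lib/docker/devicemapper/mnt/999042f4b5dea95ed2b91e58ca5dbd865fe064f8a17cca5415489930f1a34df5 xfs rw 0 0
/dev/mapper/vgthin-47 /run/containerd/io.containerd.runtime.v2.task/default/hello-kata-lvm/rootfs xfs rw 0 0
`
	entries := parseMounts(table)
	for _, rootfs := range []string{
		"/var/lib/docker/devicemapper/mnt/999042f4b5dea95ed2b91e58ca5dbd865fe064f8a17cca5415489930f1a34df5/rootfs",
		"/run/containerd/io.containerd.runtime.v2.task/default/hello-kata-lvm/rootfs",
	} {
		fmt.Printf("%s\n  rootfs is the mount point: %t\n", rootfs, rootfsIsMountPoint(entries, rootfs))
	}
}
```

In the first layout the check is false (rootfs is a directory under the device); in the second it is true (the device is mounted directly at rootfs).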

Kata Logic

Kata today has multiple paths and options in handling the rootfs

  1. Hypervisor does not support block device hotplug

    Kata bind mounts the rootfs location to the shared directory at location c.id/rootfs

      a. For overlay based snapshotters
      b. For block based snapshotters

  2. Hypervisor supports block device hotplug

      a. Overlay based snapshotters: Kata bind mounts the rootfs location to the shared directory at location c.id/rootfs
      b. Block based graph drivers (e.g. devicemapper), where the rootfs is a directory within the device
      c. Block based snapshotters (e.g. firecracker devicemapper, kata lvm), where the rootfs is the device

  3. Hypervisor supports block device hotplug but user chooses to not use block hotplug

    Kata bind mounts the rootfs location to the shared directory at location c.id/rootfs

      a. For overlay based snapshotters
      b. For block based snapshotters

We need to handle all of these cases
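The cases above reduce to a small decision table. The sketch below is one way to write it down, under the assumption that the only inputs are hotplug support, the user's choice, and the kind of storage backing the rootfs. The types `storageKind` and `rootfsPlan` and the function `planRootfs` are hypothetical and do not exist in the Kata code base.

```go
package main

import "fmt"

// storageKind describes what backs the container rootfs on the host.
type storageKind int

const (
	overlay          storageKind = iota // overlay based snapshotter
	blockGraphDriver                    // block device, rootfs is a directory within it
	blockSnapshotter                    // block device, rootfs is the device mount point
)

// rootfsPlan is the outcome of the decision.
type rootfsPlan struct {
	hotplug bool   // true: pass the block device to the hypervisor
	suffix  string // directory inside the device holding the rootfs (hotplug only)
}

// planRootfs maps the cases listed above to a plan. When hotplug is false,
// the rootfs is bind mounted into the shared directory at c.id/rootfs.
func planRootfs(hotplugSupported, blockUseDisabled bool, kind storageKind) rootfsPlan {
	if !hotplugSupported || blockUseDisabled || kind == overlay {
		return rootfsPlan{hotplug: false}
	}
	if kind == blockGraphDriver {
		return rootfsPlan{hotplug: true, suffix: "rootfs"}
	}
	// Block based snapshotter: the device root is the rootfs.
	return rootfsPlan{hotplug: true, suffix: ""}
}

func main() {
	kinds := map[string]storageKind{
		"overlay":           overlay,
		"block graphdriver": blockGraphDriver,
		"block snapshotter": blockSnapshotter,
	}
	for name, kind := range kinds {
		fmt.Printf("%-18s hotplug enabled: %+v\n", name, planRootfs(true, false, kind))
		fmt.Printf("%-18s hotplug disabled: %+v\n", name, planRootfs(true, true, kind))
	}
}
```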

Fix

diff --git a/virtcontainers/container.go b/virtcontainers/container.go
index 5408d85..f11fe99 100644
--- a/virtcontainers/container.go
+++ b/virtcontainers/container.go
@@ -276,6 +276,7 @@ type Container struct {
        runPath       string
        configPath    string
        containerPath string
+       rootfsSuffix  string

        state types.State

@@ -640,6 +641,7 @@ func newContainer(sandbox *Sandbox, contConfig ContainerConfig) (*Container, err
                runPath:       store.ContainerRuntimeRootPath(sandbox.id, contConfig.ID),
                configPath:    store.ContainerConfigurationRootPath(sandbox.id, contConfig.ID),
                containerPath: filepath.Join(sandbox.id, contConfig.ID),
+               rootfsSuffix:  "rootfs",
                state:         types.State{},
                process:       Process{},
                mounts:        contConfig.Mounts,
@@ -1131,6 +1133,10 @@ func (c *Container) hotplugDrive() error {
                return nil
        }

+       if dev.mountPoint == c.rootFs {
+               c.rootfsSuffix = ""
+       }
+
        // If device mapper device, then fetch the full path of the device
        devicePath, fsType, err := getDevicePathAndFsType(dev.mountPoint)
        if err != nil {
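To illustrate why an empty suffix is enough, here is a minimal sketch of how such a suffix could be consumed when building the rootfs path. It relies on `filepath.Join` ignoring empty elements. The helpers `rootfsSuffixFor` and `rootfsPathInDevice`, and the directory `/guest/mnt/container-1`, are hypothetical and only mirror the comparison made in the diff.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// rootfsSuffixFor mirrors the check in the diff: if the device is mounted
// exactly at the container rootfs, there is no "rootfs" directory inside it.
func rootfsSuffixFor(deviceMountPoint, containerRootFs string) string {
	if filepath.Clean(deviceMountPoint) == filepath.Clean(containerRootFs) {
		return ""
	}
	return "rootfs"
}

// rootfsPathInDevice joins the directory where the device is mounted with
// the suffix. filepath.Join drops empty elements, so an empty suffix yields
// the device mount directory itself.
func rootfsPathInDevice(deviceMountDir, suffix string) string {
	return filepath.Join(deviceMountDir, suffix)
}

func main() {
	const mountDir = "/guest/mnt/container-1" // hypothetical location

	// Graph driver layout: device mounted one level above rootfs.
	s1 := rootfsSuffixFor("/var/lib/docker/devicemapper/mnt/abc", "/var/lib/docker/devicemapper/mnt/abc/rootfs")
	fmt.Println(rootfsPathInDevice(mountDir, s1))

	// Snapshotter layout: device mounted directly at rootfs.
	s2 := rootfsSuffixFor("/run/containerd/task/rootfs", "/run/containerd/task/rootfs")
	fmt.Println(rootfsPathInDevice(mountDir, s2))
}
```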

Test Matrix

  1. Kata with OCI interface with overlay (crio and containerd)
  2. Kata with OCI interface with block based graphdriver with containerd
  3. Kata with OCI interface with snapshotter with containerd
  4. Kata with containerd-shim-v2 interface with overlay (crio and containerd)
  5. Kata with containerd-shim-v2 interface with snapshotter (crio and containerd)

with Kata configured to

  A. Disable block device hotplug
  B. Enable block device hotplug
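The matrix is the cross product of the five scenarios and the two settings of `disable_block_device_use`, i.e. ten combinations. A small sketch that enumerates them, e.g. as input for a table-driven test, could look like this; the `testCase` type and `buildMatrix` function are hypothetical.

```go
package main

import "fmt"

// testCase is one cell of the test matrix.
type testCase struct {
	scenario              string
	disableBlockDeviceUse bool
}

// scenarios are copied from the list above.
var scenarios = []string{
	"OCI interface with overlay (crio and containerd)",
	"OCI interface with block based graphdriver with containerd",
	"OCI interface with snapshotter with containerd",
	"containerd-shim-v2 interface with overlay (crio and containerd)",
	"containerd-shim-v2 interface with snapshotter (crio and containerd)",
}

// buildMatrix pairs every scenario with both settings of the flag:
// A (hotplug disabled) first, then B (hotplug enabled).
func buildMatrix() []testCase {
	var cases []testCase
	for _, s := range scenarios {
		for _, disabled := range []bool{true, false} {
			cases = append(cases, testCase{scenario: s, disableBlockDeviceUse: disabled})
		}
	}
	return cases
}

func main() {
	for i, tc := range buildMatrix() {
		fmt.Printf("%2d. Kata with %s [disable_block_device_use = %t]\n",
			i+1, tc.scenario, tc.disableBlockDeviceUse)
	}
}
```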

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# 9pfs is used instead to pass the rootfs.
# disable_block_device_use = true
disable_block_device_use = false

mcastelino commented 5 years ago

We need this fix to support https://github.com/kata-containers/runtime/issues/1303

mcastelino commented 5 years ago

This should also fix https://github.com/kata-containers/runtime/issues/1248

sboeuf commented 5 years ago

@mcastelino we definitely need to solve this. Is the fix you're showing here the solution to both problems (with and without rootfs extension)? Or do we need to find a generic fix?

mcastelino commented 5 years ago

/cc @ganeshmaharaj @egernst @amshinde @sboeuf @bergwolf

With this change we can support all variants of graph drivers and snapshotters.

We should also make one more change longer term

https://github.com/kata-containers/runtime/blob/master/containerd-shim-v2/service.go#L327

This always mounts the block device on the host, even though in the case of Kata this is no longer desired. In the case of block based snapshotters we should skip mounting on the host side. This will be safer, even though we may break some docker functionality.

sboeuf commented 5 years ago

@mcastelino yes we should check the type of device that needs to be mounted, and if it's a block device, we should ignore this. /cc @lifupan
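A minimal sketch of such a check, assuming the mount source is available as a path: a block device has the `os.ModeDevice` bit set and the `os.ModeCharDevice` bit clear. The helpers `isBlockDevice` and `shouldMountOnHost` are hypothetical and are not part of containerd-shim-v2. A source that is not a path on disk (such as the literal `overlay`) is treated here as "not a block device".

```go
package main

import (
	"fmt"
	"os"
)

// isBlockDeviceMode reports whether mode describes a block device:
// the device bit is set and the character-device bit is not.
func isBlockDeviceMode(mode os.FileMode) bool {
	return mode&os.ModeDevice != 0 && mode&os.ModeCharDevice == 0
}

// isBlockDevice stats path and checks its mode.
func isBlockDevice(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return isBlockDeviceMode(info.Mode()), nil
}

// shouldMountOnHost decides whether the rootfs mount should be performed on
// the host. Block devices are skipped, everything else is mounted as before.
func shouldMountOnHost(source string) (bool, error) {
	isBlock, err := isBlockDevice(source)
	if err != nil {
		if os.IsNotExist(err) {
			// Not a device node on disk (e.g. "overlay"): mount as usual.
			return true, nil
		}
		return false, err
	}
	return !isBlock, nil
}

func main() {
	for _, src := range []string{"overlay", "/dev/null", "/dev/mapper/vgthin-47"} {
		mount, err := shouldMountOnHost(src)
		if err != nil {
			fmt.Printf("%s: error: %v\n", src, err)
			continue
		}
		fmt.Printf("%s: mount on host = %t\n", src, mount)
	}
}
```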

mcastelino commented 5 years ago

> @mcastelino we definitely need to solve this. Is the fix you're showing here the solution to both problems (with and without rootfs extension)? Or do we need to find a generic fix?

@sboeuf this is a completely generic fix and should work for all use cases.

sboeuf commented 5 years ago

@mcastelino

> @sboeuf this is a completely generic fix and should work for all use cases.

Why don't we have a PR submitted yet then? :smile:

mcastelino commented 5 years ago

Also fixes https://github.com/containerd/containerd/issues/2988#issuecomment-463933213

https://github.com/containerd/containerd/issues/2988#issuecomment-463507433

/cc @eryugey @clarklee92

lifupan commented 5 years ago

> @mcastelino yes we should check the type of device that needs to be mounted, and if it's a block device, we should ignore this. /cc @lifupan

Hi @sboeuf @mcastelino, yes, I agree. But before we do that, we need to figure out how to pass this block device as a container root to the virtcontainers package: through the container spec, or by some other method?

sboeuf commented 5 years ago

@lifupan

> But before we do that, we need to figure out how to pass this block device as a container root to virtcontainer pkg, by container spec or any other method?

Could you elaborate on the reasons for this need?

mcastelino commented 5 years ago

@lifupan

> > But before we do that, we need to figure out how to pass this block device as a container root to virtcontainer pkg, by container spec or any other method?
>
> Could you elaborate on the reasons for this need?

@lifupan we already pass the rootfs as a block device today. The only optimization we need is to not mount it on the host in the case of shim-v2.