kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

Avoid mounting the block device on the host and the resulting compromises #4

Closed laijs closed 4 years ago

laijs commented 6 years ago

It is better to avoid mounting the block device on the host:

1) The host might not have the ability to mount it. For example, if the block device comes from Ceph and the host doesn't have krbd.ko. And even when the host has it, the userspace Ceph library + QEMU is still the better choice.

2) Security: if the block device was once mounted inside the VM, the host should not mount it afterwards. Code in the VM might break into the guest kernel and modify the filesystem metadata on the block device, so the host kernel could in turn be compromised when mounting that block device.

3) Performance: to speed up container startup, the host should avoid mounting the block device.

Compromises:

1) Access to /etc/passwd happens in the VM/sandbox, so string user names should be allowed in the agent API.
2) Initialize the docker-init layer inside the VM/sandbox.
3) Populate the content of the volume in the VM/sandbox. (When a non-empty path is assigned as a volume, the content of that path should be copied into the volume when initializing it.)

mmgaggle commented 6 years ago

If you use virtio-blk/scsi to make the rbd device available to the guest, it also means that only the host needs to be on the Ceph public network and have Cephx authentication keys. In a multi-tenant environment, this is critical.

sboeuf commented 6 years ago

Not sure I understand the concerns here. I thought we were actually not mounting block devices, but simply passing them into the VM, and the mount was happening in the guest. What did I miss?

bergwolf commented 6 years ago

@sboeuf I think @laijs was pointing at the device mapper handling in the kata runtime, or more generally, any block-device-based volumes. This happens a layer above the kata runtime, e.g., the docker daemon is the one mounting the device mapper devices on the host. We should try to ask these upper layers not to mount the block devices on the host, and to just pass them to the runtime directly.

It applies to @mmgaggle's concern about ceph rbd as well.

sboeuf commented 6 years ago

Okay, I was not aware of that. But I still don't understand how, from a Kata Containers perspective, we might have some control over what you're describing. If the device has already been mounted before it's passed to Kata, how do you expect to prevent this from happening?

bergwolf commented 6 years ago

@sboeuf For one thing, we do not have control right now. For another, we can provide the relevant infrastructure for it to work. The storage hotplug APIs would allow such a solution to work, and frakti is already able to pass block devices down to the runtime without mounting them on the host first.

On the kata cli side, we should work on pushing a proper storage description into the OCI runtime spec; then CRI-O and the containerd CRI plugin can both pass block devices down to the runtime without mounting them on the host.

This is a long-term todo item and we need to push it forward step by step.

sboeuf commented 6 years ago

The storage hotplug APIs would allow such a solution to work and frakti is already able to pass the block devices down to the runtime without mounting them on the host first.

Yes, I understand, and I do expect Frakti to handle this, meaning the storage hotplug should never mount the block device on the host (though I don't think this has ever been in the scope of the storage hotplug).

we should work on pushing proper storage description to the OCI runtime spec

This sounds like a good proposal to me; it would make a lot of sense to let the container runtime know what it should do with a specific volume, instead of letting it decide by itself.

bergwolf commented 6 years ago

but I don't think this has ever been in the scope of the storage hotplug

The kata runtime never mounts it, so it's out of the scope of the kata APIs. OTOH, the storage hotplug API also exposes the storage descriptors, making it possible for frakti to pass a raw block device as the source of a rootfs/volume to the kata runtime.

sboeuf commented 6 years ago

So, do you mean that the storage API will provide a way to mount vs. not mount a block device passed as source, depending on some flags passed through the storage descriptors? I am fine with either case, but I always thought there was no interest in mounting a block device on the host anyway.

But we gotta capture the proposal about providing a type for the storage that is passed through OCI, before it gets lost in the comments on this issue ;)

bergwolf commented 6 years ago

do you mean that the storage API will provide a way to mount vs not-mount a block device passed as source, depending on some flags passed through the storage descriptors ?

There is no difference at the kata API level. A mounted host path is just one storage type for the kata runtime, which will be translated into a 9pfs share to the guest.

But we gotta capture the proposal about providing a type regarding the storage that is passed by OCI, before it gets lost through the comments on this issue ;)

Indeed! Maybe create a separate ticket for it so that it cannot be buried by other topics?

sboeuf commented 6 years ago

Yeah, we need a specific issue saying that we want to extend OCI. It might not be easy, but it's worth a shot ;)

jcvenegas commented 5 years ago

Is this something that could be driven now with shim-v2, or do we have no power to do that even at that level? If not, let's close it.

jodh-intel commented 4 years ago

Closing as this is very old and nobody appears to be working on it.