If you use virtio-blk/scsi to make the rbd device available to the guest, it also means that only the host needs to be on the Ceph public network and have Cephx authentication keys. In a multi-tenant environment, this is critical.
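For illustration, a minimal Go sketch of launching a guest with an rbd image attached through QEMU's userspace librbd driver as a virtio-blk disk. This assumes a QEMU binary built with rbd support; the pool, image, auth id, config path, and machine options are all placeholders, not values from this thread:

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Attach an rbd image via QEMU's userspace librbd driver as a
	// virtio-blk disk: only the QEMU process on the host needs the
	// Cephx credentials and Ceph network access; the guest just sees
	// a plain virtio disk. All names below are placeholders.
	cmd := exec.Command("qemu-system-x86_64",
		"-machine", "q35,accel=kvm",
		"-m", "2048",
		"-drive", "format=raw,if=virtio,file=rbd:mypool/myimage:id=kata:conf=/etc/ceph/ceph.conf",
	)
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```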
Not sure I understand the concerns here. I thought we were actually not mounting block devices, but simply passing them into the VM, with the mount happening inside the guest. What did I miss?
@sboeuf I think @laijs was pointing at the device mapper handling in the kata runtime, or, more generally, at any block-device-based volumes. This is a layer above the kata runtime; e.g., the docker daemon is the one mounting device mapper devices on the host. We should try to ask these upper layers not to mount the block devices on the host, and just pass them down to the runtime directly.
It applies to @mmgaggle's concern about ceph rbd as well.
Okay, I was not aware of that. But I still don't understand how, from a Kata Containers perspective, we might have some control over what you're describing? If the device has already been mounted before it's passed to Kata, how do you expect to prevent this from happening?
@sboeuf For one thing, we do not have control right now. For another, we can provide the relevant infrastructure for it to work. The storage hotplug APIs would allow such a solution to work, and frakti is already able to pass the block devices down to the runtime without mounting them on the host first.
On the kata CLI side, we should work on pushing a proper storage description into the OCI runtime spec, and then CRI-O and the containerd CRI plugin can both pass the block devices down to the runtime without mounting them on the host.
This is a long-term TODO item and we need to push forward step by step.
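As a rough illustration of that proposal, here is a hypothetical shape such a storage description could take, written as a Go type. None of these fields exist in the OCI runtime spec today; the field set only loosely mirrors the storage descriptor the kata agent API exposes (driver, source, fstype, options, mount point):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Storage is a hypothetical extension to the OCI runtime spec: it
// describes a volume by what it is (e.g. a raw block device) rather
// than by a pre-mounted host path, so a VM-based runtime can attach
// it to the guest directly. This is only a sketch of the proposal.
type Storage struct {
	Driver     string   `json:"driver"`     // e.g. "blk" for a raw block device
	Source     string   `json:"source"`     // host-side device or path
	Fstype     string   `json:"fstype"`     // filesystem to mount *inside* the guest
	Options    []string `json:"options"`    // in-guest mount options
	MountPoint string   `json:"mountPoint"` // destination inside the container
}

func main() {
	// A device-mapper volume handed down unmounted: the runtime can
	// hotplug it as virtio-blk and let the guest agent mount it.
	s := Storage{
		Driver:     "blk",
		Source:     "/dev/mapper/thin-snap-42", // placeholder device
		Fstype:     "ext4",
		MountPoint: "/volumes/data",
	}
	out, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(out))
}
```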
> The storage hotplug APIs would allow such a solution to work, and frakti is already able to pass the block devices down to the runtime without mounting them on the host first.
Yes, I understand, and I do expect Frakti to handle this, meaning the storage hotplug should never mount the block device on the host (but I don't think this has ever been within the scope of the storage hotplug).
> we should work on pushing a proper storage description into the OCI runtime spec
This sounds like a good proposal to me, and it would make a lot of sense to let the container runtime know what it should do with a specific volume, instead of letting it figure that out by itself.
> but I don't think this has ever been within the scope of the storage hotplug
The kata runtime never mounts it, so it's out of the scope of the kata APIs. OTOH, the storage hotplug API also exposes the storage descriptors, making it possible for frakti to pass a raw block device as the source of a rootfs/volume to the kata runtime.
So, do you mean that the storage API will provide a way to mount vs. not mount a block device passed as source, depending on some flags passed through the storage descriptors? I am fine with either case, but I always thought there was no interest in mounting a block device on the host anyway.
But we gotta capture the proposal about providing a type regarding the storage that is passed by OCI, before it gets lost in the comments on this issue ;)
> do you mean that the storage API will provide a way to mount vs. not mount a block device passed as source, depending on some flags passed through the storage descriptors?
There is no difference at the kata API level. A mounted hostpath is just one storage type for the kata runtime, which gets translated into a 9pfs share to the guest.
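A minimal sketch of what that translation could look like on the runtime side, dispatching on the storage driver: a host-mounted path is shared over 9pfs, while a raw block device is hotplugged and mounted by the guest. The driver names and both helper functions are hypothetical stubs, not actual kata runtime code:

```go
package main

import "fmt"

// Storage is the same hypothetical descriptor sketched earlier.
type Storage struct {
	Driver     string
	Source     string
	Fstype     string
	MountPoint string
}

// attachStorage dispatches on the storage driver: a pre-mounted host
// path falls back to a 9pfs share, while a raw block device is
// hotplugged into the guest and mounted there by the agent.
func attachStorage(s Storage) error {
	switch s.Driver {
	case "9p":
		return share9pfs(s.Source, s.MountPoint)
	case "blk":
		return hotplugVirtioBlk(s.Source, s.Fstype, s.MountPoint)
	default:
		return fmt.Errorf("unsupported storage driver %q", s.Driver)
	}
}

// share9pfs stands in for the real hypervisor plumbing.
func share9pfs(src, dst string) error {
	fmt.Printf("sharing %s to the guest at %s over 9pfs\n", src, dst)
	return nil
}

// hotplugVirtioBlk stands in for the real hotplug path.
func hotplugVirtioBlk(dev, fstype, dst string) error {
	fmt.Printf("hotplugging %s as virtio-blk; guest mounts %s at %s\n", dev, fstype, dst)
	return nil
}

func main() {
	_ = attachStorage(Storage{Driver: "9p", Source: "/var/lib/ctr/rootfs", MountPoint: "/"})
	_ = attachStorage(Storage{Driver: "blk", Source: "/dev/mapper/thin-1", Fstype: "ext4", MountPoint: "/data"})
}
```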
> But we gotta capture the proposal about providing a type regarding the storage that is passed by OCI, before it gets lost in the comments on this issue ;)
Indeed! Maybe create a separate ticket for it so that it doesn't get buried under other topics?
Yeah, we need a specific issue saying that we want to extend OCI. This might not be easy, but it's worth a shot ;)
Is this something that could be driven now with shim-v2, or do we have no power to do that even at that level? If not, let's close it.
Closing, as this is very old and nobody appears to be working on it.
It is better to avoid mounting the block device on the host:
1) The host might not have the ability to mount it. For example, the block device may come from ceph while the host lacks krbd.ko; and even when the host has it, the userspace ceph library + qemu is usually the better choice.
2) Security: if the block device was once mounted inside the VM, the host should not mount it. Code in the VM might hack into the guest kernel and modify the filesystem metadata on the block device, and the host kernel might then be broken into when it mounts the device.
3) Performance: skipping the host-side mount speeds up container startup.
Compromises:
1) Access to `/etc/passwd` happens in the vm/sandbox, and string user names should be allowed in the agent API (see the sketch after this list).
2) Init the docker-init layer inside the vm/sandbox.
3) Populate the content of the volume in the vm/sandbox. (When a non-empty path is assigned to be a volume, the content of the path should be copied to the volume when the volume is initialized.)
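As a sketch of compromise 1): since the host no longer mounts the container's filesystem, it cannot resolve user names itself, so an in-guest agent would have to map a string user name to a uid against the sandbox's own `/etc/passwd`. The function name below is hypothetical; it is only an illustration of the lookup:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// lookupUID resolves a string user name against a passwd file, the way
// an in-VM agent would have to once the host stops mounting the rootfs.
func lookupUID(passwdPath, name string) (string, error) {
	f, err := os.Open(passwdPath)
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// passwd format: name:password:uid:gid:gecos:home:shell
		fields := strings.Split(scanner.Text(), ":")
		if len(fields) >= 3 && fields[0] == name {
			return fields[2], nil
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("user %q not found", name)
}

func main() {
	uid, err := lookupUID("/etc/passwd", "nobody")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("uid:", uid)
}
```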