dadux closed this issue 3 years ago
Ideally we think there should be a new driver type and storage handler in kata-agent that would create the empty directory in the VM to handle this.
See https://github.com/kata-containers/agent/blob/master/device.go#L26-L33 for the driver types and https://github.com/kata-containers/agent/blob/master/mount.go#L194-L201 for the storage handlers.
@dadux Empty dirs with the default medium were implemented this way because, technically, a directory on the host is being shared with the pod.
But if we can safely assume the directory is never really accessed on the host side, I think we could go with the approach of creating it inside the guest instead.
@awprice Yes, we would need a new storage driver+handler to instead create the directory inside the guest and bind mount this to the container's mount namespace.
The open question will be what backs the emptyDir: a host-side ephemeral "volume" or just guest RAM? If we choose to use guest RAM, then the medium is effectively ignored. To model this sort of correctly, we would need to create a large sparse file on the host to back the volume and pass it in via virtio-disk/scsi. That will give you performance without costing memory, and from a resource consumption point of view it resembles runc, as the sparse file backing the volume will only grow in response to writes.
The only issue is deletes. If the files on the volumes are deleted, that space may not be recovered.
@mcastelino What we are proposing is instead of creating the ephemeral directory/volume on the host side filesystem and then mounting that into the VM using 9p, we create the ephemeral directory inside the guest VM, and bind mount that ephemeral directory into the containers within the VM on the rootfs. I guess the agent inside the VM would create the directories inside the VM when creating the containers.
The ephemeral directory will reside on whatever filesystem backs the rootfs, which in most cases I believe will be 9p. In our case we are using devicemapper for our rootfs and so will benefit from the performance of having the ephemeral directory on the rootfs.
As all containers in a Kubernetes pod are created in the same guest VM, I doubt there is anything else that is likely to access the files in the emptyDir on the host side.
This would also solve the cleanup issues you mentioned above: when the VM is terminated, the ephemeral directory inside the VM is removed too, as the rootfs is cleaned up.
@awprice Do you mean the rootfs of the VM itself or that of the container? The container rootfs is backed by 9p/device mapper. The VM rootfs today is an NVDIMM or initrd, so the VM rootfs is not backed by any host-side writable storage.
Placing the volume on the container rootfs is effectively the same as the user not using implicit ephemeral volumes https://github.com/docker-library/docker/blob/65fab2cd767c10f22ee66afa919eda80dbdc8872/18.09/dind/Dockerfile#L40
Here the implicit ephemeral volumes will end up being a directory within the container filesystem.
@mcastelino Yep it doesn't sound like rootfs of the VM is feasible as it is NVDIMM/initrd as you said. The container's rootfs doesn't sound feasible either.
This isn't about handling docker volumes, this is about handling the specific case of Kubernetes EmptyDir where the medium != Memory.
We are thinking of storing the shared directory between the containers in the pod in the sandbox filesystem, i.e. /run/kata-containers/shared/containers/<sandbox id>. This is stored on device mapper in our case; otherwise it is on 9p.
I've come up with a solution for this issue, see the following PRs:
Would this allow you to use docker in kata backed with this type of emptydir?
This is specifically for emptyDir in Kubernetes - https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
Was trying to see if docker in kata would work better with /var/lib/docker backed by an emptydir.
Looks like no. It is still backing the emptyDir with 9p for some reason, rather than just using emptyDir storage inside the VM itself.
Please reopen this issue. As mentioned in https://github.com/kata-containers/runtime/pull/1485, this doesn't avoid 9p at all.
Re-opening on request.
@amshinde - could you tal?
@kfox1111 We create the empty-dir with the default medium on the sandbox rootfs. This helps when one is using devicemapper storage, which is what the solution was targeted at. With other storage drivers, you still end up using 9p, since the rootfs itself is passed using 9p.
Do you have a proposal for solving this for other storage drivers? We can discuss possible solutions. PRs are welcome as well :)
Another option to look at is using Virtio-fs. If you switch to using Virtio-fs, emptyDirs will use virtio-fs instead of 9p.
Might work as a workaround. I'll give that a try. Still wouldn't be as performant I think as having emptydirs be associated with the vm itself.
Why not make a qcow2 or raw file for the emptydir and map it into the vm?
@kfox1111 There is ongoing work to implement empty-dirs using qcow2 images. There should be a PR for this soon. cc @egernst
Awesome. Thanks for the heads up.
Kubernetes EmptyDir performance is very slow (9p), while there is no real need to use 9pfs for the default medium. EmptyDir volumes are only intended to share data between containers within a pod, not with the host.
This was a recent related change in https://github.com/kata-containers/runtime/issues/1341 where tmpfs is not handled correctly.
For the default medium type (disk), kata-agent should probably create a directory in the VM and mount it into the containers? cc @mcastelino @amshinde
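The distinction being asked for could be sketched as a small dispatch on the emptyDir medium. Kubernetes uses an empty string for the default medium and `Memory` for tmpfs-backed volumes; the `backing` type, its values, and `backingForMedium` are hypothetical names for this example, not existing kata-agent or runtime code.

```go
package main

import "fmt"

// backing names where an emptyDir would live; the values are illustrative.
type backing string

const (
	backingGuestDir   backing = "guest-dir"   // plain directory created inside the VM
	backingGuestTmpfs backing = "guest-tmpfs" // tmpfs mounted inside the VM
)

// backingForMedium maps a Kubernetes emptyDir medium to a guest-side backing.
func backingForMedium(medium string) (backing, error) {
	switch medium {
	case "": // default medium
		return backingGuestDir, nil
	case "Memory":
		return backingGuestTmpfs, nil
	default:
		return "", fmt.Errorf("unsupported emptyDir medium %q", medium)
	}
}

func main() {
	for _, m := range []string{"", "Memory"} {
		b, err := backingForMedium(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("medium %q is backed by %s\n", m, b)
	}
}
```

Neither branch involves a host directory shared over 9p, which is the point of the proposal.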