The kata agent protocol already supports it. Please see https://github.com/kata-containers/agent/blob/master/protocols/grpc/agent.proto#L194
We can implement a new type of storage driver (e.g. tmpfs) that instructs the kata agent to set up a tmpfs mount point at pb.Storage.Mountpoint and reference it via the container OCI spec. These need to be implemented in both the kata agent and the runtime, though.
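For concreteness, here is a rough sketch (not actual runtime code) of the Storage entry the runtime might send to the agent for such a volume. Field names follow the Storage message in the linked agent.proto; the "ephemeral" driver string and the guest path are assumptions for illustration:

package main

import (
	"fmt"

	pb "github.com/kata-containers/agent/protocols/grpc"
)

func main() {
	// Hypothetical ephemeral volume request: the agent would mount a
	// tmpfs at MountPoint inside the guest instead of sharing a host
	// directory over 9pfs.
	ephemeral := &pb.Storage{
		Driver:     "ephemeral", // assumed new storage driver type
		Source:     "tmpfs",
		Fstype:     "tmpfs",
		MountPoint: "/run/kata-containers/ephemeral/cache-volume", // hypothetical guest path
	}
	fmt.Printf("%+v\n", ephemeral)
}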
@bergwolf Sounds good. I will try to submit a PR supporting ephemeral volumes.
@harche - checking to see if you've made any progress here.
@egernst Sorry I was away for medical reasons. Just back today. I will start working on it.
Docker and Kubernetes take different approaches to attaching ephemeral volumes (backed by tmpfs) to a container.
When an ephemeral volume is attached using kubernetes (by setting emptyDir.medium to "Memory" in the yaml as described here), the corresponding docker container's config.v2.json looks like this:
"MountPoints": {
"/cache": {
"Source": "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume",
"Destination": "/cache",
"RW": true,
"Name": "",
"Driver": "",
"Type": "bind",
"Propagation": "rprivate",
"Spec": {
"Type": "bind",
"Source": "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume",
"Target": "/cache"
}
},
Just to make sure the volume is indeed backed by tmpfs:
# mount | grep 366c3a75
tmpfs on /var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~secret/default-token-2w6d4 type tmpfs (rw,relatime)
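For reference, a minimal pod spec that requests such a memory-backed EmptyDir looks roughly like this (pod and volume names are illustrative, not taken from the output above):

apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory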
However, when you use docker directly to attach a tmpfs mount with something like:
docker run -it --mount type=tmpfs,destination=/app,tmpfs-mode=1770 busybox sh
the corresponding config.v2.json for that container looks like this:
"MountPoints": {
"/app": {
"Source": "",
"Destination": "/app",
"RW": true,
"Name": "",
"Driver": "",
"Type": "tmpfs",
"Spec": {
"Type": "tmpfs",
"Target": "/app",
"TmpfsOptions": {
"Mode": 1016
}
}
}
}
As you can see, handling tmpfs-based volumes with docker is pretty simple, but kubernetes doesn't let the container config know that the volume is of type tmpfs. Instead, it just presents it as a regular bind mount.
So, from a runtime's point of view, how do we come up with a solution that works well with kubernetes? Kubernetes doesn't put anything tmpfs-specific in the container's config.
One solution could be to parse the Mounts in the spec and filter by kubernetes.io~empty-dir. We could treat volumes whose source contains that string differently, and instruct the agent to just create that directory inside the VM's memory instead of passing it as 9pfs (a rough sketch of the idea follows below). But this solution would be too specific to kubernetes.
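A minimal sketch of that detection, assuming the OCI spec's mounts are available to the runtime (the helper name is hypothetical):

package main

import (
	"fmt"
	"strings"
)

// isK8sEmptyDir reports whether a mount source points at a Kubernetes
// EmptyDir volume, based on the well-known path component kubelet uses.
func isK8sEmptyDir(source string) bool {
	return strings.Contains(source, "kubernetes.io~empty-dir")
}

func main() {
	sources := []string{
		"/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume",
		"/var/lib/docker/volumes/data/_data",
	}
	for _, s := range sources {
		// Candidates flagged here would be mounted as tmpfs inside the
		// guest instead of being shared over 9pfs.
		fmt.Printf("%s -> ephemeral candidate: %v\n", s, isK8sEmptyDir(s))
	}
}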
What do you guys think?
@bergwolf @egernst @gnawux @jbryce
Thanks for the detailed info @harche! /cc @amshinde
Yeah, we will need to skip these mounts, similar to what we do for "/dev/shm", which docker also chooses to pass as a bind mount instead of tmpfs.
@harche You can take a look here:
https://github.com/kata-containers/runtime/blob/master/virtcontainers/container.go#L300
My bad if I missed something. EmptyDir in k8s is supposed to have three types of medium: default, tmpfs (Memory), and HugePages. The default medium should be node disk instead of a memdisk, so wouldn't it be better to check the mount's backing medium before passing ephemeral volumes into the guest, and fall back to 9pfs when the volume is backed by the default medium?
@harche @amshinde
@linxiulei your observation is correct. The default medium should be node disk; otherwise the memory will be incorrectly accounted for, and we will end up eating RAM where runc would not. This will also cause issues with the scheduler. I will open a bug.
/cc @harche @amshinde
I passed in two empty directories, one with medium Memory and the other as default.
In the config.json, the two directories appear as:
{
    "destination": "/tmp/xchange",
    "type": "bind",
    "source": "/var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/xchange-kata",
    "options": [
        "rw",
        "rbind",
        "rprivate",
        "bind"
    ]
},
{
    "destination": "/tmp/tmpemp",
    "type": "bind",
    "source": "/var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/tmpempty-kata",
    "options": [
        "rw",
        "rbind",
        "rprivate",
        "bind"
    ]
}
No information about the medium is passed to the OCI layer. The only way to handle this correctly would be to actually check whether the directory on the host is mounted as tmpfs or not:
$ mount | grep empty
tmpfs on /var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/tmpempty-kata type tmpfs (rw,relatime)
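A minimal sketch of that check (illustrative, not the actual runtime code): statfs(2) reports TMPFS_MAGIC for tmpfs mounts, so the runtime could stat the host source path from the OCI spec.

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// isTmpfs reports whether the filesystem backing path is tmpfs.
func isTmpfs(path string) (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return false, err
	}
	return st.Type == unix.TMPFS_MAGIC, nil
}

func main() {
	// e.g. pass the EmptyDir source directory from the OCI spec.
	backed, err := isTmpfs(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("tmpfs-backed:", backed)
}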
Hi,
As of now all volumes are created on the host and passed to the VM via 9pfs. But k8s allows you to create ephemeral volumes, and these volumes can be backed by a ramdisk. Ephemeral volumes, as the name indicates, live and die with the pod. There is no reason to use 9pfs for this type of volume.
Kata needs to support these volumes by creating a tmpfs-based volume inside the VM.
A possible approach that I can think of:
Any thoughts?
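For concreteness, a minimal sketch of the guest-side handling the agent could perform for such a volume (hypothetical, not from the discussion above; assumes the agent already knows the target mount point):

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// mountEphemeral creates the mount point and mounts a fresh tmpfs on it
// inside the guest, so the volume never touches 9pfs.
func mountEphemeral(mountPoint string) error {
	if err := os.MkdirAll(mountPoint, 0o755); err != nil {
		return err
	}
	// A size cap could be passed via the data argument, e.g. "size=64m".
	return unix.Mount("tmpfs", mountPoint, "tmpfs", 0, "")
}

func main() {
	// Hypothetical guest path for an ephemeral volume.
	if err := mountEphemeral("/run/kata-containers/ephemeral/cache-volume"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}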