kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

EmptyDir in k8s handled incorrectly #1341

Closed mcastelino closed 5 years ago

mcastelino commented 5 years ago

EmptyDir in k8s handled incorrectly

See https://github.com/kata-containers/runtime/issues/61#issuecomment-440645443

An emptyDir volume in k8s is supposed to support three types of medium: default, tmpfs, and hugepages. The default medium should be the node's disk rather than a memory-backed disk. Would it be better to check the mount options before passing ephemeral volumes into the guest, and fall back to 9pfs when the volume is backed by the default medium?

https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment. However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and any files you write will count against your Container’s memory limit.

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#enforcing-node-allocatable

@harche @amshinde @linxiulei

Expected result

When the medium is not specified, or is set to the default, we should use 9p/virtio-fs.

Actual result

We use tmpfs within the VM regardless of the medium, resulting in incorrect behaviour, scheduling, and memory accounting.

mcastelino commented 5 years ago

/cc @egernst

amshinde commented 5 years ago

I passed in two emptyDir volumes, one with medium Memory and the other with the default medium. In config.json, the two directories appear as:

                {
                        "destination": "/tmp/xchange",
                        "type": "bind",
                        "source": "/var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/xchange-kata",
                        "options": [
                                "rw",
                                "rbind",
                                "rprivate",
                                "bind"
                        ]
                },
                {
                        "destination": "/tmp/tmpemp",
                        "type": "bind",
                        "source": "/var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/tmpempty-kata",
                        "options": [
                                "rw",
                                "rbind",
                                "rprivate",
                                "bind"
                        ]
                }

No information about the medium is passed down to the OCI layer. The only way to handle this correctly is to check whether the source directory is actually mounted as tmpfs on the host:

$ mount | grep empty
tmpfs on /var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/tmpempty-kata type tmpfs (rw,relatime)

linxiulei commented 5 years ago

I guess we could use logic similar to what we do for the block rootfs.

amshinde commented 5 years ago

@linxiulei What do you mean by block rootfs? For the default medium, the emptyDir volume is just a host directory, so it can only be passed through 9p. I have raised #1374 to handle this, PTAL.
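
To make the intended behaviour concrete, here is a rough Go sketch of the decision, not the actual change in #1374. It assumes the caller has already determined whether the host source directory is a tmpfs mount. The names `chooseHandling`, `guestTmpfs` and `sharedFS` are hypothetical; the `kubernetes.io~empty-dir` path segment is taken from the config.json above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// emptyDirSegment is the directory under which the kubelet places emptyDir
// volumes, as seen in the bind mount sources in config.json.
const emptyDirSegment = "kubernetes.io~empty-dir"

// handling names how a volume would be exposed to the guest (hypothetical type).
type handling string

const (
	guestTmpfs handling = "guest-tmpfs" // create a tmpfs inside the VM
	sharedFS   handling = "shared-fs"   // pass the host directory via 9p/virtio-fs
)

// isEmptyDirVolume reports whether source sits directly under the kubelet's
// emptyDir directory.
func isEmptyDirVolume(source string) bool {
	return filepath.Base(filepath.Dir(filepath.Clean(source))) == emptyDirSegment
}

// chooseHandling treats a volume as ephemeral guest tmpfs only when it is an
// emptyDir AND the host backs it with tmpfs (medium Memory). Everything else,
// including default-medium emptyDirs, goes through the shared filesystem.
func chooseHandling(source string, hostIsTmpfs bool) handling {
	if isEmptyDirVolume(source) && hostIsTmpfs {
		return guestTmpfs
	}
	return sharedFS
}

func main() {
	base := "/var/lib/kubelet/pods/d391df17-4698-11e9-b7d7-525400472345/volumes/kubernetes.io~empty-dir/"
	fmt.Println(chooseHandling(base+"tmpempty-kata", true)) // medium Memory
	fmt.Println(chooseHandling(base+"xchange-kata", false)) // default medium
}
```

With this split, only memory-backed emptyDirs count against guest memory, and default-medium volumes keep their data on the node's disk.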

linxiulei commented 5 years ago

I was just agreeing with you about checking the medium via the mount info 😄