google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Multiple containers in one pod bind the same dir in host may result in disk space leak #2416

Open zhangningdlut opened 4 years ago

zhangningdlut commented 4 years ago

In our environment, there is more than one container in a pod, and different containers may share the same host directory through a mount of type "bind". The config may look like this: {"destination":"/tmp/logs","type":"bind","source":"/tmp/test","options":["rbind","rprivate"]}

Because of the dirent cache in the sentry, if two containers operate on the same file in the shared dir, two problems can occur:

  1. Container A opens file X and closes it. The sentry then keeps a dirent cache entry for file X, so the sandbox process and the gofer process of container A both keep an fd for file X open. If container B then deletes file X, the file ends up in the "deleted" state, and its disk space is not released until the cache entry for file X is cleaned up.

  2. Container A opens file X and closes it. Container B renames file X to file Y. Container A deletes file Y. File Y ends up in the "deleted" state, and its disk space is not released until the cache entry for file Y is cleaned up.

I see no way to solve the problem other than disabling the dirent cache.

runsc --version
runsc version release-20200323.0-119-g78126611e61e
spec: 1.0.1-dev

zhangningdlut commented 4 years ago

I found that mountSharedSubmount may work; let me test it.

zhangningdlut commented 4 years ago

I found this comment in processHints:

// TODO(b/142076984): Only support tmpfs for now. Bind mounts require a
// common gofer to mount all shared volumes.

Are there plans to do this? @fvoznika

fvoznika commented 4 years ago

Sharing mounts between containers in the same sandbox is challenging. For one, OCI gives very little information about how mounts are composed, and what it gives comes piecemeal, one container at a time. Mount hint annotations provide more information about mounts so that sharing can be handled better. But there are a few features missing to make it work in gVisor:

One thing that you could evaluate for now is evicting cache entries after some time has elapsed. This is a good improvement overall, since it releases resources held by idle sandboxes. It guarantees that after some time the deleted file is dropped from the cache and the space is freed.

zhangningdlut commented 4 years ago

Hi @fvoznika, thanks for the reply! When will VFS2 be ready?

fvoznika commented 4 years ago

We're working on it. runsc will be getting basic support in the next week. We're hoping to get most workloads running on it in the next 4 weeks.