google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

/proc/self/cgroup does not exist #1906

Open TrevorSundberg opened 4 years ago

TrevorSundberg commented 4 years ago

Repro:

docker run --rm alpine cat /proc/self/cgroup
12:devices:/docker/75f4e1241b0fc88a27b062a41adbebded0ef4661d9ec1be45c4dfad120a1cf5d
...
docker run --rm --runtime=runsc alpine cat /proc/self/cgroup
cat: can't open '/proc/self/cgroup': No such file or directory

runsc version

runsc version release-20200127.0-132-ge07eacc99f00
spec: 1.0.1-dev

docker version

Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        a872fc2f86
 Built:             Tue Oct  8 00:59:59 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.3
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       a872fc2f86
  Built:            Tue Oct  8 00:58:31 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
Linux tsundberg-dev 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

TrevorSundberg commented 4 years ago

This may also be related, but /sys/fs/cgroup/memory/memory.stat does not exist. Should I report this as a separate issue?

ianlewis commented 4 years ago

cgroups and cgroupfs are currently not supported inside the sandbox. What is your use case for using cgroups?

TrevorSundberg commented 4 years ago

  1. We use the container ID for logging and telemetry (within the container). We could generate another unique ID, but the container ID makes it easy to correlate our logs with others, such as the Docker logs.
  2. We use /sys/fs/cgroup/memory/memory.stat to detect page fault rate and kill our own process if it exceeds a limit (akin to the OOM killer).
  3. We also use /sys/fs/cgroup/memory/memory.stat to implement Chromium's memory pressure monitor.

If you have any suggestions for ways of doing this within gVisor I'd be happy to change our implementation.
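For the first use case, a minimal sketch of how a container ID might be read from /proc/self/cgroup, with a fallback for sandboxes where the file does not exist. It assumes the cgroup v1 line format shown in the repro (`hierarchy-ID:controllers:path`) and a 64-character hex ID somewhere in the path. The function names and the random-ID fallback are illustrative assumptions, not part of gVisor or Docker.

```python
import re
import uuid
from typing import Optional

# Assumption: the container ID is a 64-character lowercase hex string
# embedded in the cgroup path, as in the Docker output from the repro.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(text: str) -> Optional[str]:
    """Return the first container ID found in /proc/self/cgroup content."""
    for line in text.splitlines():
        # Each cgroup v1 line is "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None


def logging_id(cgroup_path: str = "/proc/self/cgroup") -> str:
    """Return the container ID, or a random ID when it cannot be read.

    The fallback covers runtimes such as runsc, where the file is missing.
    """
    try:
        with open(cgroup_path) as f:
            found = container_id_from_cgroup(f.read())
    except OSError:
        found = None
    return found or uuid.uuid4().hex
```

A fallback ID generated this way is unique but cannot be correlated with the Docker logs, which is the limitation described in the first item above.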

ianlewis commented 4 years ago

@TrevorSundberg I think we are open to supporting cgroups inside the sandbox at some point, but in the meantime there may be ways to work around their absence for your use case.

I don't have a great solution for telemetry, but you could use info from the Downward API to generate a unique ID that is easier to follow. Unfortunately, the Downward API gives you the Pod ID but not the container name or ID.

For page faults, could you check the page fault rate from outside the sandbox? For example, you could have an agent that runs in a DaemonSet and checks the cgroups of the Pod containers running on the node.
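As a rough sketch of what such an agent could compute, the snippet below parses the contents of a cgroup v1 memory.stat file and derives a fault rate from two samples. It assumes the usual "name value" line format with `pgfault` and `pgmajfault` counters. The function names are hypothetical, and locating each Pod container's memory.stat on the host is left out because it depends on the node's cgroup layout.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse "name value" lines from a memory.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines rather than failing the whole read.
        if len(fields) == 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def fault_rate(before: Dict[str, int], after: Dict[str, int],
               seconds: float, key: str = "pgmajfault") -> float:
    """Return faults per second between two samples of the same cgroup."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after.get(key, 0) - before.get(key, 0)) / seconds
```

The agent would sample each container's memory.stat on an interval, call `fault_rate` on consecutive samples, and act when the rate crosses a limit of its choosing.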

Choogster1 commented 4 years ago

cgroups are required by another Google app, cadvisor. It would be good if one Google app supported the other! Or is there a workaround? This currently stops our entire Prometheus-based metrics system from working, as it depends on cadvisor to grab data from the containers.

ianlewis commented 4 years ago

@Choogster1 cadvisor needs to run as a privileged pod in order to read stats from containers other than itself. That's not likely to ever be possible with runsc, since it breaks the sandbox contract. You'll likely want to run cadvisor as a normal privileged container.

You could also try deploying it with the root filesystem mounted read-only (though I'm not sure it will be able to read the host cgroups when mounted this way). In any case, I'm a bit skeptical that runsc adds much value here.