rancher/os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Incorrect cgroup paths in /proc when running RKE on RancherOS #2967

Closed: vboulineau closed this issue 4 years ago

vboulineau commented 4 years ago

RancherOS Version: (ros os version) 1.5.5

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) AWS (Official AMI)

Hello,

Setting up an RKE cluster through the Rancher UI with a RancherOS node pool shows that the cgroup paths in /proc/<pid>/cgroup are mostly incorrect for all containers created by Kubernetes, referring to paths that do not exist.

The issue does not occur on another node pool running Ubuntu.

For instance, checking pods managed by Rancher itself:

ingress-nginx   nginx-ingress-controller-69t4v            1/1     Running     0          55m     172.29.128.90    vboulineau-rancher-worker1          <none>           <none>
ingress-nginx   nginx-ingress-controller-tw2xc            1/1     Running     0          30m     172.29.168.147   vboulineau-rancher-worker-ubuntu1   <none>           <none>

On node vboulineau-rancher-worker1, we can see:

4628 www-data nginx: master process /usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf

Checking cgroup paths for this process, we'll get:

[rancher@vboulineau-rancher-worker1 ~]$ cat /proc/4628/cgroup
11:name=systemd:/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
10:cpuset:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
9:net_cls,net_prio:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
8:memory:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
7:blkio:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
6:cpu,cpuacct:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
5:devices:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
4:pids:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
3:hugetlb:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
2:perf_event:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
1:freezer:/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf/6b5583d73d20fa6da209e4d628a0c27395815f6613cb0a2c7e0e5b64507b210c
0::/

Only the 11:name=systemd entry has the correct path. All others are prefixed with /docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479.

This path /sys/fs/cgroup/<group>/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479 does not exist.
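
To illustrate the mismatch from the console, a quick check along these lines (memory controller picked arbitrarily, IDs taken from the output above; a sketch, not a verified transcript):

# The prefixed path reported by /proc/4628/cgroup is not present under /sys/fs/cgroup:
ls -d /sys/fs/cgroup/memory/docker/8ec3e6062e1825067fddaa64bfb839cb47579b4bfe49c4dd9d486ff81c35a479
# The un-prefixed path from the name=systemd line, by contrast, should resolve:
ls -d /sys/fs/cgroup/systemd/kubepods/besteffort/pod7e241a28-b013-481e-9e00-cbe6f00b5eaf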

However, this ID does not come out of nowhere: it is the container ID of the console container in the system Docker daemon:

[rancher@vboulineau-rancher-worker1 ~]$ sudo docker -H unix:///var/run/system-docker.sock ps
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS               NAMES
8ec3e6062e18        rancher/os-console:v1.5.5          "/usr/bin/ros entryp…"   About an hour ago   Up About an hour                        console

So it probably has something to do with the specific way RancherOS runs containers.

I'm not sure it's a bug, but it's definitely causing issues as several monitoring solutions rely on raw cgroup metrics to provide reliable statistics about container workloads.

If you don't believe it's a bug, could you help explain the underlying mechanism behind this behaviour and point us to a reliable way to resolve a process's cgroups?

For reference, on Ubuntu, we get:

ubuntu@vboulineau-rancher-worker-ubuntu1:~$ cat /proc/10106/cgroup
12:hugetlb:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
11:blkio:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
10:net_cls,net_prio:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
9:perf_event:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
8:freezer:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
7:rdma:/
6:cpuset:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
5:pids:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
4:cpu,cpuacct:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
3:devices:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
2:memory:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
1:name=systemd:/kubepods/besteffort/podadd72969-60f6-4776-ae6b-ddd7a7cbb72e/83ec66cb85cb7bf1fbee10c375b2e0adacc814ddbed74d2096873be39a000da4
0::/system.slice/containerd.service
heimdull commented 4 years ago

We are also running into this issue and would like to know if this is us or Rancher...

deniseschannon commented 4 years ago

@vboulineau @heimdull As you mentioned, this is an issue for monitoring solutions like Datadog. I believe the underlying Datadog chart requires the cgroup paths to be laid out in that particular way.

I'm not sure how much effort it'd be to update it in order to get Datadog working.

dweomer commented 4 years ago

I think you will need to add /sys/fs/cgroup to the system-volumes service, which both the docker and console system services effectively --volumes-from, if I am not mistaken.
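
(For anyone wanting to see what that service mounts today before changing it, something along these lines should work; command names per the ros and system-docker CLIs, output not re-verified here:)

# Bind mounts currently declared on the system-volumes service
sudo ros config get rancher.services.system-volumes.volumes
# Mounts actually attached to the console container in system Docker
sudo system-docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' console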

heimdull commented 4 years ago

@dweomer I added this to my cloud-init file:

rancher:
  services:
    system-volumes:
      volumes:

and after restarting the nodes, it looks like Datadog now has access to what it needs. Thanks!

yatanasov commented 4 years ago

Hi @heimdull ,

This does not fix the issue for us. Could you confirm that you were missing the container metrics from the containers live view page in Datadog?

We tried doing as you suggested in several ways, including:

sudo ros config set rancher.services.system-volumes.volumes [/sys/fs/cgroup:/sys/fs/cgroup]

But we are still missing the metrics, and we can see a lot of the following errors:

2020-03-22 12:40:38 UTC | PROCESS | DEBUG | (pkg/util/containers/metrics/cgroup_metrics.go:34 in Mem) | Missing cgroup file: /host/sys/fs/cgroup/memory/docker/d1647bd70a38c6e1dd8975cde6410cad6c96f2be5c60346aad9d4c55f2291e5e/kubepods/besteffort/podf6f278fb-bba2-491d-b61e-653179149451/70759db11e426a0d4323e1229754312f80fa5a5a99b0f0806ad83f20d340f53f/memory.stat
2020-03-22 12:40:38 UTC | PROCESS | DEBUG | (pkg/util/containers/metrics/cgroup_metrics.go:188 in CPU) | Missing cgroup file: /host/sys/fs/cgroup/cpu,cpuacct/docker/d1647bd70a38c6e1dd8975cde6410cad6c96f2be5c60346aad9d4c55f2291e5e/kubepods/besteffort/podf6f278fb-bba2-491d-b61e-653179149451/70759db11e426a0d4323e1229754312f80fa5a5a99b0f0806ad83f20d340f53f/cpuacct.stat
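
To narrow this down, a check from inside the agent pod can show whether the prefixed directory the agent is looking for exists under the mounted host cgroup root (namespace and pod name are placeholders; paths match the error messages above):

# Does the docker/<container-id> prefix exist under the host cgroup mount the agent sees?
kubectl -n <datadog-namespace> exec <datadog-agent-pod> -- ls /host/sys/fs/cgroup/memory/docker
# And is the un-prefixed kubepods tree visible?
kubectl -n <datadog-namespace> exec <datadog-agent-pod> -- ls /host/sys/fs/cgroup/memory/kubepods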

heimdull commented 4 years ago

This did resolve the issue for us. We did not see metrics before; after I added the cgroup mount to system-volumes, we now have CPU/memory metrics in Datadog. I added the setting through my cloud-init file and rebuilt all the nodes with the new setting.

vboulineau commented 4 years ago

I successfully validated the workaround. Note that you cannot just overwrite the rancher.services.system-volumes.volumes key, as other volumes are in there too.

I did not find any way to have an auto-merge, so you basically need to take the defaults plus /sys/fs/cgroup. With the latest version, this is what the file looks like:

rancher:
  services:
    system-volumes:
      volumes:
      - /dev:/host/dev
      - /etc/docker:/etc/docker
      - /etc/hosts:/etc/hosts
      - /etc/logrotate.d:/etc/logrotate.d
      - /etc/resolv.conf:/etc/resolv.conf
      - /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt.rancher
      - /etc/selinux:/etc/selinux
      - /lib/firmware:/lib/firmware
      - /lib/modules:/lib/modules
      - /run:/run
      - /usr/share/ros:/usr/share/ros
      - /var/lib/boot2docker:/var/lib/boot2docker
      - /var/lib/rancher/cache:/var/lib/rancher/cache
      - /var/lib/rancher/conf:/var/lib/rancher/conf
      - /var/lib/rancher:/var/lib/rancher
      - /var/lib/waagent:/var/lib/waagent
      - /var/log:/var/log
      - /var/run:/var/run
      - /sys/fs/cgroup:/sys/fs/cgroup

@dweomer What about adding this mount in the default configuration?
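
To apply this on an existing node rather than at provisioning time, the rough sequence is the following (a sketch, assuming the YAML above is saved as my-config.yml):

# Merge the full volume list (defaults plus /sys/fs/cgroup) into the running config
sudo ros config merge -i my-config.yml
# Check that the key now contains the extra mount
sudo ros config get rancher.services.system-volumes.volumes
# Reboot so the console and docker system services pick up the new bind mount
sudo reboot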

yatanasov commented 4 years ago

Hi @vboulineau,

Thanks for your input. Could you clarify the following:

When you say "auto-merge", do you refer to updating the existing ros config while the VM is running using ros config export my-config, then manually updating the config and merging it using ros config merge -i my-config, followed by a reboot of the VM?

Secondly, did you validate the workaround by creating a new VM with the cloud-init file you pasted, or did you rebuild/reconfigure an existing VM?

Thank you very much, this would be extremely helpful for us!!

vboulineau commented 4 years ago

I did not export; I applied the change with ros config merge -i my-config, with my-config being the file I pasted in my previous reply, and then rebooted the VM.
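
For completeness, the export-based variant asked about above would look roughly like this (not something I tested; ros config export writes the current config to stdout):

# Dump the current config to a file
sudo ros config export > current.yml
# Edit current.yml: add "- /sys/fs/cgroup:/sys/fs/cgroup" under
# rancher.services.system-volumes.volumes, keeping the existing entries
sudo ros config merge -i current.yml
sudo reboot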