Open jjyyxx opened 2 months ago
What containerization solution are you using?
The containers are allocated via the web UI of a proprietary cluster management system, but the underlying containers are most likely managed by Docker with the NVIDIA Container Toolkit. Cgroup v1 is used, and I can check `/sys/fs/cgroup/cpuset/cpuset.cpus` (showing `208-223`) and `/sys/fs/cgroup/memory/memory.limit_in_bytes` (showing `34359738368`).
The cgroup situation still seems quite messy, cf. https://github.com/kubernetes/kubernetes/issues/119669 .
What does `cat /proc/self/cgroup` say in your case?
It shows:

```
12:devices:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
11:memory:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
10:rdma:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
9:pids:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
8:hugetlb:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
7:perf_event:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
6:cpuset:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
5:blkio:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
4:freezer:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
3:net_cls,net_prio:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
2:cpu,cpuacct:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
1:name=systemd:/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
0::/system.slice/docker-c18d38b354eab53da0111aa259a7b247b29f261eb6cfef946f7653ba18453271.scope
```
Or, if universal support is difficult at the moment, would it be practical to add two customizable meters for the CPU and memory limits?
Inside containers hosted on a server with many cores and a lot of RAM, where the actual core and RAM limits are constrained via cgroup settings (cpuset and memory), htop's CPU and RAM meters are quite unhelpful: they cannot reflect the container's actual situation (and are sometimes even annoying when the terminal is small, leaving no space for the process list). In my case, the server has 2x Intel(R) Xeon(R) Platinum 8480+ (224 threads) and 2 TiB of memory, but the container is limited to 16 threads and 32 GiB of memory.
These two constraints can be queried via `/sys/fs/cgroup/cpuset/cpuset.cpus` and `/sys/fs/cgroup/memory/memory.limit_in_bytes` for cgroup v1, or `/sys/fs/cgroup/cpuset.cpus` and `/sys/fs/cgroup/memory.max` for cgroup v2.

Ideally, htop could provide a container-aware option such that only the core utilization within the cpuset, and the actual memory limit (and usage), are displayed.