Alexsandr-Random opened this issue 7 months ago
Hi @Alexsandr-Random and thanks for raising this issue.
The cgroups v2 memory controller does not expose an RSS statistic, so it is not supported by the Docker driver. The difference in the memory statistics available within the Docker driver can be seen here.
It seems there might be an equivalent when interrogating the cgroup v2 memory files; however, I am unsure whether it would be possible to plumb this through to the Docker driver, because the driver has no direct understanding of the isolation, which is left to Docker itself. I'll keep this issue open for future readers and mark it as requiring further investigation.
@jrasell Thanks for the quick response! So for now the only way to get metrics like memory RSS on recent distros is to manually downgrade from cgroups v2 to v1?
> So for now the only solution to get metrics like memory rss on latest distros is manually downgrade cgroups v2 --> v1?
I looked into what the Docker API is exposing to us. Unfortunately, the API docs don't exactly match what the CLI does (ref https://github.com/moby/moby/issues/45727 and https://github.com/moby/moby/issues/45739).
But if I take a look at the cgroup for the container I just queried, I can see that `memory.current` maps directly to what Docker calls "usage":

```console
$ curl -s --unix-socket /run/docker.sock "http://localhost/containers/14b9ea15.../stats?stream=false&oneshot=true" | jq .memory_stats.usage
1441792
$ sudo cat /sys/fs/cgroup/system.slice/docker-14b9ea15....scope/memory.current
1441792
```
My understanding from the kernel docs is that `memory.current` is everything, and there's just a much more fine-grained set of stats available. We could probably expose some of those stats and maybe look into whether we can get the exact combination of items that adds up to the coarse "RSS" stat folks are used to.

In the meantime, you can derive a rough equivalent of RSS by subtracting `Cache` and `Swap` from `Usage`.
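A minimal sketch of that subtraction, reading the cgroup v2 files directly. It assumes Docker's "cache" roughly maps to the `file` counter in `memory.stat` and "swap" to `memory.swap.current`; neither mapping is exact, so treat the result as an approximation:

```python
from pathlib import Path

def estimate_rss(cgroup_dir: str) -> int:
    """Rough RSS estimate for a cgroup v2 directory: usage - cache - swap."""
    cg = Path(cgroup_dir)
    usage = int(cg.joinpath("memory.current").read_text())
    # memory.swap.current may be absent if swap accounting is disabled.
    swap_file = cg / "memory.swap.current"
    swap = int(swap_file.read_text()) if swap_file.exists() else 0
    # memory.stat is "key value" per line; `file` is page-cache memory,
    # which is the closest v2 analogue to Docker's v1 "cache" counter.
    stat = dict(line.split() for line in cg.joinpath("memory.stat").read_text().splitlines())
    cache = int(stat.get("file", 0))
    return usage - cache - swap
```

For example, `estimate_rss("/sys/fs/cgroup/system.slice/docker-<id>.scope")` with the container's real scope directory substituted in.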
Yes, I can confirm what is written in https://github.com/hashicorp/nomad/issues/19604.
Nomad indeed reports an accurate `nomad_client_allocs_memory_usage` metric only when using cgroups v2.

With cgroups v1, the correct memory-percentage metric is the ratio

```
nomad_client_allocs_memory_rss / nomad_client_allocs_memory_allocated
```

Otherwise you may observe memory appearing to sit at ~100% immediately, or what looks like a memory leak on graphs, although that is not the case. With the expression below you might encounter inaccurate representations:

```
nomad_client_allocs_memory_usage / nomad_client_allocs_memory_allocated
```
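As an illustrative sketch, a Prometheus recording rule can pin whichever ratio is correct for a fleet to a stable name; the group and record names here are hypothetical, not from this thread:

```yaml
groups:
  - name: nomad-alloc-memory
    rules:
      # On cgroups v1 clients, RSS is reported, so use it for the percentage.
      # On cgroups v2 clients, swap the expr for the usage-based ratio instead.
      - record: nomad:alloc_memory_used:ratio
        expr: nomad_client_allocs_memory_rss / nomad_client_allocs_memory_allocated
```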
Nomad version
Any
Operating system and Environment details
NAME="Ubuntu" VERSION="22.04 LTS"
Issue
When we use cgroups v2, the Nomad agent stops sending some metrics to Prometheus. The most important one for us is nomad.client.allocs.memory.rss. When cgroups is set to v1, everything is scraped correctly. I did this research on 3 different independent Nomad clusters, so I can say this looks like a bug.
Reproduction steps
Start a Docker job on a recent Ubuntu version where cgroups v2 is enabled by default, enable the telemetry stanza on the Nomad client to send all metrics to Prometheus, and check whether nomad.client.allocs.memory.rss is scraped correctly.
Then start the same job with cgroups v1, and nomad.client.allocs.memory.rss will be scraped again.
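For reference, the telemetry stanza mentioned above can look roughly like this in the client agent configuration (a sketch; tune the interval to your environment):

```hcl
telemetry {
  collection_interval        = "1s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```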
Expected Result
nomad.client.allocs.memory.rss is scraped on any Ubuntu version and under any cgroups version.
Actual Result
nomad.client.allocs.memory.rss is scraped only under cgroups v1.