Closed courtland closed 7 months ago
hey @courtland, thank you for reaching out. Just to verify: does podman (without nomad...) show the stats (`podman stats`)?
Hi @courtland 👋
Could you also check if you're running cgroups v2 and if switching to v1 fixes the problem? This may be related to https://github.com/hashicorp/nomad-driver-podman/issues/160.
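For anyone following along, one common way to tell which cgroup version a host is using is to look for the `cgroup.controllers` file at the root of the cgroup mount, which is only present on a v2 (unified) hierarchy. A minimal sketch in Python, assuming cgroups are mounted at the conventional `/sys/fs/cgroup` path; the `cgroup_version` helper is a hypothetical name for illustration and is not part of Nomad or Podman:

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1.

    Assumption: cgroups are mounted at the conventional /sys/fs/cgroup path.
    A v2 (unified) mount has a `cgroup.controllers` file at its root;
    on v1 or hybrid layouts that file is absent at the root.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroups v{cgroup_version()}")
```

Note that a hybrid layout is reported as v1 here, since only the root of the mount is inspected.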
Yes, `podman stats` correctly shows CPU usage.
I am indeed running cgroups v2 (Ubuntu Jammy/22.04), so that is likely the culprit. I will work on testing an instance with v1.
Switching to cgroups v1 does NOT fix the problem. It's also worth noting that client CPU utilization reports correctly on my x86_64/amd64 instances (everything else is the same except arch).
Thanks for testing, it seems like we will need to investigate this further 👍
Hello there, I'm interested in helping investigate this issue since I'm also using Nomad to orchestrate container workloads on 64-bit ARM nodes. I am not able to reproduce the missing CPU statistics, so maybe something in my setup differs from @courtland's. Maybe it's just the version of Podman or the permissions on the cgroup slice folder. I'll attach below some pictures taken from Nomad's UI of an aarch64 node I am using (it's a Raspberry Pi 3B).
Thank you for the extra info @Procsiab.
CPU fingerprinting and stats are something we've been fixing in Nomad (especially on ARM) for the past few releases, so we may have fixed this at some point. Which version of Nomad are you using?
@courtland by any chance would you be able to check if this is still a problem?
Thanks!
In reply to @lgfa29, and to add to my previous post: when I wrote it I was using Nomad 1.6.2. I am now using Podman 4.7.2 and Nomad 1.6.3 on the same ARM hardware, and I am still not experiencing the issue we are discussing here.
Yes, Nomad 1.6.x seems to have resolved this problem. Closing this issue. Thanks for following up and looking into it @Procsiab
On AWS Graviton aarch64/arm64 instances, the CPU utilization reported by the nomad client is 0.
My understanding is this is a function of the driver (podman in my case) and not the nomad client. I briefly tried to track down where this could be breaking, but the `//FIXME implement cpu stats correctly` in `runStatsEmitter` made me wonder if somehow that's related?
`podman stats` shows container usage correctly.
Ubuntu 22.04.2 LTS // podman version 3.4.4
This is somewhat related to https://github.com/hashicorp/nomad/issues/4233