Closed kevinschoonover closed 10 months ago
Same issue on KVM VM + Docker, Nomad 1.7.1
As a workaround you can override it. In my case 2 × 3300 MHz (e.g. from cat /proc/cpuinfo) = 6600 MHz:
client { cpu_total_compute = 6600 .. }
Then restart Nomad.
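The arithmetic behind that override can be sketched in a few lines of Python. This is only an illustration, not how Nomad itself fingerprints the CPU: the helper name `total_compute_mhz` is made up here, and it assumes an x86-style /proc/cpuinfo that has one `cpu MHz` line per logical CPU (ARM boards often omit that field, and the value reflects the current clock rather than the nominal one).

```python
import re


def total_compute_mhz(cpuinfo_text: str) -> int:
    """Sum every 'cpu MHz' entry in /proc/cpuinfo-style text.

    Hypothetical helper for illustration; returns 0 when the field
    is absent (as on many ARM systems).
    """
    speeds = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return round(sum(float(s) for s in speeds))
```

Feeding it the contents of /proc/cpuinfo on the machine described above should give a figure close to 6600, which is the number to put in `cpu_total_compute`.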
Potentially related: https://github.com/hashicorp/nomad/issues/19412
Somewhat related: after upgrading my Raspberry Pi cluster to Nomad 1.7.1, I was seeing errors from the CPU fingerprinter.
Dec 10 05:25:23 rasp-pi-2 nomad[1916183]: 2023-12-10T05:25:23.004Z [ERROR] client.alloc_runner: postrun failed: alloc_id=965cd7d7-f029-36d2-1a83-9e1e3db848f9 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
Dec 10 05:25:23 rasp-pi-2 nomad[1916183]: 2023-12-10T05:25:23.006Z [ERROR] client.alloc_runner: postrun failed: alloc_id=26a42c2f-d788-11e5-9ecb-d8aead7ca081 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
I was able to resolve this by creating the directory Nomad is looking for.
It also happened in the pre-run hook:
Dec 10 05:26:25 rasp-pi-2 nomad[1916183]: 2023-12-10T05:26:25.228Z [ERROR] client.alloc_runner: prerun failed: alloc_id=e3c914a4-855e-0887-8085-c659ed9cd122 error="pre-run hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
@lindleydev that's actually a separate problem - I suspect in your case cgroups is mounted but the cpuset controller is not enabled. In previous versions of Nomad we allowed such a configuration at the expense of not actually enforcing resource utilization, but in 1.7 it's mandatory. There's some discussion about this happening in https://github.com/hashicorp/nomad/pull/19176
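For anyone wanting to check the cpuset point on their own node, here is a rough Python sketch. It is an assumption-laden illustration, not the check Nomad performs: the helper name `cpuset_enabled_v1` is hypothetical, and it only interprets the cgroups v1 table in /proc/cgroups (columns `subsys_name hierarchy num_cgroups enabled`). On a cgroups v2 host the list of controllers in /sys/fs/cgroup/cgroup.controllers would be the place to look instead.

```python
def cpuset_enabled_v1(proc_cgroups_text: str) -> bool:
    """Return True if the cpuset row in /proc/cgroups-style text has enabled == 1.

    Hypothetical helper for illustration; returns False when the
    cpuset controller is missing from the table entirely.
    """
    for line in proc_cgroups_text.splitlines():
        # Skip the '#subsys_name ...' header and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "cpuset":
            return fields[3] == "1"
    return False
```

Passing it the text of /proc/cgroups and getting False back would be consistent with the "mounted but not enabled" situation described above.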
A possible cause of this issue is described here: https://github.com/hashicorp/nomad/issues/19412#issuecomment-1850509695
Hey folks, just an update that the team is actively working on this issue. This issue and https://github.com/hashicorp/nomad/issues/19412 are effectively duplicates, so I'm going to close this issue as a dupe because there's been a bit more discussion over there.
Nomad version
Operating system and Environment details
OVH VPS with the following configuration:
Issue
After upgrading to 1.7.1, the OVH nodes in my Nomad cluster report 0 MHz of fingerprinted CPU; however, the logs below show that Nomad detects 8 CPUs, just not their clock speed.
I have another node at Hetzner for which Nomad detects the CPU frequency properly. Downgrading to Nomad 1.6.4 and restarting resolves the problem.
Reproduction steps
Start a Nomad client on an OVH node and have it join the cluster.
Nomad Client logs (if appropriate)