Closed schmichael closed 1 year ago
Is anyone working on this?
Note that you can work around this issue by using the cpu_total_compute configuration for compute elements that miscalculate. This will override the fingerprinter in cases where it can't calculate properly.
Has this made it into 1.0.0 GA? I assume not, as I am hitting this on an aarch64 VM of CentOS 7.
(Setting the cpu_total_compute value works, but 😞)
Hi @shantanugadgil. No, this didn't land in 1.0.
Another problem is that on ARM, CPU usage metrics simply do not work. Container CPU usage remains 0 MHz all the time, which makes setting cpu_total_compute useless, as the scheduler always gets zero.
FWIW I'm also running into this issue on AWS Graviton 3 Nitro instances (aarch64), so I'm also falling back to manually setting cpu_total_compute based on dmidecode. But as @roylez mentioned, CPU usage on the client always reports 0. Kind of a bummer after putting a lot of effort into making our workloads arm friendly :(
I'm thinking about taking a shot at a PR for this unless anyone else is already working on it. Any suggestions on the most reliable source for the current CPU freq on arm?
Nomad v1.4.3
Hi @courtland! We'd love to review a PR for this. Our code is in helper/stats/cpu.go but the real work to be done here may actually be in github.com/shirou/gopsutil/v3/cpu, which we use as the library to read CPU info. As @schmichael noted above, dmidecode seems to be the reasonable fallback.
@tgross thanks for the hints!
Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"? Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.
Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"?
I'd order our preference for a solution as follows:
1. Add reading /dev/mem to shirou/gopsutil (but I also recognize that's a heck of a lift :grinning:)
2. Add calling the dmidecode binary to shirou/gopsutil
3. Add calling the dmidecode binary to Nomad
Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.
A couple bits of the Nomad client have (undocumented :grimacing:) dependencies on coreutils binaries on Unixish hosts, but otherwise I think only CNI requires one. So long as we document the requirement and have a safe fallback if it's not installed, I think we'll be ok.
+1 to Tim's list (although I think he meant /dev/cpuinfo and not /dev/mem ... parsing /dev/mem would be very exciting), but just to throw out another option that might compose well with other fallbacks:
I think a lookup table might be a reasonable approach as well as that avoids the problem of having to find the max frequency supported and not whatever frequency the chip is currently at as part of power/thermal management. The big downside of lookup tables is that they're impossible to test without access to that hardware, so we'd have to rely on contributions.
For example we have a big AWS EC2 lookup table here: https://github.com/hashicorp/nomad/blob/main/client/fingerprint/env_aws_cpu.go
Generated by make ec2info and backported.
Hah, yeah, gotta be in /dev/mem somewhere, right, maybe... I'm assuming you both meant /proc/cpuinfo? Unfortunately, at least on my m7g, it just has BogoMIPS : 2100.00 with some other cpu features. The actual max speed is 2600MHz according to dmidecode.
The lookup table approach is alright to get the max freq, especially since you're already doing that. Actually, in my case, it's simply just missing the latest M7g instance types. I think you're saying that requires someone to go and run make ec2info manually and merge the result?
I am successfully using dmidecode to know the max frequency. Personally I'd prefer if nomad worked on any arm64 system. The python and go psutil libraries both fail at detecting CPU speed in my case. I'm actually using the python version that gopsutil is based on - same thing.
I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil. Unless there's something cool I'm not understanding about /dev/mem.
The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.
Looks like this has been discussed on and off for a while in gopsutil and here...
@schmichael had proposed the change to gopsutil back in 2017 :D https://github.com/shirou/gopsutil/issues/282
@shoenig seems to agree it's not worth nomad supporting arm64 detection in this duplicate issue: https://github.com/hashicorp/nomad/issues/14055
I wouldn't mind trying to update the lookup table and adding some dmidecode support to gopsutil if that seems reasonable.
Surprisingly, I really did mean reading /dev/mem! Because as far as I can tell that's actually where dmidecode is reading from. It even has an arg to use a different path for that file. (ref man(8) dmidecode). But to do that you're definitely getting into the deep magic bits :grinning:
I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil
That seems totally reasonable.
The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.
The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?
Surprisingly, I really did mean reading /dev/mem! Because as far as I can tell that's actually where dmidecode is reading from. It even has an arg to use a different path for that file. (ref man(8) dmidecode). But to do that you're definitely getting into the deep magic bits 😀
Interesting! I will take a look and see if I have enough magic bits leftover...
I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil
That seems totally reasonable.
The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.
The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?
Thanks for the insight and direction - I created a new issue.
It seems like someone over at DigitalOcean tried to get SMBIOS info out of /dev/mem in native golang.
https://blog.gopheracademy.com/advent-2017/accessing-smbios-information-with-go/
The resulting package is old but appears to do some of the heavy lifting. https://github.com/digitalocean/go-smbios/blob/master/smbios/decoder.go
Not sure if there is appetite for hashi to keep it alive?
The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client?
This isn't specific to podman or nomad, it's that the information is not reported by the ARM kernel driver - you need this patch, or one like it. Maybe things have changed recently - in which case we should get the gopsutil library updated.
If it's an AWS Graviton instance then #16417 should pick it up.
I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!
I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!
That's a good point -- the maintainers of dmidecode are going to stay on top of any changes to the layout of that data way more readily than the gopsutil project will be able to.
dmidecode fingerprinting was implemented in https://github.com/hashicorp/nomad/pull/18146, which will ship with Nomad 1.7.0.
ARM chipsets sparsely populate /proc/cpuinfo and often cause cpu_total_compute fingerprinting to fail. dmidecode is a viable fallback when /proc/cpuinfo does not contain the necessary information:
See https://github.com/hashicorp/nomad/issues/2638#issuecomment-385239401 for details and thanks to @balupton for the suggestion!