hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Use dmidecode as fallback source of cpu_total_compute #4233

Closed: schmichael closed this 1 year ago

schmichael commented 6 years ago

ARM chipsets sparsely populate /proc/cpuinfo and often cause cpu_total_compute fingerprinting to fail.

dmidecode is a viable fallback when /proc/cpuinfo does not contain the necessary information:

$ dmidecode -t 4

# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0400, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 0
    Type: Central Processor
    Family: Other
    Manufacturer: QEMU
    ID: 00 00 00 00 00 00 00 00
    Version: 1.0
    Voltage: Unknown
    External Clock: Unknown
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: Not Provided
    L2 Cache Handle: Not Provided
    L3 Cache Handle: Not Provided
    Serial Number: Not Specified
    Asset Tag: Not Specified
    Part Number: Not Specified
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
    Characteristics: None

...

See https://github.com/hashicorp/nomad/issues/2638#issuecomment-385239401 for details and thanks to @balupton for the suggestion!
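As an illustration of the fallback described above, here is a minimal Go sketch that derives a total-compute figure from `dmidecode -t 4` text. The function name `totalComputeFromDMI` is hypothetical. Multiplying "Max Speed" by "Core Enabled" and summing over processor records is an assumption made for this sketch, not necessarily how Nomad or gopsutil compute the value.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleOutput is trimmed from the `dmidecode -t 4` output quoted above.
const sampleOutput = `Handle 0x0400, DMI type 4, 42 bytes
Processor Information
    Socket Designation: CPU 0
    Max Speed: 2000 MHz
    Current Speed: 2000 MHz
    Core Count: 1
    Core Enabled: 1
    Thread Count: 1
`

// totalComputeFromDMI sums (Max Speed in MHz * Core Enabled) over every
// "Processor Information" record. Hypothetical helper for illustration only.
func totalComputeFromDMI(out string) (int, error) {
	total, mhz, cores := 0, 0, 0
	// flush adds the record collected so far and resets the per-record state.
	flush := func() {
		total += mhz * cores
		mhz, cores = 0, 0
	}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "Processor Information" {
			flush() // a new record starts here
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		val = strings.TrimSpace(val)
		switch key {
		case "Max Speed":
			// Expect "<number> MHz"; values such as "Unknown" are skipped.
			f := strings.Fields(val)
			if len(f) == 2 && f[1] == "MHz" {
				if n, err := strconv.Atoi(f[0]); err == nil {
					mhz = n
				}
			}
		case "Core Enabled":
			if n, err := strconv.Atoi(val); err == nil {
				cores = n
			}
		}
	}
	flush()
	if total == 0 {
		return 0, fmt.Errorf("no usable Max Speed / Core Enabled values found")
	}
	return total, nil
}

func main() {
	total, err := totalComputeFromDMI(sampleOutput)
	fmt.Println(total, err)
}
```

Using "Max Speed" rather than "Current Speed" is deliberate in this sketch, since the current value may reflect power or thermal throttling.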

Legogris commented 4 years ago

Is anyone working on this?

angrycub commented 4 years ago

Note that you can work around this issue by setting cpu_total_compute in the client configuration on nodes where the fingerprinter miscalculates. The setting overrides the fingerprinter in cases where it can't calculate the value properly.

shantanugadgil commented 3 years ago

Has this made it into 1.0.0 GA? I assume not, as I am hitting this on an aarch64 CentOS 7 VM. (Setting the cpu_total_compute value works, but 😞)

tgross commented 3 years ago

Hi @shantanugadgil. No, this didn't land in 1.0.

roylez commented 3 years ago

Another problem is that on ARM, CPU usage metrics simply do not work. Container CPU usage remains at 0 MHz all the time, which makes setting cpu_total_compute useless because the scheduler always gets zero.

courtland commented 1 year ago

FWIW I'm also running into this issue on AWS Graviton 3 Nitro instances (aarch64), so I'm also falling back to manually setting cpu_total_compute based on dmidecode. But as @roylez mentioned, CPU usage on the client always reports 0. Kind of a bummer after putting a lot of effort into making our workloads arm friendly :(

I'm thinking about taking a shot at a PR for this, unless anyone else is already working on it. Any suggestions on the most reliable source for the current CPU frequency on ARM?

Nomad v1.4.3

tgross commented 1 year ago

Hi @courtland! We'd love to review a PR for this. Our code is in helper/stats/cpu.go but the real work to be done here may actually be in github.com/shirou/gopsutil/v3/cpu, which we use as the library to read CPU info. As @schmichael noted above, dmidecode seems to be the reasonable fallback.

courtland commented 1 year ago

@tgross thanks for the hints!

Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"? Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.

tgross commented 1 year ago

Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"?

I'd order our preference for a solution as follows:

  1. Add the support for reading the required values out of /dev/mem to shirou/gopsutil (but I also recognize that's a heck of a lift 😀)
  2. Add a call to the dmidecode binary to shirou/gopsutil
  3. Add a call to the dmidecode binary to Nomad

Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.

A couple bits of the Nomad client have (undocumented 😬) dependencies on coreutils binaries on Unixish hosts, but otherwise I think only CNI requires one. So long as we document the requirement and have a safe fallback if it's not installed, I think we'll be ok.
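The "safe fallback if it's not installed" idea could look roughly like the Go sketch below. The `source` type and the `firstUsable` and `dmidecodeSource` names are hypothetical and exist only for this example. The sketch uses only `exec.LookPath` and `exec.Command` from the standard library, and it is not taken from Nomad or gopsutil.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// source is a hypothetical provider of total CPU compute in MHz.
type source func() (int, error)

// firstUsable tries each source in order and returns the first positive value.
// If every source fails, it returns the error from the last one tried.
func firstUsable(sources ...source) (int, error) {
	lastErr := errors.New("no sources configured")
	for _, src := range sources {
		mhz, err := src()
		if err == nil && mhz > 0 {
			return mhz, nil
		}
		if err == nil {
			err = errors.New("source reported 0 MHz")
		}
		lastErr = err
	}
	return 0, lastErr
}

// dmidecodeSource runs `dmidecode -t 4` only when the binary is on PATH, so a
// host without it falls through to the next source instead of failing hard.
func dmidecodeSource(parse func([]byte) (int, error)) source {
	return func() (int, error) {
		path, err := exec.LookPath("dmidecode")
		if err != nil {
			return 0, fmt.Errorf("dmidecode not installed: %w", err)
		}
		out, err := exec.Command(path, "-t", "4").Output()
		if err != nil {
			return 0, fmt.Errorf("running dmidecode: %w", err)
		}
		return parse(out)
	}
}

func main() {
	// Stand-in for a /proc/cpuinfo reader that found no frequency.
	cpuinfo := func() (int, error) {
		return 0, errors.New("no cpu MHz in /proc/cpuinfo")
	}
	// Placeholder parser; a real one would read "Max Speed" and core counts.
	parse := func([]byte) (int, error) {
		return 0, errors.New("parser omitted in this sketch")
	}

	mhz, err := firstUsable(cpuinfo, dmidecodeSource(parse))
	if err != nil {
		fmt.Println("fingerprint failed, set cpu_total_compute manually:", err)
		return
	}
	fmt.Println("total compute (MHz):", mhz)
}
```

Checking for the binary before running it keeps a missing dmidecode from being a hard failure, and the operator can still set cpu_total_compute by hand when every source fails.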

schmichael commented 1 year ago

+1 to Tim's list (although I think he meant /dev/cpuinfo and not /dev/mem... parsing /dev/mem would be very exciting), but just to throw out another option that might compose well with other fallbacks:

I think a lookup table might be a reasonable approach as well, since it avoids the problem of having to find the maximum supported frequency rather than whatever frequency the chip is currently running at under power/thermal management. The big downside of lookup tables is that they're impossible to test without access to that hardware, so we'd have to rely on contributions.

For example we have a big AWS EC2 lookup table here: https://github.com/hashicorp/nomad/blob/main/client/fingerprint/env_aws_cpu.go

Generated by make ec2info and backported.
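A lookup table of this kind can be sketched in a few lines of Go. The table below is illustrative and is not the contents of env_aws_cpu.go. Its single entry reuses the 2600 MHz dmidecode reading that a later comment in this thread reports for an m7g instance, and the names `maxMHzByFamily` and `lookupTotalCompute` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxMHzByFamily is an illustrative table, not Nomad's generated EC2 data.
// The only entry is the m7g figure reported by a commenter in this thread.
var maxMHzByFamily = map[string]int{
	"m7g": 2600,
}

// lookupTotalCompute returns maxMHz * cores for a known instance family.
// The boolean is false when the family is missing from the table, which is
// the signal to fall back to another detection method.
func lookupTotalCompute(instanceType string, cores int) (int, bool) {
	family, _, _ := strings.Cut(instanceType, ".")
	mhz, ok := maxMHzByFamily[family]
	if !ok {
		return 0, false
	}
	return mhz * cores, true
}

func main() {
	// The core count is supplied by the caller (for example runtime.NumCPU()).
	for _, it := range []string{"m7g.large", "example.unknown"} {
		total, ok := lookupTotalCompute(it, 2)
		fmt.Println(it, total, ok)
	}
}
```

Returning a boolean rather than an error makes a missing entry cheap to handle, which matters because a table like this lags behind new hardware.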

courtland commented 1 year ago

Hah, yeah, gotta be in /dev/mem somewhere, right, maybe... I'm assuming you both meant /proc/cpuinfo? Unfortunately, at least on my m7g, it just has BogoMIPS : 2100.00 along with some other CPU features. The actual max speed is 2600 MHz according to dmidecode.
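To make the failure mode concrete, here is a small Go sketch of a /proc/cpuinfo reader that looks for a `cpu MHz` field. Both sample inputs are illustrative rather than captured from real machines, and `cpuMHzFromCpuinfo` is a hypothetical name. BogoMIPS is ignored on purpose because, as noted above, it does not match the real clock speed.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Illustrative inputs: an aarch64-style file without a frequency field and an
// x86-style file that has one. Neither is copied from a real host.
const armCpuinfo = `processor : 0
BogoMIPS  : 2100.00
Features  : fp asimd
`

const x86Cpuinfo = `processor : 0
cpu MHz   : 2000.000
`

// cpuMHzFromCpuinfo returns the first positive "cpu MHz" value it finds.
// The boolean is false when the field is absent, which is the case that
// needs a fallback such as dmidecode.
func cpuMHzFromCpuinfo(text string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(key) != "cpu MHz" {
			continue
		}
		mhz, err := strconv.ParseFloat(strings.TrimSpace(val), 64)
		if err == nil && mhz > 0 {
			return mhz, true
		}
	}
	return 0, false
}

func main() {
	for name, text := range map[string]string{"arm": armCpuinfo, "x86": x86Cpuinfo} {
		mhz, ok := cpuMHzFromCpuinfo(text)
		fmt.Println(name, mhz, ok)
	}
}
```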

The lookup table approach is alright to get the max freq, especially since you're already doing that. Actually, in my case, it's simply just missing the latest M7g instance types. I think you're saying that requires someone to go and run make ec2info manually and merge the result?

I am successfully using dmidecode to know the max frequency. Personally I'd prefer if nomad worked on any arm64 system. The python and go psutil libraries both fail at detecting CPU speed in my case. I'm actually using the python version that gopsutil is based on - same thing.

I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil. Unless there's something cool I'm not understanding about /dev/mem.

The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

courtland commented 1 year ago

Looks like this has been discussed on and off for a while in gopsutil and here...

@schmichael had proposed the change to gopsutil back in 2017 :D https://github.com/shirou/gopsutil/issues/282

@shoenig seems to agree it's not worth nomad supporting arm64 detection in this duplicate issue: https://github.com/hashicorp/nomad/issues/14055

I wouldn't mind trying to update the lookup table and adding some dmidecode support to gopsutil if that seems reasonable.

tgross commented 1 year ago

Surprisingly, I really did mean reading /dev/mem! As far as I can tell, that's actually where dmidecode is reading from. It even has an argument to use a different path for that file (see dmidecode(8)). But to do that you're definitely getting into the deep magic bits 😀

I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil

That seems totally reasonable.

The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?

courtland commented 1 year ago

Surprisingly, I really did mean reading /dev/mem! Because as far as I can tell that's actually where dmidecode is reading from. It even has an arg to use a different path for that file. (ref man(8) dmidecode). But to do that you're definitely getting into the deep magic bits 😀

Interesting! I will take a look and see if I have enough magic bits leftover...

I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil

That seems totally reasonable.

The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?

Thanks for the insight and direction - I created a new issue.

courtland commented 1 year ago

It seems like someone over at DigitalOcean tried to get SMBIOS info out of /dev/mem in native Go.

https://blog.gopheracademy.com/advent-2017/accessing-smbios-information-with-go/

The resulting package is old but appears to do some of the heavy lifting. https://github.com/digitalocean/go-smbios/blob/master/smbios/decoder.go

Not sure if there is appetite for hashi to keep it alive?

shoenig commented 1 year ago

The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client?

This isn't specific to podman or nomad; the information is simply not reported by the ARM kernel driver. You need this patch, or one like it. Maybe things have changed recently, in which case we should get the gopsutil library updated.

https://patchwork.kernel.org/project/linux-arm-kernel/patch/1386924222-23169-1-git-send-email-vkale@apm.com/

schmichael commented 1 year ago

If it's an AWS Graviton instance then #16417 should pick it up.

I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!

tgross commented 1 year ago

I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!

That's a good point -- the maintainers of dmidecode are going to stay on top of any changes to the layout of that data way more readily than the gopsutil project will be able to.

lgfa29 commented 1 year ago

dmidecode fingerprinting was implemented in https://github.com/hashicorp/nomad/pull/18146, which will ship with Nomad 1.7.0.