hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

CPU fingerprinting with dmidecode fallback could be suboptimal #19468

Open MorphBonehunter opened 11 months ago

MorphBonehunter commented 11 months ago

Nomad version

Output from nomad version

Nomad v1.7.1
BuildDate 2023-12-08T18:11:21Z
Revision 608e719430038cdeb5fe108536d90cf88a8540e3

Operating system and Environment details

Arch Linux on amd64, virtualized (KVM/QEMU)

Issue

As suggested by @shoenig in #19412, this is a separate issue for the new dmidecode fallback for CPU fingerprinting introduced in 1.7. Using it can lead to suboptimal MHz readings.

Reproduction steps

In virtual environments (at least KVM with QEMU virtual CPU models, which some orchestrators select by default for new VMs), dmidecode exposes "wrong" MHz values, while lscpu and /proc/cpuinfo expose much better readings. For example, I have access to environments managed with Proxmox, oVirt, and Virtuozzo, where these CPU models are used for compatibility reasons.

The following example may be an edge case, but it shows how the dmidecode readings can report more MHz than the host can actually provide.

In my homelab I have a (really old) Intel(R) Celeron(R) CPU 847 @ 1.10GHz with two cores. This system uses QEMU/KVM to host some VMs with the emulated CPU "Intel Xeon E312xx (Sandy Bridge, IBRS update)", on which MHz detection was fine (2195 MHz) before the upgrade to 1.7.1. After the upgrade, the frequency was 0 MHz until I installed dmidecode as suggested in #19412.

Now I get 2x 2000 MHz because of what dmidecode reports (relevant information only):

Processor Information
        Socket Designation: CPU 0
        Type: Central Processor
        Family: Other
        Manufacturer: QEMU
        Version: pc-q35-4.0
        Max Speed: 2000 MHz
        Current Speed: 2000 MHz
        Core Count: 1
        Thread Count: 1

Processor Information
        Socket Designation: CPU 1
        Type: Central Processor
        Family: Other
        Manufacturer: QEMU
        Version: pc-q35-4.0
        Max Speed: 2000 MHz
        Current Speed: 2000 MHz
        Core Count: 1
        Thread Count: 1

lscpu -Je=MHZ and grep -i mhz /proc/cpuinfo still show the real numbers:

{
   "cpus": [
      {
         "mhz": 1097,5060
      },{
         "mhz": 1097,5060
      }
   ]
}

cpu MHz         : 1097.506
cpu MHz         : 1097.506

So in this case, the MHz detected via dmidecode (4000) is nearly twice what the host actually provides (about 2195).

Expected Result

The MHz should be detected as around 2195.

Actual Result

The MHz is detected as 4000.

roman-vynar commented 11 months ago

A follow-up: 1.7.2 looks like it works fine now. However, there is a slight difference in the HTTP response when reading a node's attributes via https://developer.hashicorp.com/nomad/api-docs/nodes#read-node

[Screenshot, 2023-12-14: read-node response in 1.3.5 vs 1.7.2]

As the screenshot shows, 1.7.2 reports zeros under Resources, which my scripts relied on. The values under NodeResources are still correct, so I will use those instead.

P.S. dmidecode is not needed.

FibreFoX commented 11 months ago

Just my two cents: maybe I'm just blind, but dmidecode does not give any data on a Raspberry Pi (at least on Raspbian).

MorphBonehunter commented 11 months ago

After upgrading to 1.7.2, detection works again, BUT only after uninstalling dmidecode, because dmidecode is tried before the fallback. Besides my homelab, this also affects my production systems (also at a hosting provider that uses Virtuozzo/QEMU), which get wrong MHz values because the CPU is an emulated "AMD EPYC Processor (with IBPB)" and dmidecode shows:

Processor Information
        Socket Designation: CPU 0
        Type: Central Processor
        Family: Other
        Manufacturer: Virtuozzo
        Max Speed: 2000 MHz
        Current Speed: 2000 MHz
        Status: Populated, Enabled
        Core Count: 6
        Core Enabled: 6
        Thread Count: 1

So with dmidecode installed, this system gets a total compute of 12000 MHz (6 cores × 2000 MHz). (I cannot verify this, but I get the impression that the DMI speed is hardcoded to 2000 MHz on every emulated CPU.)

So the real numbers are:

~ $ lscpu -Je=MHZ
{
   "cpus": [
      {
         "mhz": 2794,7500
      },{
         "mhz": 2794,7500
      },{
         "mhz": 2794,7500
      },{
         "mhz": 2794,7500
      },{
         "mhz": 2794,7500
      },{
         "mhz": 2794,7500
      }
   ]
}
~ $ grep -i mhz /proc/cpuinfo
cpu MHz         : 2794.750
cpu MHz         : 2794.750
cpu MHz         : 2794.750
cpu MHz         : 2794.750
cpu MHz         : 2794.750
cpu MHz         : 2794.750

So the real total compute is 16764 MHz (6 × 2794 MHz), about 39.7% more than the 12000 MHz Nomad detects from DMI (16764 / 12000 ≈ 1.397).

shoenig commented 10 months ago

the dmidecode exposes "wrong" MHz while lscpu / /proc/cpuinfo do expose much better readings.

So the problem we have now is that in some cases /proc/cpuinfo contains better information, while in other cases dmidecode does. For example, on EC2 /proc/cpuinfo reports only live frequencies, which can be as low as 800 MHz at idle or turbo-boosted for a brief instant. But across the random sample of instance types I tried, dmidecode set "Current Speed" to the true base speed, which is what we actually want.

I'm not sure what the best thing to do here is. The workaround described above, choosing whether or not to have dmidecode installed, isn't ideal; it might be better to have a client configuration value that indicates which fallback method to try first.
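A client setting like the one proposed here could be as simple as an ordered list of method names. The Go sketch below assumes such a hypothetical option: the method names `dmidecode` and `cpuinfo`, the `detector` type, and `firstUsable` are all made up for illustration, and the stubbed readings are the per-core values from the Virtuozzo example in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// detector is one hypothetical CPU-speed fallback method; it returns the
// per-core speed in MHz.
type detector func() (mhz int, err error)

// firstUsable walks the methods in the order given by a hypothetical client
// setting and returns the first one that yields a non-zero reading.
func firstUsable(order []string, detectors map[string]detector) (string, int, error) {
	for _, name := range order {
		detect, ok := detectors[name]
		if !ok {
			continue // unknown method names are skipped
		}
		if mhz, err := detect(); err == nil && mhz > 0 {
			return name, mhz, nil
		}
	}
	return "", 0, errors.New("no CPU speed method produced a reading")
}

// virtuozzo stubs the two readings reported for the Virtuozzo VM in this issue.
var virtuozzo = map[string]detector{
	"dmidecode": func() (int, error) { return 2000, nil },
	"cpuinfo":   func() (int, error) { return 2794, nil },
}

func main() {
	// Current default order (dmidecode first), then a self-hosted preference.
	for _, order := range [][]string{{"dmidecode", "cpuinfo"}, {"cpuinfo", "dmidecode"}} {
		name, mhz, err := firstUsable(order, virtuozzo)
		fmt.Println(name, mhz, err)
	}
	// Prints:
	// dmidecode 2000 <nil>
	// cpuinfo 2794 <nil>
}
```

Keeping dmidecode first as the default preserves the EC2 behavior described above, while self-hosted users could flip the order without uninstalling anything.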

I do think the behavior we have now - use dmidecode information if available - is the best default since running in EC2 is very, very common for our users.

FibreFoX commented 10 months ago

+1 for the client configuration 👍 because it gives more control over the whole situation. But please do not "just" focus on EC2; not everyone is looking to run Nomad there ;) Especially for self-hosted environments, this would reduce some pain points.

digi-talo commented 4 months ago

Another +1 for the client configuration. We ran into capacity issues when upgrading from 1.6.x to 1.7.7. We found that dmidecode provided inaccurate values both for virtual machines running on Proxmox and on bare metal. Unfortunately, removing dmidecode is not a feasible workaround, as many other packages depend on it.

digi-talo commented 4 months ago

@shoenig Is there any outlook on when we can expect a client configuration option? This is a real issue for us, and I'm sure for others who are not running on EC2. Uninstalling dmidecode is not an option, as it breaks other packages.

mvegter commented 1 week ago

For us, the 2000 MHz value seems to come from KVM/QEMU: https://gitlab.com/qemu-project/qemu/-/blob/62f182c97b31445012d37181005a83ff8453edaa/hw/smbios/smbios.c#L66-84
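If QEMU indeed fills in a fixed 2000 MHz by default, one possible mitigation is to treat exactly that value as suspect when the processor manufacturer is a hypervisor. The Go sketch below is a hypothetical heuristic, not something Nomad does: the vendor list contains only the two "Manufacturer" strings seen in this issue (QEMU, Virtuozzo), and the 2000 MHz constant is taken from the readings reported above.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholderMHz is the speed reported for every emulated CPU in this issue.
const placeholderMHz = 2000

// virtualVendors lists processor "Manufacturer" strings seen in this issue.
// Hypothetical and incomplete.
var virtualVendors = []string{"QEMU", "Virtuozzo"}

// suspectDMISpeed reports whether a dmidecode processor reading looks like a
// hypervisor placeholder rather than a real speed.
func suspectDMISpeed(manufacturer string, currentMHz int) bool {
	if currentMHz != placeholderMHz {
		return false
	}
	for _, v := range virtualVendors {
		if strings.EqualFold(strings.TrimSpace(manufacturer), v) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(suspectDMISpeed("QEMU", 2000))                 // true
	fmt.Println(suspectDMISpeed("Virtuozzo", 2000))            // true
	fmt.Println(suspectDMISpeed("Intel(R) Corporation", 2000)) // false
}
```

A heuristic like this would not cover the bare-metal inaccuracies reported earlier in the thread, which is one argument for the explicit client setting instead.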