Open MorphBonehunter opened 11 months ago
A follow-up: 1.7.2 looks fine now. However, there is a slight difference in the HTTP response when reading a node's attributes per https://developer.hashicorp.com/nomad/api-docs/nodes#read-node
1.3.5 vs 1.7.2
As we can see, it reports zeros under Resources, and my scripts relied on that. However, it is still okay under NodeResources, so I will use that.
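A minimal sketch of the script-side change described above: prefer NodeResources and only fall back to the legacy Resources block. The field paths (NodeResources.Cpu.CpuShares and Resources.CPU) are my assumption about the node object layout and should be checked against the API docs linked above; the helper name and the sample payload are made up for illustration.

```python
def cpu_mhz_from_node(node: dict) -> int:
    """Return the total CPU MHz from a decoded "read node" response."""
    # Assumed layout: NodeResources.Cpu.CpuShares carries the total MHz.
    cpu = (node.get("NodeResources") or {}).get("Cpu") or {}
    shares = cpu.get("CpuShares") or 0
    if shares > 0:
        return shares
    # Legacy block; per the comment above it reports zeros on 1.7.2.
    return (node.get("Resources") or {}).get("CPU") or 0


# Illustrative payload only, not real API output.
node = {
    "Resources": {"CPU": 0},
    "NodeResources": {"Cpu": {"CpuShares": 16764}},
}
print(cpu_mhz_from_node(node))
```

Operating on an already decoded dict keeps the helper independent of how the response is fetched.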
P.S. dmidecode is not needed.
Just my 2 cents: maybe I'm just blind, but dmidecode does not give any data on Raspberry Pi (at least Raspbian).
After upgrading to 1.7.2 the detection is working again, BUT only after uninstalling dmidecode, as it is used before the fallback.
Besides my homelab, this also affects my production systems (also at a hoster which uses Virtuozzo/QEMU) and produces wrong MHz values, as the CPU is a simulated "AMD EPYC Processor (with IBPB)" and dmidecode shows:
Processor Information
Socket Designation: CPU 0
Type: Central Processor
Family: Other
Manufacturer: Virtuozzo
Max Speed: 2000 MHz
Current Speed: 2000 MHz
Status: Populated, Enabled
Core Count: 6
Core Enabled: 6
Thread Count: 1
So for this system I get a total compute of 12000 MHz with dmidecode installed.
(I cannot verify this, but I get the impression that on every simulated CPU the MHz in the DMI is hardcoded to 2000 MHz.)
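To make the 12000 MHz figure reproducible, here is a rough sketch that pulls "Current Speed" and "Core Count" out of the dmidecode text shown above and multiplies them. That Nomad computes the total in exactly this way is an assumption on my part; the function name is hypothetical, and the sketch only handles a single processor block.

```python
import re

# Relevant lines copied from the dmidecode output above.
DMI_SAMPLE = """\
Processor Information
Socket Designation: CPU 0
Max Speed: 2000 MHz
Current Speed: 2000 MHz
Core Count: 6
Core Enabled: 6
Thread Count: 1
"""


def dmi_total_mhz(text: str) -> int:
    """Multiply the reported current speed by the core count (one socket)."""
    speed = re.search(r"Current Speed:\s*(\d+)\s*MHz", text)
    cores = re.search(r"Core Count:\s*(\d+)", text)
    if speed is None or cores is None:
        return 0  # nothing usable, e.g. on hosts where dmidecode prints no data
    return int(speed.group(1)) * int(cores.group(1))


print(dmi_total_mhz(DMI_SAMPLE))  # 12000
```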
So the real numbers are:
~ $ lscpu -Je=MHZ
{
"cpus": [
{
"mhz": 2794,7500
},{
"mhz": 2794,7500
},{
"mhz": 2794,7500
},{
"mhz": 2794,7500
},{
"mhz": 2794,7500
},{
"mhz": 2794,7500
}
]
}
~ $ grep -i mhz /proc/cpuinfo
cpu MHz : 2794.750
cpu MHz : 2794.750
cpu MHz : 2794.750
cpu MHz : 2794.750
cpu MHz : 2794.750
cpu MHz : 2794.750
So total compute is 16764 MHz, which is around 39% more than Nomad detects from DMI.
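The arithmetic behind these numbers can be checked with a short sketch. It assumes each core's frequency is truncated to whole MHz before summing, which is what makes the total come out at 16764 rather than 16768.5; that truncation is inferred from the figure quoted above, not confirmed from Nomad's source, and the function name is hypothetical.

```python
# Six identical lines, as in the /proc/cpuinfo output above.
CPUINFO_SAMPLE = "\n".join(["cpu MHz : 2794.750"] * 6)


def cpuinfo_total_mhz(text: str) -> int:
    """Sum the per-core 'cpu MHz' values, truncating each to whole MHz."""
    total = 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cpu mhz":
            total += int(float(value))
    return total


total = cpuinfo_total_mhz(CPUINFO_SAMPLE)
dmi_total = 6 * 2000  # six cores at the DMI-reported 2000 MHz
extra_pct = (total - dmi_total) / dmi_total * 100
print(total, f"{extra_pct:.1f}%")  # 16764 39.7%
```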
the dmidecode exposes "wrong" MHz while lscpu / /proc/cpuinfo do expose much better readings.
So the problem we have now is that in some cases /proc/cpuinfo contains better information, while in other cases dmidecode produces better information. For example, in EC2 /proc/cpuinfo produces only live frequencies, which could be something like 800 MHz at idle or turbo boosted for a brief instant. But from the random sampling of instance types I tried, the dmidecode values set the "Current Speed" to the true base speed, which is what we actually want.
I'm not sure what the best thing is to do here. The workaround described above where you choose whether or not to have dmidecode installed isn't ideal; maybe it would be better to have a client configuration value that indicates which fallback method to try first.
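As a purely illustrative sketch of what such a setting could mean: an ordered list of source names, tried in turn until one yields a usable reading. Nothing here is an existing Nomad option; the names "dmidecode" and "cpuinfo", the function, and the stub readers are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence


def pick_cpu_mhz(
    order: Sequence[str],
    sources: Mapping[str, Callable[[], Optional[int]]],
) -> Optional[int]:
    """Return the first non-zero reading, trying sources in the given order."""
    for name in order:
        reader = sources.get(name)
        if reader is None:
            continue  # unknown source name in the configured order: skip it
        mhz = reader()
        if mhz:
            return mhz
    return None


# Stub readers returning the per-core values seen in this thread.
sources = {"dmidecode": lambda: 2000, "cpuinfo": lambda: 2794}

print(pick_cpu_mhz(["dmidecode", "cpuinfo"], sources))  # 2000
print(pick_cpu_mhz(["cpuinfo", "dmidecode"], sources))  # 2794
```

With an ordering like this, the current behavior (dmidecode first) stays the default, while hosts where DMI reports 2000 MHz could put cpuinfo first without uninstalling anything.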
I do think the behavior we have now - use dmidecode information if available - is the best default since running in EC2 is very, very common for our users.
+1 for the client configuration 👍 because it gives more control over the whole situation. But please do not "just" focus on EC2; not everyone is looking to run Nomad there ;) Especially for self-hosted environments, this would reduce some pain points.
Another +1 for the client configuration. We ran into capacity issues when upgrading from 1.6.x to 1.7.7. We found that dmidecode provided inaccurate values both for virtual machines running on Proxmox and on bare metal. Unfortunately, removing dmidecode is not a feasible workaround, as a lot of other packages depend on it.
@shoenig Is there any outlook on when we can expect a client configuration option? This is really an issue for us, and I'm sure for others who are not running on EC2. Uninstalling dmidecode is not an option, as it breaks other packages.
For us, the 2000 MHz seems to come from KVM/QEMU: https://gitlab.com/qemu-project/qemu/-/blob/62f182c97b31445012d37181005a83ff8453edaa/hw/smbios/smbios.c#L66-84
Nomad version
Output from nomad version
Operating system and Environment details
Archlinux on amd64, virtualized KVM/QEMU
Issue
As suggested by @shoenig in #19412, this is a separate issue for the new dmidecode fallback for CPU fingerprinting introduced in 1.7. Using it can lead to suboptimal MHz readings.
Reproduction steps
In virtual environments, at least KVM with QEMU virtual CPU models, which may be selected by default for new VMs in different orchestrators, dmidecode exposes "wrong" MHz while lscpu / /proc/cpuinfo expose much better readings. For example, I have access to environments managed with Proxmox, oVirt and Virtuozzo, and these CPUs are used there (compatibility stuff).
Maybe the following example is an edge case, but it shows how the dmidecode readings can lead to more MHz than the host can provide.
In my homelab environment I have a (really old) Intel(R) Celeron(R) CPU 847 @ 1.10GHz with two cores. This system uses QEMU/KVM to host some VMs with the emulated CPU "Intel Xeon E312xx (Sandy Bridge, IBRS update)", on which the MHz detection was OK (2195 MHz) before the upgrade to 1.7.1. After the upgrade the frequency was 0 MHz until I installed dmidecode as suggested in #19412.
Now I have 2x 2000 MHz because of the dmidecode reporting (only the relevant information):
The lscpu -Je=MHZ and grep -i mhz /proc/cpuinfo commands still show the real numbers:
So in this case the MHz detected via dmidecode is nearly 2 times what the host really provides.
Expected Result
The MHz should be detected around 2195.
Actual Result
The MHz is detected as 4000.