Closed herter4171 closed 4 years ago
Hi @herter4171 and thanks for the detail in this issue. In order to help diagnose this problem would you be able to provide the output of the following two commands from a couple of the instances where you are seeing this behaviour?
cat /proc/cpuinfo |grep 'cpu MHz'
cat /proc/cpuinfo |grep 'cpu cores'
Seems like parsing cpu MHz out of /proc/cpuinfo is only going to get us the current clock speed, which could vary widely given power states, etc. Has Nomad always determined clock speed this way? We should be getting the rated speed instead, e.g.
$ lscpu | grep MHz
CPU MHz: 3899.997
CPU max MHz: 4700.0000
CPU min MHz: 400.0000
Hi @jrasell, thank you for the response! The output of those grep commands is a bit lengthy because there are 96 cores, so here is some truncated output.
For the first instance,
$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz : 1843.994
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores : 24
For the second instance,
$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz : 1677.167
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores : 24
For the third instance,
$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz : 1506.577
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores : 24
Given the nproc output, I'm guessing the "cpu cores" output of 24 implies there are four physical processors.
$ nproc
96
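The four-processor guess can be checked directly instead of inferred, since hyperthreading also multiplies the logical CPU count (for example, 2 sockets x 24 cores x 2 threads would give 96 as well). A minimal sketch, assuming an x86 kernel that exposes "physical id" and "processor" lines in /proc/cpuinfo; the function names are made up for illustration:

```shell
#!/usr/bin/env bash

# count_sockets [FILE]: print the number of distinct "physical id" values.
# Hypothetical helper; FILE defaults to /proc/cpuinfo.
count_sockets() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep '^physical id' "$cpuinfo" | sort -u | wc -l | tr -d ' '
}

# count_logical_cpus [FILE]: print the number of "processor" entries,
# i.e. the logical CPUs listed in the file.
count_logical_cpus() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -c '^processor' "$cpuinfo"
}
```

Running `count_sockets` and `count_logical_cpus` with no arguments on one of the instances would show how the 96 logical CPUs are split across physical packages.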
Nomad uses gopsutil.cpu.InfoStat to get the CPU MHz. On Linux, it reads /sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq by default to determine the maximum frequency of the CPU, but it falls back on the value from /proc/cpuinfo if that fails. You should check that sysfs path on your VMs, @herter4171.
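To see which branch of that lookup a given VM would take, the same order can be reproduced by hand. This is a rough sketch of the behaviour described in the comment, not gopsutil's actual code; the function name and the "rated"/"current" labels are assumptions for illustration. The sysfs value is reported in kHz, so it is divided by 1000.

```shell
#!/usr/bin/env bash

# cpu_max_mhz [SYSFS_FILE] [CPUINFO_FILE]
# Prefer cpuinfo_max_freq (kHz) when it is readable; otherwise fall back to
# the first "cpu MHz" line, which is only the *current* frequency.
cpu_max_mhz() {
    local sysfs="${1:-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq}"
    local cpuinfo="${2:-/proc/cpuinfo}"
    if [ -r "$sysfs" ]; then
        # kHz -> MHz
        awk '{ printf "%.3f rated\n", $1 / 1000 }' "$sysfs"
    else
        awk -F': *' '/^cpu MHz/ { printf "%.3f current\n", $2; exit }' "$cpuinfo"
    fi
}
```

If `cpu_max_mhz` prints a value tagged "current" on a host, the number will change from one run to the next, which would match the varying totals reported in this issue.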
Hi @dvusboy, the lay of the land is that I'm using Amazon Linux 2 pretty much out of the box. That platform has /sys/devices/system/cpu, and from there it's cpu0 and so on. The subdirectories for cpu* don't have a cpufreq directory, so I'm not sure how to proceed. Is there something I can do to populate that? This seems like a pretty major detail for supporting Nomad on Amazon Linux 2, and I'd like to avoid switching distros.
@herter4171 By cpuN, I meant substituting N with some non-negative integer. Since cpufreq is not there, I'd say you don't have access to the actual maximum frequency, and gopsutil defaults to MHz out of cpuinfo, which is the current frequency. That would explain what you're seeing.
@dvusboy, I belatedly picked up on that and edited my last comment accordingly. Can I do something to make Amazon Linux 2 play ball for Nomad, or can something be done on the Nomad side of things to fix this? One idea I have is spawning yes > /dev/null & for all but one core before launching the Nomad client to make Nomad recognize actual MHz, but I'd really appreciate some support for the given platform. I can't be the only guy running Nomad on Amazon Linux 2, after all.
I suppose you can use cpu_total_compute in the client configuration to override the fingerprinted values.
@dvusboy, I'm aware of that option, and I don't think it addresses the core issue. Nomad should be capable enough to set available MHz.
Hi @dvusboy and company, after rooting around a bit, I can see the difficulty in getting rated clock speed on Amazon Linux 2 without assumed access to sudo. In case it helps on your end, what I've put in place for initializing a Nomad client is as follows.
# Get the max rated core speed in MHz (dmidecode needs root)
CORE_MAX_MHZ=$(sudo dmidecode -t processor \
| grep '^[[:space:]]*Max Speed' \
| head -n 1 \
| awk '{print $3}')
# Multiply by the number of logical cores to get total MHz
TOTAL_MHZ=$((CORE_MAX_MHZ * $(nproc)))
I'd still like to see this functionality become native instead of depending on my hacky Bash, but I'm equipped to move on if there's no interest in pursuing this. Thanks for the help so far.
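For completeness, the computed total still has to reach the client configuration through cpu_total_compute. A minimal sketch of that last step, assuming the value is written into a client stanza before the agent starts; the function name and the output path in the usage note are hypothetical, and the validation is only there so that a failed dmidecode parse does not end up in the config:

```shell
#!/usr/bin/env bash

# render_client_config TOTAL_MHZ: print a Nomad client stanza that pins
# cpu_total_compute. Hypothetical helper, not part of Nomad.
render_client_config() {
    local total_mhz="$1"
    # Accept only a positive integer without leading zeros.
    case "$total_mhz" in
        ''|*[!0-9]*|0*)
            echo "invalid MHz value: '$total_mhz'" >&2
            return 1
            ;;
    esac
    printf 'client {\n'
    printf '  enabled           = true\n'
    printf '  cpu_total_compute = %s\n' "$total_mhz"
    printf '}\n'
}
```

Usage might look like `render_client_config "$TOTAL_MHZ" > /etc/nomad.d/client.hcl`, where the path is an assumption about where the agent reads its configuration.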
Hey @shoenig, I'm having a bit of additional difficulty in spite of my fix. Even though I've set the client stanza like I described and verified the updated value is reflected in Nomad, jobs still fail to be placed due to this other hidden limit shown in my screenshot. I'm a bit confused, because 262144 MHz / 96 cores = 2.73 GHz/core, and that's above the rated speed of 2.5 GHz and well below the max of 3.5 GHz.
I'd hope to be able to move on with things, but this is still holding things back, I'm afraid.
I'm thinking this is actually a problem on all EC2 instances, not just Amazon Linux 2. On an Ubuntu micro:
ubuntu@ip-172-31-82-121:~$ cpupower frequency-info
analyzing CPU 0:
no or unknown cpufreq driver is active on this CPU
CPUs which run at the same hardware frequency: Not Available
CPUs which need to have their frequency coordinated by software: Not Available
maximum transition latency: Cannot determine or is not supported.
Not Available
available cpufreq governors: Not Available
Unable to determine current policy
current CPU frequency: Unable to call hardware
current CPU frequency: Unable to call to kernel
boost state support:
Supported: no
Active: no
ubuntu@ip-172-31-82-121:~$ # there is no cpufreq/cpuinfo_max_freq
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpu0
cache crash_notes crash_notes_size driver firmware_node hotplug node0 power subsystem topology uevent
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpufreq # empty
If there's any good news, the CPU cgroup management seems unaffected
Allocated Resources
CPU Memory Disk
250/2400 MHz 32 MiB/983 MiB 300 MiB/6.6 GiB
Allocation Resource Utilization
CPU Memory
0/2400 MHz 388 KiB/983 MiB
Host Resource Utilization
CPU Memory Disk
2400/2400 MHz 146 MiB/983 MiB 1.4 GiB/8.0 GiB # loaded deliberately
[ec2-user@ip-172-31-94-218 proc]$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 11 3 1
cpu 9 3 1
cpuacct 9 3 1
blkio 10 3 1
memory 6 3 1
devices 5 25 1
freezer 4 3 1
net_cls 2 3 1
perf_event 8 3 1
net_prio 2 3 1
hugetlb 7 3 1
pids 3 3 1
I'm going to keep researching and asking around, but I suspect this may boil down to parsing the rated CPU speed out of the CPU model name string. Hacky as that may be, it should be more accurate than parsing cpu MHz, which is tantamount to using a random number.
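As a rough illustration of the model-name idea, the sketch below pulls an "@ <n>GHz" suffix out of the first model name line and converts it to MHz. The function name is hypothetical, and it deliberately fails when the name carries no frequency, so a caller can fall back to something else.

```shell
#!/usr/bin/env bash

# rated_mhz_from_model_name [FILE]
# Print the rated speed in MHz parsed from the first "model name" line,
# e.g. "... CPU @ 3.00GHz" -> 3000. Returns 1 when the name has no
# "@ <n>GHz" suffix.
rated_mhz_from_model_name() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local ghz
    ghz=$(grep -m 1 '^model name' "$cpuinfo" \
        | sed -n 's/.*@ *\([0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\) *GHz.*/\1/p')
    [ -n "$ghz" ] || return 1
    # Round to the nearest whole MHz to avoid floating-point truncation.
    awk -v g="$ghz" 'BEGIN { printf "%d\n", g * 1000 + 0.5 }'
}
```

Multiplying the result by `nproc` would give a total in the same way as the dmidecode workaround earlier in the thread, without needing root.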
Hey @shoenig, thanks for the digging. One thing I've noticed about using model name is that certain instance types, like "memory optimized," use AMD chips that don't have the rated frequency in the name like Intel procs tend to. Also, I think the driver error I'm seeing in the pic from my last comment is related to this issue, since it's requiring a value for MHz between rated and max. I'd be happy to open a separate thread for that if it's going to muddy waters here, though.
Another possibility might be to modify gopsutil to briefly load a single CPU thread and take measurements of the current speed, the maximum of which would be presumed to be the max CPU speed.
I put together a quick demo to check if this works, before submitting the idea upstream:
$ for i in {1..10}; do ./loadcpu && sleep 3 && echo ""; done
read current speed: 800.04
loaded max speed: 3900.70
read current speed: 1924.65
loaded max speed: 3901.08
read current speed: 1495.16
loaded max speed: 3900.33
read current speed: 2826.81
loaded max speed: 3900.00
read current speed: 3400.18
loaded max speed: 3902.43
read current speed: 1979.91
loaded max speed: 3900.95
read current speed: 2627.13
loaded max speed: 3900.19
read current speed: 889.96
loaded max speed: 3901.62
read current speed: 3391.65
loaded max speed: 3902.97
read current speed: 906.17
loaded max speed: 3900.63
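The loadcpu program itself is not shown in the thread, so the following is only a shell approximation of the same idea: spin one busy loop, sample every cpu MHz line a few times, and keep the largest value seen. The function names and default sample counts are assumptions. All cores are read because the scheduler may place the busy loop on any of them, and the result is still just an observation of what the host reports.

```shell
#!/usr/bin/env bash

# max_observed_mhz [FILE] [SAMPLES] [INTERVAL]
# Read every "cpu MHz" line SAMPLES times and print the largest value seen.
max_observed_mhz() {
    local cpuinfo="${1:-/proc/cpuinfo}" samples="${2:-10}" interval="${3:-0.1}"
    local i
    for ((i = 0; i < samples; i++)); do
        awk -F': *' '/^cpu MHz/ { print $2 }' "$cpuinfo"
        sleep "$interval"
    done | sort -n | tail -n 1
}

# loaded_max_mhz [FILE] [SAMPLES] [INTERVAL]
# Same as above, but with one busy loop running while the samples are taken.
loaded_max_mhz() {
    ( while :; do :; done ) &
    local spinner=$!
    local result
    result=$(max_observed_mhz "$@")
    # Stop the busy loop and reap it quietly.
    kill "$spinner" 2>/dev/null
    wait "$spinner" 2>/dev/null
    printf '%s\n' "$result"
}
```

Comparing `max_observed_mhz` on an idle host with `loaded_max_mhz` should show the same kind of gap as the "read current speed" and "loaded max speed" pairs above, provided the host exposes frequency changes in /proc/cpuinfo at all.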
Nomad version
Nomad v0.10.4 (f750636ca68e17dcd2445c1ab9c5a34f9ac69345)
Operating system and Environment details
Amazon Linux 2 with a fixed head node and an auto-scaling group of c5.24xlarge instances, with scaling driven by Nomad state using a custom cloud metric.
Issue
The number of MHz available on a node varies wildly. For the exact same instance type (96 cores, 3 GHz stock, 3.9 GHz max), I'm seeing as low as 1.6E5 MHz all the way up to 3.4E5 MHz. Just now, I've launched 3 c5.24xlarge nodes, and their max MHz are
I'd rather not hard-wire cpu_total_compute in the client config, and everything else I've read claims Nomad sets the MHz based on core count multiplied by rated clock speed rather than current.
Having MHz vary like this causes jobs to not be placed, even when the node actually has the capacity. Would a short-term fix be forcing all but one core to 100%, launching the Nomad client, and taking the load off of CPU? The docs I've read claim Nomad uses stock clock speed, so I'm kind of at a loss here.
Reproduction steps
Launch a few instances of the same type with the Nomad client running on boot (I'm using systemctl). Rated MHz for each client in the web UI should vary appreciably.