hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

fingerprinting OVH nodes with incorrect CPU frequency after upgrading to 1.7.1 #19406

Closed kevinschoonover closed 10 months ago

kevinschoonover commented 10 months ago

Nomad version

> nomad version
Nomad v1.7.1
BuildDate 2023-12-08T18:11:21Z
Revision 608e719430038cdeb5fe108536d90cf88a8540e3

Operating system and Environment details

ovh VPS with the following configuration:

> uname -a
Linux vps-9e8a4a7f 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

Issue

After upgrading to 1.7.1, the OVH nodes in my Nomad cluster report 0 MHz of fingerprinted CPU. However, the logs below show that Nomad detects 8 CPUs, just not their clock speed. (Screenshot: missing_cpu)

I have another node in Hetzner for which Nomad detects the CPU frequency properly. Downgrading to Nomad 1.6.4 and restarting resolves the problem.

Reproduction steps

Start a Nomad client on an OVH node and have it join the cluster.

Nomad Client logs (if appropriate)

Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: ==> Nomad agent configuration:
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:        Advertise Addrs: HTTP: 100.101.109.40:4646
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:             Bind Addrs: HTTP: [100.101.109.40:4646 127.0.0.1:4646]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Client: true
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:              Log Level: DEBUG
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Region: global (DC: ovh-us-west-or-fed)
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                 Server: false
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:                Version: 1.7.1
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: ==> Nomad agent started! Log data will stream in below:
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.465Z [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/var/nomad/plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.476Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/var/nomad/plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.477Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.478Z [INFO]  client: using state directory: state_dir=/var/nomad/client
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.479Z [INFO]  client: using alloc directory: alloc_dir=/var/nomad/alloc
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.479Z [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=["arch", "bridge", "cgroup", "cni", "consul", "cpu", "host", "landlock", "memory", "network", "nomad", "plugins_cni", "signal", "storage", "vault", "env_digitalocean", "env_aws", "env_gce", "env_azure"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr.cgroup: detected cgroups: version=2
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.547Z [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=/opt/cni/config
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.559Z [INFO]  client.fingerprint_mgr.consul: consul agent is available: cluster=default
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.559Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul initial_period=52.968832938s
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU model: name="Intel Core Processor (Haswell, no TSX)"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU efficiency core count: cores=8
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU performance core count: cores=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.560Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU core count: cores=8
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.613Z [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/tailscale0/speed device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: interface=tailscale0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=tailscale0 IP=100.101.109.40
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.617Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=tailscale0 IP=fd7a:115c:a1e0:ab12:4843:cd96:6265:6d28
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/lo/speed device=lo
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.619Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=lo mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.621Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens3
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.622Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/ens3/speed device=ens3
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.622Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ens3 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/tailscale0/speed device=tailscale0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.625Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=tailscale0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.628Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.629Z [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/docker0/speed device=docker0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.629Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=docker0 mbits=1000
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.712Z [INFO]  client.fingerprint_mgr.vault: Vault is available: cluster=default
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:12.712Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault initial_period=51.127719075s
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.113Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=public-ipv4
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.129Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=mac
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.238Z [DEBUG] client.fingerprint_mgr.env_aws: read an empty value: attribute=instance-life-cycle
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.302Z [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.355Z [DEBUG] client.fingerprint_mgr.env_azure: could not read value for attribute: attribute=compute/azEnvironment resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [DEBUG] client.fingerprint_mgr.env_digitalocean: could not read value for attribute: attribute=region resp_code=404
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=["arch", "bridge", "consul", "cpu", "host", "network", "nomad", "plugins_cni", "signal", "storage", "vault", "env_aws"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.370Z [INFO]  client.proclib.cg2: initializing nomad cgroups: cores=0-7
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: top level partition root nomad.slice cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: partition member nomad.slice/share cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.proclib.cg2: partition member nomad.slice/reserve cgroup initialized
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [INFO]  client.plugin: starting plugin manager: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.device_mgr: exiting since there are no device plugins
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.371Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=undetected description=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=undetected description=disabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=healthy description=Healthy
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr.docker: using client connection initialized from environment: driver=docker
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.372Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=""
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.374Z [DEBUG] client.consul: bootstrap contacting Consul DCs: consul_dcs=["ovh-us-west-or-fed", "hetzner-hil"]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.379Z [INFO]  client.consul: discovered following servers: servers=[100.90.116.76:4647, 100.83.171.64:4647, 100.91.63.45:4647]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.379Z [DEBUG] client.server_mgr: new server list: new_servers=[100.83.171.64:4647, 100.90.116.76:4647, 100.91.63.45:4647] old_servers=[]
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[exec docker] undetected:[java raw_exec qemu]]"
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.396Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.397Z [INFO]  client: started client: node_id=2e503af6-c37a-510a-a6be-8fa6e96d88b5
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: updated allocations: index=246 total=0 pulled=0 filtered=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.399Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.400Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.401Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.401Z [DEBUG] http: UI is enabled
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:  client: node registration complete
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.407Z [INFO]  client: node registration complete
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.407Z [DEBUG] client: evaluations triggered by node registration: num_evals=1
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: client: evaluations triggered by node registration: num_evals=1
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:14.415Z [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=0 registered_checks=1 deregistered_checks=0
Dec 09 08:26:14 vps-9e8a4a7f nomad[2720]: consul.sync: sync complete: registered_services=1 deregistered_services=0 registered_checks=1 deregistered_checks=0
Dec 09 08:26:21 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:21.536Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="599.388µs"
Dec 09 08:26:21 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/agent/health?type=client duration="599.388µs"
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:23.411Z [DEBUG] client: state changed, updating node and re-registering
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]: client: state changed, updating node and re-registering
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:23.417Z [INFO]  client: node registration complete
Dec 09 08:26:23 vps-9e8a4a7f nomad[2720]:  client: node registration complete
Dec 09 08:26:31 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:31.539Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="638.198µs"
Dec 09 08:26:31 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/agent/health?type=client duration="638.198µs"
Dec 09 08:26:33 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:33.376Z [DEBUG] http: request complete: method=GET path=/v1/metrics?format=prometheus duration=3.908567ms
Dec 09 08:26:33 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/metrics?format=prometheus duration=3.908567ms
Dec 09 08:26:41 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:41.542Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="663.875µs"
Dec 09 08:26:41 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/agent/health?type=client duration="663.875µs"
Dec 09 08:26:51 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:26:51.547Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=2.317271ms
Dec 09 08:26:51 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/agent/health?type=client duration=2.317271ms
Dec 09 08:27:01 vps-9e8a4a7f nomad[2720]:     2023-12-09T08:27:01.551Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="998.587µs"
Dec 09 08:27:01 vps-9e8a4a7f nomad[2720]: http: request complete: method=GET path=/v1/agent/health?type=client duration="998.587µs"
quoing commented 10 months ago

Same issue on KVM VM + Docker, Nomad 1.7.1

As a workaround, you can override it. In my case, 2 cores × 3300 MHz (e.g. from cat /proc/cpuinfo) = 6600 MHz:

client { cpu_total_compute=6600 .. }

Then restart Nomad.
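The arithmetic in this workaround (core count times per-core MHz) can be sketched in Python. This is only an illustration, not something Nomad ships: the helper name `total_compute_mhz` is hypothetical, and it assumes an x86 Linux host whose /proc/cpuinfo contains one `cpu MHz` line per logical CPU. Those lines report the current frequency, which can vary with frequency scaling, and some architectures omit them entirely, so treat the result as a starting point for `cpu_total_compute` rather than an exact value.

```python
import re


def total_compute_mhz(cpuinfo_text: str) -> int:
    """Sum the 'cpu MHz' value of every logical CPU found in /proc/cpuinfo text.

    Hypothetical helper for illustration; returns 0 when no 'cpu MHz'
    lines are present (for example on some ARM systems).
    """
    mhz_values = [
        float(match.group(1))
        for match in re.finditer(
            r"^cpu MHz\s*:\s*([0-9.]+)\s*$", cpuinfo_text, re.MULTILINE
        )
    ]
    return round(sum(mhz_values))


if __name__ == "__main__":
    # Print a candidate value for the client block's cpu_total_compute setting.
    with open("/proc/cpuinfo") as f:
        print(f"cpu_total_compute = {total_compute_mhz(f.read())}")
```

For the two-core 3300 MHz machine described above, this would suggest 6600, matching the manual calculation.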

tgross commented 10 months ago

Potentially related: https://github.com/hashicorp/nomad/issues/19412

lindleydev commented 10 months ago

Somewhat related: after upgrading my Raspberry Pi cluster to Nomad 1.7.1, I was seeing errors from the CPU fingerprinter.

Dec 10 05:25:23 rasp-pi-2 nomad[1916183]:     2023-12-10T05:25:23.004Z [ERROR] client.alloc_runner: postrun failed: alloc_id=965cd7d7-f029-36d2-1a83-9e1e3db848f9 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
Dec 10 05:25:23 rasp-pi-2 nomad[1916183]:     2023-12-10T05:25:23.006Z [ERROR] client.alloc_runner: postrun failed: alloc_id=26a42c2f-d788-11e5-9ecb-d8aead7ca081 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"

I was able to resolve this by creating the directory Nomad is looking for.

The same error also occurred in the pre-run hook:

Dec 10 05:26:25 rasp-pi-2 nomad[1916183]:     2023-12-10T05:26:25.228Z [ERROR] client.alloc_runner: prerun failed: alloc_id=e3c914a4-855e-0887-8085-c659ed9cd122 error="pre-run hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: no such file or directory"
shoenig commented 10 months ago

@lindleydev that's actually a separate problem - I suspect in your case cgroups is mounted but the cpuset controller is not enabled. In previous versions of Nomad we allowed such a configuration at the expense of not actually enforcing resource utilization, but in 1.7 it's mandatory. There's some discussion about this happening in https://github.com/hashicorp/nomad/pull/19176
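One way to check the suspicion above on a cgroups v1 host (the /sys/fs/cgroup/cpuset/... path in the errors suggests v1) is to read /proc/cgroups, whose columns are subsystem name, hierarchy ID, number of cgroups, and an enabled flag. The following Python sketch is an illustration under that assumption; the function name `cpuset_enabled` is hypothetical, and it says nothing about how Nomad itself performs the check.

```python
def cpuset_enabled(proc_cgroups_text: str) -> bool:
    """Return True if the cpuset controller is listed as enabled.

    Expects the contents of /proc/cgroups (cgroups v1 layout):
    #subsys_name  hierarchy  num_cgroups  enabled
    """
    for line in proc_cgroups_text.splitlines():
        # Skip the header comment and any blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "cpuset":
            return fields[3] == "1"
    # cpuset is not listed at all.
    return False


if __name__ == "__main__":
    with open("/proc/cgroups") as f:
        print("cpuset enabled:", cpuset_enabled(f.read()))
```

If this reports that cpuset is disabled, the missing cpuset.cpus file in the errors above would be consistent with the explanation given here.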

Settler commented 10 months ago

A possible cause of this issue is described here: https://github.com/hashicorp/nomad/issues/19412#issuecomment-1850509695

tgross commented 10 months ago

Hey folks, just an update that the team is actively working on this issue. This issue and https://github.com/hashicorp/nomad/issues/19412 are effectively duplicates, so I'm going to close this issue as a dupe because there's been a bit more discussion over there.