googol opened this issue 10 months ago
Hi @googol, can you describe the Cgroups Mount output more? At a glance it seems like there are two mounts over /sys/fs/cgroup.
What sort of info would be useful? Any useful commands to post the output of? It does look odd to me as well, but I don't know why it's set up that way; that's how it comes in the OS. Nomad 1.6.4 does manage to work with it, though.
The snippet in the issue body is the relevant lines from `mount -l`.
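Off the top of my head, here are some read-only commands I could run and post the output of (generic Linux inspection, nothing nomad-specific, and safe on the live host):

```shell
# Every filesystem layered on the cgroup path, in mount order:
grep ' /sys/fs/cgroup' /proc/mounts
# The filesystem type the kernel reports for the topmost mount
# ("cgroup2fs" on a v2 system, "tmpfs" or "cgroupfs" otherwise):
stat -fc %T /sys/fs/cgroup
# The controllers file only exists on a cgroup v2 mount:
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || echo "no v2 controllers file"
```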
The same error occurs with 1.7.3:
```
Starting nomad
==> Config enable_syslog is `true` with log_level=INFO
==> Loaded configuration from /boot/config/plugins/nomad/config.d/client.hcl, /boot/config/plugins/nomad/config.d/mounts.hcl, /boot/config/plugins/nomad/config.d/vault.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:
    Advertise Addrs: HTTP: 10.42.0.70:4646
    Bind Addrs: HTTP: [0.0.0.0:4646]
    Client: true
    Log Level: INFO
    Region: global (DC: homelab)
    Server: false
    Version: 1.7.3
==> Nomad agent started! Log data will stream in below:
2024-01-17T22:58:12.001+0200 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/mnt/user/appdata/nomad/plugins
2024-01-17T22:58:12.002+0200 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2024-01-17T22:58:12.002+0200 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2024-01-17T22:58:12.002+0200 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO] client: using state directory: state_dir=/mnt/user/appdata/nomad/client
2024-01-17T22:58:12.004+0200 [INFO] client: using alloc directory: alloc_dir=/mnt/user/appdata/nomad/alloc
2024-01-17T22:58:12.004+0200 [INFO] client: using dynamic ports: min=20000 max=32000 reserved=""
2024-01-17T22:58:12.025+0200 [WARN] client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
2024-01-17T22:58:12.049+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
2024-01-17T22:58:12.054+0200 [WARN] client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=tunl0
2024-01-17T22:58:12.056+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
2024-01-17T22:58:12.060+0200 [WARN] client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wg0
2024-01-17T22:58:12.063+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
2024-01-17T22:58:12.066+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=virbr0
2024-01-17T22:58:12.069+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=nomad
2024-01-17T22:58:12.133+0200 [INFO] client.fingerprint_mgr.vault: Vault is available: cluster=default
2024-01-17T22:58:22.137+0200 [INFO] client.proclib.cg1: initializing nomad cgroups: cores=0-7
2024-01-17T22:58:22.138+0200 [ERROR] client.proclib.cg1: failed to set clone_children on nomad cpuset cgroup: error="open /sys/fs/cgroup/cpuset/nomad/cgroup.clone_children: permission denied"
2024-01-17T22:58:22.138+0200 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2024-01-17T22:58:22.138+0200 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2024-01-17T22:58:22.138+0200 [INFO] client.plugin: starting plugin manager: plugin-type=device
2024-01-17T22:58:22.269+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=089461bb-be0d-0aeb-9de2-00a54934a3a0 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.286+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=1b62c2a3-97a4-fa9f-1cb6-c3d3c04696a3 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.298+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=2c1b50d8-bda6-795e-ea9d-6b14c5916b82 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.308+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=379c17f3-807a-bc84-699c-332f9075aa2f task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.319+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=465177e8-f46b-0f9a-fe46-d902a2cb6ddb task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.320+0200 [INFO] client: node registration complete
2024-01-17T22:58:22.330+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=48fa55d3-652d-2cb6-120d-6a8e6b794b73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.341+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=607283fc-0869-8b62-58ec-7c09275bd64e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.357+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=6598f106-2954-d37f-e0b3-3fd9d43181e8 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.369+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=6c73cff1-87f9-fb92-934e-229b8e07103b task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.379+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=702e1e6a-cb61-5cc2-6f5e-c69638d2105e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.391+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=7125d727-022b-721e-b71b-fa8bc4341537 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.402+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=78298bf8-c293-72fb-86cb-90107f883b73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.412+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=809d5daf-1f40-e7b3-f5b9-bfc65688bcc5 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.423+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=8217ec1f-905e-601f-4fff-f292314cec73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.433+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=946758e4-25c7-c106-5cae-468309319b3b task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.445+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=9e401a0e-0720-7143-f2da-520c14f8f025 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.456+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=ddecd2fe-4204-70c8-e0ac-eafe40a10e0e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.467+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=f81530e2-3536-10ff-ac1a-50f258100e20 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.477+0200 [INFO] client: started client: node_id=d0dd4ee1-9a82-c786-fd35-3e688ac846f1
2024-01-17T22:58:22.478+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=946758e4-25c7-c106-5cae-468309319b3b error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.478+0200 [INFO] client.gc: marking allocation for GC: alloc_id=946758e4-25c7-c106-5cae-468309319b3b
2024-01-17T22:58:22.479+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=ddecd2fe-4204-70c8-e0ac-eafe40a10e0e error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.479+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=6598f106-2954-d37f-e0b3-3fd9d43181e8 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.479+0200 [INFO] client.gc: marking allocation for GC: alloc_id=78298bf8-c293-72fb-86cb-90107f883b73
```
Yeah, sorry @googol, I don't think I'll be able to debug this until I have a chance to load up Slackware in a VM and poke around. It's unclear what the cgroup mount configuration is, and Nomad 1.7 makes some assumptions about how that should look.
Ok, that piece of code clearly explains why I'm seeing this problem, and raises some new questions: it looks at the `/sys/fs/cgroup` mounts and, if any of them are cgroup2, chooses v2.
This looks like an unraid-specific thing, from what I've looked up now. The init script `/etc/rc.d/rc.S` on my live system has this snippet for configuring cgroups:
```sh
# Mount Control Groups filesystem interface:
if grep -wq cgroup /proc/filesystems ; then
  # Check if unraidcgroup1 is passed over in command line
  if grep -wq unraidcgroup1 /proc/cmdline ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
        # Eric S. figured out this needs to go here...
        echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  else
    if [ -d /sys/fs/cgroup ]; then
      # See https://docs.kernel.org/admin-guide/cgroup-v2.html (section Mounting)
      # Mount a tmpfs as the cgroup2 filesystem root
      mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
      mount -t cgroup2 none /sys/fs/cgroup
    else
      mkdir -p /dev/cgroup
      mount -t cgroup2 none /dev/cgroup
    fi
  fi
fi
```
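Note the cgroup2 branch near the end: it first mounts a tmpfs at /sys/fs/cgroup and then mounts cgroup2 on top of it, which would explain the two stacked mounts. If I'm reading it right, `mount -l` would then list something roughly like this for the same path (the exact options are my guess from the script):

```
cgroup_root on /sys/fs/cgroup type tmpfs (rw,mode=755,size=8192k)
none on /sys/fs/cgroup type cgroup2 (rw)
```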
The upstream slackware64-15 sources seem to have a slightly simpler script though:
```sh
# Mount Control Groups filesystem interface:
if [ -z "$container" ]; then
  if grep -wq cgroup /proc/filesystems ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  fi
fi
```
I'll raise an issue with unraid to verify.
Update:
I think the cgroup detection logic should be changed from the current model to something a bit more robust. Since it is valid to mount cgroup v2 on top of a tmpfs, checking only the first listed mount on the `/sys/fs/cgroup` path is not enough.
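As a sketch of what I mean (hypothetical shell, not Nomad's actual Go code; `detect_cgroup_version` is a made-up helper): scan every mount entry for the path and prefer cgroup2 if it appears anywhere in the stack, instead of stopping at the first entry:

```shell
#!/bin/sh
# Hypothetical detection pass. $1 is a file in /proc/mounts format
# (mountpoint = field 2, fstype = field 3).
detect_cgroup_version() {
  # cgroup2 anywhere in the stack on /sys/fs/cgroup wins, even if a
  # tmpfs entry is listed first (as in unraid's rc.S).
  if awk '$2 == "/sys/fs/cgroup" && $3 == "cgroup2" { found = 1 } END { exit !found }' "$1"; then
    echo v2
  elif awk '$3 == "cgroup" { found = 1 } END { exit !found }' "$1"; then
    echo v1
  else
    echo none
  fi
}

# Example usage against the live mount table:
detect_cgroup_version /proc/mounts
```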
Doing some issue board cleanup and noticed this got left in a bit of limbo. I've re-titled it to reflect the current state and marked it for roadmapping.
Thanks Tim! Looks like I forgot to report back as I said I would in my last comment, but unraid released their changes, which fixed my problem as expected. So my original immediate problem is solved (running the nomad client on unraid), but of course this could come up on other systems for you.
Thanks for the help on this; @shoenig pointing to the relevant bit of the code helped me get this fixed on unraid's side!
Nomad version
The affected client version:
This is also the version on the server.
Version details for v1.6.4 being used as comparison in logs below
```
Nomad v1.6.4
BuildDate 2023-12-07T08:27:54Z
Revision dbd5f36a24a924e2ba4dd6195af6a45c922ac8c6
```
Operating system and Environment details
Unraid Version 6.12.6 2023-12-01 (based on slackware-64 version 15). Kernel 6.1.64.
Using a prebuilt nomad binary downloaded from HashiCorp, with the custom packaging & startup scripts required by unraid.
Issue
All allocations fail with the following error messages:
Reproduction steps
Start a nomad v1.7.2 client and try to run a job on it.
Expected Result
Job runs as normal
Actual Result
No jobs can be allocated
Nomad Client logs (if appropriate)
Logs from startup of the nomad v1.7.2 client:
Nomad 1.6.4 log on the same machine
```
Starting nomad
==> Config enable_syslog is `true` with log_level=INFO
==> Loaded configuration from /boot/config/plugins/nomad/config.d/client.hcl, /boot/config/plugins/nomad/config.d/mounts.hcl, /boot/config/plugins/nomad/config.d/vault.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:
    Advertise Addrs: HTTP: 10.42.0.70:4646
    Bind Addrs: HTTP: [0.0.0.0:4646]
    Client: true
    Log Level: INFO
    Region: global (DC: homelab)
    Server: false
    Version: 1.6.4
==> Nomad agent started! Log data will stream in below:
2024-01-14T21:51:58.363+0200 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/mnt/user/appdata/nomad/plugins
2024-01-14T21:51:58.364+0200 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2024-01-14T21:51:58.365+0200 [INFO] client: using state directory: state_dir=/mnt/user/appdata/nomad/client
2024-01-14T21:51:58.366+0200 [INFO] client: using alloc directory: alloc_dir=/mnt/user/appdata/nomad/alloc
2024-01-14T21:51:58.366+0200 [INFO] client: using dynamic ports: min=20000 max=32000 reserved=""
2024-01-14T21:51:58.387+0200 [INFO] client.fingerprint_mgr.cgroup: cgroups are available
2024-01-14T21:51:58.389+0200 [WARN] client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
2024-01-14T21:51:58.394+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
2024-01-14T21:51:58.399+0200 [WARN] client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=tunl0
2024-01-14T21:51:58.401+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
2024-01-14T21:51:58.406+0200 [WARN] client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wg0
2024-01-14T21:51:58.409+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
2024-01-14T21:51:58.412+0200 [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=virbr0
2024-01-14T21:51:58.493+0200 [INFO] client.fingerprint_mgr.vault: Vault is available
2024-01-14T21:52:08.497+0200 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2024-01-14T21:52:08.497+0200 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2024-01-14T21:52:08.497+0200 [INFO] client.plugin: starting plugin manager: plugin-type=device
```
Node status
The interesting thing here is that nomad v1.7.2 reports cgroups v1 even though the system has cgroups v2 (nomad 1.6.4 reports it correctly).
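For reference, the kernel's own report can be compared against Nomad's fingerprint with read-only commands like these (the node ID is the one from the status output below; the nomad invocation is guarded so it can be skipped off-box):

```shell
# What the kernel reports for the path ("cgroup2fs" on a v2 system):
stat -fc %T /sys/fs/cgroup
# What Nomad fingerprinted for the node (unique.cgroup.* attributes):
if command -v nomad >/dev/null 2>&1; then
  nomad node status -verbose d0dd4ee1 | grep unique.cgroup
fi
```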
Nomad v1.6.4 on the same machine
```
ID = d0dd4ee1-9a82-c786-fd35-3e688ac846f1
Name = drogon
Node Pool = default
Class =
DC = homelab
Drain = false
Eligibility = eligible
Status = ready
CSI Controllers =
CSI Drivers =
Uptime = 4h47m5s
Host Volumes
Name ReadOnly Source
# Removed
Drivers
Driver Detected Healthy Message Time
docker true true Healthy 2024-01-14T21:52:08+02:00
exec true true Healthy 2024-01-14T21:52:08+02:00
java false false 2024-01-14T21:52:08+02:00
qemu true true Healthy 2024-01-14T21:52:08+02:00
raw_exec false false disabled 2024-01-14T21:52:08+02:00
Node Events
Time Subsystem Message Details
2024-01-14T21:43:27+02:00 Cluster Node reregistered by heartbeat
2024-01-14T21:41:41+02:00 Cluster Node heartbeat missed
2024-01-14T21:40:38+02:00 Cluster Node reregistered by heartbeat
2024-01-14T21:40:14+02:00 Cluster Node heartbeat missed
2024-01-14T21:39:46+02:00 Cluster Node reregistered by heartbeat
2024-01-14T21:39:44+02:00 Cluster Node heartbeat missed
2024-01-14T21:27:35+02:00 Drain Node drain complete
2024-01-14T21:26:51+02:00 Drain Node drain strategy set
2024-01-14T17:18:37+02:00 Cluster Node reregistered by heartbeat
2024-01-14T17:15:59+02:00 Cluster Node heartbeat missed
Allocated Resources
CPU Memory Disk
0/33600 MHz 0 B/16 GiB 0 B/224 GiB
Allocation Resource Utilization
CPU Memory
0/33600 MHz 0 B/16 GiB
Host Resource Utilization
CPU Memory Disk
472/33600 MHz 606 MiB/16 GiB (shfs)
Allocations
No allocations placed
Attributes
cpu.arch = amd64
cpu.frequency = 4200
cpu.modelname = Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
cpu.numcores = 8
cpu.reservablecores = 8
cpu.totalcompute = 33600
driver.docker = 1
driver.docker.bridge_ip = 172.17.0.1
driver.docker.os_type = linux
driver.docker.runtimes = io.containerd.runc.v2,io.containerd.runtime.v1.linux,runc
driver.docker.version = 20.10.24
driver.exec = 1
driver.qemu = 1
driver.qemu.version = 7.2.0
kernel.arch = x86_64
kernel.name = linux
kernel.version = 6.1.64-Unraid
memory.totalbytes = 16647389184
nomad.advertise.address = 10.42.0.70:4646
nomad.bridge.hairpin_mode = false
nomad.revision = dbd5f36a24a924e2ba4dd6195af6a45c922ac8c6
nomad.service_discovery = true
nomad.version = 1.6.4
os.name = slackware
os.signals = SIGPIPE,SIGPROF,SIGSYS,SIGWINCH,SIGXFSZ,SIGFPE,SIGIOT,SIGUSR2,SIGCONT,SIGSEGV,SIGNULL,SIGTSTP,SIGTTOU,SIGXCPU,SIGQUIT,SIGTERM,SIGTTIN,SIGBUS,SIGKILL,SIGSTOP,SIGTRAP,SIGUSR1,SIGABRT,SIGINT,SIGIO,SIGHUP,SIGILL,SIGALRM
os.version = 15.0+
plugins.cni.version.bandwidth = v1.4.0
plugins.cni.version.bridge = v1.4.0
plugins.cni.version.dhcp = v1.4.0
plugins.cni.version.dummy = v1.4.0
plugins.cni.version.firewall = v1.4.0
plugins.cni.version.host-device = v1.4.0
plugins.cni.version.host-local = v1.4.0
plugins.cni.version.ipvlan = v1.4.0
plugins.cni.version.loopback = v1.4.0
plugins.cni.version.macvlan = v1.4.0
plugins.cni.version.portmap = v1.4.0
plugins.cni.version.ptp = v1.4.0
plugins.cni.version.sbr = v1.4.0
plugins.cni.version.static = v1.4.0
plugins.cni.version.tap = v1.4.0
plugins.cni.version.tuning = v1.4.0
plugins.cni.version.vlan = v1.4.0
plugins.cni.version.vrf = v1.4.0
unique.cgroup.mountpoint = /sys/fs/cgroup
unique.cgroup.version = v2
unique.hostname = drogon
unique.network.ip-address = 10.42.0.70
unique.storage.bytesfree = 240949563392
unique.storage.bytestotal = 256060481536
unique.storage.volume = shfs
vault.accessible = true
vault.cluster_id = 68f34609-8077-1a60-7578-13f59359f3ca
vault.cluster_name = vault-cluster-5a052fb2
vault.version = 1.15.4
Meta
connect.gateway_image = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level = info
connect.proxy_concurrency = 1
connect.sidecar_image = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
```
Cgroups mount
Cgroup controllers