hashicorp / nomad


cgroup mount detection is not robust to unusual cgroup configurations #19735

Open googol opened 10 months ago

googol commented 10 months ago

Nomad version

The affected client version:

Nomad v1.7.2
BuildDate 2023-12-13T19:59:42Z
Revision 64e3dca9274b493e38a49fda3a70fd31d0485b91

This is also the version on the server.

Version details for v1.6.4 being used as comparison in logs below

```
Nomad v1.6.4
BuildDate 2023-12-07T08:27:54Z
Revision dbd5f36a24a924e2ba4dd6195af6a45c922ac8c6
```

Operating system and Environment details

Unraid Version 6.12.6 2023-12-01 (based on slackware-64 version 15). Kernel 6.1.64.

Using the prebuilt Nomad binary downloaded from HashiCorp, with the custom packaging and startup scripts required by unraid.

Issue

All allocations fail with the following error messages:

    2024-01-14T21:32:37.463+0200 [ERROR] client.alloc_runner: prerun failed: alloc_id=ed6e46b6-5c6b-3448-b486-4e053b4ac9de error="pre-run hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
    2024-01-14T21:32:37.463+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=ed6e46b6-5c6b-3448-b486-4e053b4ac9de task=lgtv2mqtt type="Setup Failure" msg="failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied" failed=true

Reproduction steps

Start a Nomad v1.7.2 client and try to run a job on it.

Expected Result

Job runs as normal

Actual Result

No jobs can be allocated

Nomad Client logs (if appropriate)

Logs from startup of the Nomad v1.7.2 client:

Starting nomad
==> Config enable_syslog is `true` with log_level=INFO
==> Loaded configuration from /boot/config/plugins/nomad/config.d/client.hcl, /boot/config/plugins/nomad/config.d/mounts.hcl, /boot/config/plugins/nomad/config.d/vault.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.42.0.70:4646
            Bind Addrs: HTTP: [0.0.0.0:4646]
                Client: true
             Log Level: INFO
                Region: global (DC: homelab)
                Server: false
               Version: 1.7.2

==> Nomad agent started! Log data will stream in below:

    2024-01-14T21:40:27.646+0200 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/mnt/user/appdata/nomad/plugins
    2024-01-14T21:40:27.647+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2024-01-14T21:40:27.647+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2024-01-14T21:40:27.647+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2024-01-14T21:40:27.647+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2024-01-14T21:40:27.647+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2024-01-14T21:40:27.648+0200 [INFO]  client: using state directory: state_dir=/mnt/user/appdata/nomad/client
    2024-01-14T21:40:27.649+0200 [INFO]  client: using alloc directory: alloc_dir=/mnt/user/appdata/nomad/alloc
    2024-01-14T21:40:27.649+0200 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
    2024-01-14T21:40:27.669+0200 [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
    2024-01-14T21:40:27.674+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2024-01-14T21:40:27.679+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=tunl0
    2024-01-14T21:40:27.681+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
    2024-01-14T21:40:27.686+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wg0
    2024-01-14T21:40:27.689+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
    2024-01-14T21:40:27.692+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=virbr0
    2024-01-14T21:40:27.788+0200 [INFO]  client.fingerprint_mgr.vault: Vault is available: cluster=default
    2024-01-14T21:40:37.792+0200 [INFO]  client.proclib.cg1: initializing nomad cgroups: cores=0-7
    2024-01-14T21:40:37.792+0200 [ERROR] client.proclib.cg1: failed to set clone_children on nomad cpuset cgroup: error="open /sys/fs/cgroup/cpuset/nomad/cgroup.clone_children: permission denied"
    2024-01-14T21:40:37.792+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2024-01-14T21:40:37.792+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2024-01-14T21:40:37.792+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Nomad 1.6.4 log on the same machine

```
Starting nomad
==> Config enable_syslog is `true` with log_level=INFO
==> Loaded configuration from /boot/config/plugins/nomad/config.d/client.hcl, /boot/config/plugins/nomad/config.d/mounts.hcl, /boot/config/plugins/nomad/config.d/vault.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.42.0.70:4646
            Bind Addrs: HTTP: [0.0.0.0:4646]
                Client: true
             Log Level: INFO
                Region: global (DC: homelab)
                Server: false
               Version: 1.6.4

==> Nomad agent started! Log data will stream in below:

2024-01-14T21:51:58.363+0200 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/mnt/user/appdata/nomad/plugins
2024-01-14T21:51:58.364+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2024-01-14T21:51:58.364+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2024-01-14T21:51:58.365+0200 [INFO]  client: using state directory: state_dir=/mnt/user/appdata/nomad/client
2024-01-14T21:51:58.366+0200 [INFO]  client: using alloc directory: alloc_dir=/mnt/user/appdata/nomad/alloc
2024-01-14T21:51:58.366+0200 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
2024-01-14T21:51:58.387+0200 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
2024-01-14T21:51:58.389+0200 [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
2024-01-14T21:51:58.394+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
2024-01-14T21:51:58.399+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=tunl0
2024-01-14T21:51:58.401+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
2024-01-14T21:51:58.406+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wg0
2024-01-14T21:51:58.409+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
2024-01-14T21:51:58.412+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=virbr0
2024-01-14T21:51:58.493+0200 [INFO]  client.fingerprint_mgr.vault: Vault is available
2024-01-14T21:52:08.497+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
2024-01-14T21:52:08.497+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
2024-01-14T21:52:08.497+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
```

Node status

The interesting thing here is that Nomad v1.7.2 reports cgroups v1 even though the system has cgroups v2 (Nomad 1.6.4 reports it correctly):

# nomad node status -self -verbose
ID              = d0dd4ee1-9a82-c786-fd35-3e688ac846f1
Name            = drogon
Node Pool       = default
Class           = <none>
DC              = homelab
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 4h40m32s

Host Volumes
Name                     ReadOnly  Source
# Removed

Drivers
Driver    Detected  Healthy  Message   Time
docker    true      true     Healthy   2024-01-14T21:43:26+02:00
exec      true      true     Healthy   2024-01-14T21:43:26+02:00
java      false     false    <none>    2024-01-14T21:43:26+02:00
qemu      true      true     Healthy   2024-01-14T21:43:26+02:00
raw_exec  false     false    disabled  2024-01-14T21:43:26+02:00

Node Events
Time                       Subsystem  Message                         Details
2024-01-14T21:43:27+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:41:41+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:40:38+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:40:14+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:39:46+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:39:44+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:27:35+02:00  Drain      Node drain complete             <none>
2024-01-14T21:26:51+02:00  Drain      Node drain strategy set         <none>
2024-01-14T17:18:37+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T17:15:59+02:00  Cluster    Node heartbeat missed           <none>

Allocated Resources
CPU      Memory   Disk
0/0 MHz  0 B/0 B  0 B/0 B

Allocation Resource Utilization
CPU      Memory
0/0 MHz  0 B/0 B

Host Resource Utilization
CPU       Memory          Disk
39/0 MHz  714 MiB/16 GiB  (shfs)

Allocations
No allocations placed

Attributes
cpu.arch                        = amd64
cpu.frequency                   = 4000
cpu.modelname                   = Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
cpu.numcores                    = 8
cpu.reservablecores             = 8
cpu.totalcompute                = 32000
cpu.usablecompute               = 32000
driver.docker                   = 1
driver.docker.bridge_ip         = 172.17.0.1
driver.docker.os_type           = linux
driver.docker.runtimes          = io.containerd.runc.v2,io.containerd.runtime.v1.linux,runc
driver.docker.version           = 20.10.24
driver.exec                     = 1
driver.qemu                     = 1
driver.qemu.version             = 7.2.0
kernel.arch                     = x86_64
kernel.name                     = linux
kernel.version                  = 6.1.64-Unraid
memory.totalbytes               = 16647389184
nomad.advertise.address         = 10.42.0.70:4646
nomad.bridge.hairpin_mode       = false
nomad.revision                  = 64e3dca9274b493e38a49fda3a70fd31d0485b91
nomad.service_discovery         = true
nomad.version                   = 1.7.2
numa.node.count                 = 1
numa.node0.cores                = 0-7
os.cgroups.version              = 1
os.name                         = slackware
os.signals                      = SIGSTOP,SIGHUP,SIGILL,SIGPIPE,SIGQUIT,SIGIO,SIGTTIN,SIGUSR1,SIGXCPU,SIGALRM,SIGINT,SIGSEGV,SIGSYS,SIGABRT,SIGIOT,SIGTERM,SIGXFSZ,SIGNULL,SIGBUS,SIGTRAP,SIGTTOU,SIGTSTP,SIGCONT,SIGFPE,SIGKILL,SIGPROF,SIGUSR2,SIGWINCH
os.version                      = 15.0+
plugins.cni.version.bandwidth   = v1.4.0
plugins.cni.version.bridge      = v1.4.0
plugins.cni.version.dhcp        = v1.4.0
plugins.cni.version.dummy       = v1.4.0
plugins.cni.version.firewall    = v1.4.0
plugins.cni.version.host-device = v1.4.0
plugins.cni.version.host-local  = v1.4.0
plugins.cni.version.ipvlan      = v1.4.0
plugins.cni.version.loopback    = v1.4.0
plugins.cni.version.macvlan     = v1.4.0
plugins.cni.version.portmap     = v1.4.0
plugins.cni.version.ptp         = v1.4.0
plugins.cni.version.sbr         = v1.4.0
plugins.cni.version.static      = v1.4.0
plugins.cni.version.tap         = v1.4.0
plugins.cni.version.tuning      = v1.4.0
plugins.cni.version.vlan        = v1.4.0
plugins.cni.version.vrf         = v1.4.0
unique.hostname                 = drogon
unique.network.ip-address       = 10.42.0.70
unique.storage.bytesfree        = 240949764096
unique.storage.bytestotal       = 256060481536
unique.storage.volume           = shfs
vault.accessible                = true
vault.cluster_id                = 68f34609-8077-1a60-7578-13f59359f3ca
vault.cluster_name              = vault-cluster-5a052fb2
vault.version                   = 1.15.4

Meta
connect.gateway_image     = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level         = info
connect.proxy_concurrency = 1
connect.sidecar_image     = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
Nomad v1.6.4 on the same machine

```
ID              = d0dd4ee1-9a82-c786-fd35-3e688ac846f1
Name            = drogon
Node Pool       = default
Class           = <none>
DC              = homelab
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 4h47m5s

Host Volumes
Name                     ReadOnly  Source
# Removed

Drivers
Driver    Detected  Healthy  Message   Time
docker    true      true     Healthy   2024-01-14T21:52:08+02:00
exec      true      true     Healthy   2024-01-14T21:52:08+02:00
java      false     false    <none>    2024-01-14T21:52:08+02:00
qemu      true      true     Healthy   2024-01-14T21:52:08+02:00
raw_exec  false     false    disabled  2024-01-14T21:52:08+02:00

Node Events
Time                       Subsystem  Message                         Details
2024-01-14T21:43:27+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:41:41+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:40:38+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:40:14+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:39:46+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T21:39:44+02:00  Cluster    Node heartbeat missed           <none>
2024-01-14T21:27:35+02:00  Drain      Node drain complete             <none>
2024-01-14T21:26:51+02:00  Drain      Node drain strategy set         <none>
2024-01-14T17:18:37+02:00  Cluster    Node reregistered by heartbeat  <none>
2024-01-14T17:15:59+02:00  Cluster    Node heartbeat missed           <none>

Allocated Resources
CPU          Memory      Disk
0/33600 MHz  0 B/16 GiB  0 B/224 GiB

Allocation Resource Utilization
CPU          Memory
0/33600 MHz  0 B/16 GiB

Host Resource Utilization
CPU            Memory          Disk
472/33600 MHz  606 MiB/16 GiB  (shfs)

Allocations
No allocations placed

Attributes
cpu.arch                        = amd64
cpu.frequency                   = 4200
cpu.modelname                   = Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
cpu.numcores                    = 8
cpu.reservablecores             = 8
cpu.totalcompute                = 33600
driver.docker                   = 1
driver.docker.bridge_ip         = 172.17.0.1
driver.docker.os_type           = linux
driver.docker.runtimes          = io.containerd.runc.v2,io.containerd.runtime.v1.linux,runc
driver.docker.version           = 20.10.24
driver.exec                     = 1
driver.qemu                     = 1
driver.qemu.version             = 7.2.0
kernel.arch                     = x86_64
kernel.name                     = linux
kernel.version                  = 6.1.64-Unraid
memory.totalbytes               = 16647389184
nomad.advertise.address         = 10.42.0.70:4646
nomad.bridge.hairpin_mode       = false
nomad.revision                  = dbd5f36a24a924e2ba4dd6195af6a45c922ac8c6
nomad.service_discovery         = true
nomad.version                   = 1.6.4
os.name                         = slackware
os.signals                      = SIGPIPE,SIGPROF,SIGSYS,SIGWINCH,SIGXFSZ,SIGFPE,SIGIOT,SIGUSR2,SIGCONT,SIGSEGV,SIGNULL,SIGTSTP,SIGTTOU,SIGXCPU,SIGQUIT,SIGTERM,SIGTTIN,SIGBUS,SIGKILL,SIGSTOP,SIGTRAP,SIGUSR1,SIGABRT,SIGINT,SIGIO,SIGHUP,SIGILL,SIGALRM
os.version                      = 15.0+
plugins.cni.version.bandwidth   = v1.4.0
plugins.cni.version.bridge      = v1.4.0
plugins.cni.version.dhcp        = v1.4.0
plugins.cni.version.dummy       = v1.4.0
plugins.cni.version.firewall    = v1.4.0
plugins.cni.version.host-device = v1.4.0
plugins.cni.version.host-local  = v1.4.0
plugins.cni.version.ipvlan      = v1.4.0
plugins.cni.version.loopback    = v1.4.0
plugins.cni.version.macvlan     = v1.4.0
plugins.cni.version.portmap     = v1.4.0
plugins.cni.version.ptp         = v1.4.0
plugins.cni.version.sbr         = v1.4.0
plugins.cni.version.static      = v1.4.0
plugins.cni.version.tap         = v1.4.0
plugins.cni.version.tuning      = v1.4.0
plugins.cni.version.vlan        = v1.4.0
plugins.cni.version.vrf         = v1.4.0
unique.cgroup.mountpoint        = /sys/fs/cgroup
unique.cgroup.version           = v2
unique.hostname                 = drogon
unique.network.ip-address       = 10.42.0.70
unique.storage.bytesfree        = 240949563392
unique.storage.bytestotal       = 256060481536
unique.storage.volume           = shfs
vault.accessible                = true
vault.cluster_id                = 68f34609-8077-1a60-7578-13f59359f3ca
vault.cluster_name              = vault-cluster-5a052fb2
vault.version                   = 1.15.4

Meta
connect.gateway_image     = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level         = info
connect.proxy_concurrency = 1
connect.sidecar_image     = docker.io/envoyproxy/envoy:v${NOMAD_envoy_version}
```

Cgroups mount

# mount -l | grep cgroup
cgroup_root on /sys/fs/cgroup type tmpfs (rw,relatime,size=8192k,mode=755,inode64)
none on /sys/fs/cgroup type cgroup2 (rw,relatime)

Cgroup controllers

# cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids
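
For what it's worth, the two stacked mounts can be disambiguated by looking at the *last* entry for the path rather than the first. A small shell sketch (using the `mount -l` lines from this issue as sample data; the awk one-liner is purely illustrative, not Nomad code):

```shell
#!/bin/sh
# Sample `mount -l` output from this machine: a tmpfs is mounted at
# /sys/fs/cgroup first, then cgroup2 is mounted over it. The cgroup2
# mount is the one processes actually see.
mounts='cgroup_root on /sys/fs/cgroup type tmpfs (rw,relatime,size=8192k,mode=755,inode64)
none on /sys/fs/cgroup type cgroup2 (rw,relatime)'

# First entry on the path (what a naive top-down scan would find): tmpfs
first=$(printf '%s\n' "$mounts" | awk '$3 == "/sys/fs/cgroup" { print $5; exit }')

# Last entry on the path (the effective filesystem): cgroup2
last=$(printf '%s\n' "$mounts" | awk '$3 == "/sys/fs/cgroup" { t = $5 } END { print t }')

echo "first=$first last=$last"   # prints: first=tmpfs last=cgroup2
```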
shoenig commented 10 months ago

Hi @googol, can you describe the Cgroups Mount output in a bit more detail? At first glance it looks like there are two mounts stacked on top of each other at /sys/fs/cgroup.

googol commented 10 months ago

What sort of info would be useful? Are there any commands whose output it would help to post? It does look odd to me as well, but I don't know why it's set up that way; that's just how the OS ships. Nomad 1.6.4 does manage to work with it, though.

The snippet in the issue body shows the relevant lines from `mount -l`.

googol commented 10 months ago

The same error occurs with 1.7.3:

Nomad 1.7.3 startup log

```
Starting nomad
==> Config enable_syslog is `true` with log_level=INFO
==> Loaded configuration from /boot/config/plugins/nomad/config.d/client.hcl, /boot/config/plugins/nomad/config.d/mounts.hcl, /boot/config/plugins/nomad/config.d/vault.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.42.0.70:4646
            Bind Addrs: HTTP: [0.0.0.0:4646]
                Client: true
             Log Level: INFO
                Region: global (DC: homelab)
                Server: false
               Version: 1.7.3

==> Nomad agent started! Log data will stream in below:

2024-01-17T22:58:12.001+0200 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/mnt/user/appdata/nomad/plugins
2024-01-17T22:58:12.002+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2024-01-17T22:58:12.002+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2024-01-17T22:58:12.002+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2024-01-17T22:58:12.003+0200 [INFO]  client: using state directory: state_dir=/mnt/user/appdata/nomad/client
2024-01-17T22:58:12.004+0200 [INFO]  client: using alloc directory: alloc_dir=/mnt/user/appdata/nomad/alloc
2024-01-17T22:58:12.004+0200 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
2024-01-17T22:58:12.025+0200 [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
2024-01-17T22:58:12.049+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
2024-01-17T22:58:12.054+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=tunl0
2024-01-17T22:58:12.056+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
2024-01-17T22:58:12.060+0200 [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wg0
2024-01-17T22:58:12.063+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
2024-01-17T22:58:12.066+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=virbr0
2024-01-17T22:58:12.069+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=nomad
2024-01-17T22:58:12.133+0200 [INFO]  client.fingerprint_mgr.vault: Vault is available: cluster=default
2024-01-17T22:58:22.137+0200 [INFO]  client.proclib.cg1: initializing nomad cgroups: cores=0-7
2024-01-17T22:58:22.138+0200 [ERROR] client.proclib.cg1: failed to set clone_children on nomad cpuset cgroup: error="open /sys/fs/cgroup/cpuset/nomad/cgroup.clone_children: permission denied"
2024-01-17T22:58:22.138+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
2024-01-17T22:58:22.138+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
2024-01-17T22:58:22.138+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
2024-01-17T22:58:22.269+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=089461bb-be0d-0aeb-9de2-00a54934a3a0 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.286+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1b62c2a3-97a4-fa9f-1cb6-c3d3c04696a3 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.298+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2c1b50d8-bda6-795e-ea9d-6b14c5916b82 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.308+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=379c17f3-807a-bc84-699c-332f9075aa2f task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.319+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=465177e8-f46b-0f9a-fe46-d902a2cb6ddb task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.320+0200 [INFO]  client: node registration complete
2024-01-17T22:58:22.330+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=48fa55d3-652d-2cb6-120d-6a8e6b794b73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.341+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=607283fc-0869-8b62-58ec-7c09275bd64e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.357+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=6598f106-2954-d37f-e0b3-3fd9d43181e8 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.369+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=6c73cff1-87f9-fb92-934e-229b8e07103b task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.379+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=702e1e6a-cb61-5cc2-6f5e-c69638d2105e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.391+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=7125d727-022b-721e-b71b-fa8bc4341537 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.402+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=78298bf8-c293-72fb-86cb-90107f883b73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.412+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=809d5daf-1f40-e7b3-f5b9-bfc65688bcc5 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.423+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=8217ec1f-905e-601f-4fff-f292314cec73 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.433+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=946758e4-25c7-c106-5cae-468309319b3b task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.445+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=9e401a0e-0720-7143-f2da-520c14f8f025 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.456+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=ddecd2fe-4204-70c8-e0ac-eafe40a10e0e task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.467+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f81530e2-3536-10ff-ac1a-50f258100e20 task=xxx type=Received msg="Task received by client" failed=false
2024-01-17T22:58:22.477+0200 [INFO]  client: started client: node_id=d0dd4ee1-9a82-c786-fd35-3e688ac846f1
2024-01-17T22:58:22.478+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=946758e4-25c7-c106-5cae-468309319b3b error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.478+0200 [INFO]  client.gc: marking allocation for GC: alloc_id=946758e4-25c7-c106-5cae-468309319b3b
2024-01-17T22:58:22.479+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=ddecd2fe-4204-70c8-e0ac-eafe40a10e0e error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.479+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=6598f106-2954-d37f-e0b3-3fd9d43181e8 error="hook \"cpuparts_hook\" failed: open /sys/fs/cgroup/cpuset/nomad/share/cpuset.cpus: permission denied"
2024-01-17T22:58:22.479+0200 [INFO]  client.gc: marking allocation for GC: alloc_id=78298bf8-c293-72fb-86cb-90107f883b73
```

shoenig commented 10 months ago

Yeah, sorry @googol, I don't think I'll be able to debug this until I have a chance to load up Slackware in a VM and poke around. It's unclear what the cgroup mount configuration actually is, and Nomad 1.7 makes some assumptions about how it should look.

googol commented 10 months ago

OK, that piece of code clearly explains why I'm seeing this problem, and raises some new questions:

googol commented 10 months ago

This looks like an unraid specific thing, from what I've looked up now. The init script /etc/rc.d/rc.S on my live system has this snippet for configuring cgroups:

# Mount Control Groups filesystem interface:
if grep -wq cgroup /proc/filesystems ; then
  # Check if unraidcgroup1 is passed over in command line
  if grep -wq unraidcgroup1 /proc/cmdline ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
        # Eric S. figured out this needs to go here...
        echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  else
    if [ -d /sys/fs/cgroup ]; then
      # See https://docs.kernel.org/admin-guide/cgroup-v2.html (section Mounting)
      # Mount a tmpfs as the cgroup2 filesystem root
      mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
      mount -t cgroup2 none /sys/fs/cgroup
    else
      mkdir -p /dev/cgroup
      mount -t cgroup2 none /dev/cgroup
    fi
  fi
fi

The upstream slackware64-15 sources seem to have a slightly simpler script though:

# Mount Control Groups filesystem interface:
if [ -z "$container" ]; then
  if grep -wq cgroup /proc/filesystems ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  fi
fi
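
So on unraid the layout depends on whether `unraidcgroup1` was passed on the kernel command line. A tiny sketch of checking which branch applies (the sample command lines below are hypothetical; on a live box you would read /proc/cmdline instead):

```shell
#!/bin/sh
# check_layout is an illustrative helper, not part of the unraid scripts.
# It mirrors the `grep -wq unraidcgroup1 /proc/cmdline` test in rc.S.
check_layout() {
  if printf '%s\n' "$1" | grep -wq unraidcgroup1; then
    echo "legacy: per-controller cgroup v1 mounts under a tmpfs"
  else
    echo "default: cgroup2 mounted over a tmpfs root"
  fi
}

check_layout "BOOT_IMAGE=/bzimage unraidcgroup1"   # takes the v1 branch
check_layout "BOOT_IMAGE=/bzimage"                 # takes the v2 branch (this machine)
```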

I'll raise an issue with unraid to verify

googol commented 9 months ago

Update:

I think the cgroup detection logic should be changed from the current model to something a bit more robust. Since it is valid to mount cgroup v2 on top of a tmpfs, checking only the first mount listed for the /sys/fs/cgroup path is not enough.
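
To make the suggested rule concrete, here's a sketch of classifying by the filesystem type of the *topmost* mount at /sys/fs/cgroup (roughly what a statfs() there would report) instead of the first mount-table entry. The `classify` helper is hypothetical, not Nomad code:

```shell
#!/bin/sh
# Classify the cgroup setup from a single filesystem type string.
classify() {
  case "$1" in
    cgroup2) echo "v2" ;;
    cgroup)  echo "v1" ;;
    tmpfs)   echo "tmpfs root: look at the mounts stacked on top" ;;
    *)       echo "unknown" ;;
  esac
}

classify tmpfs     # the first mount entry on this machine: inconclusive alone
classify cgroup2   # the topmost mount entry: correctly identifies v2
```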

tgross commented 2 weeks ago

Doing some issue board cleanup and noticed this got left in a bit of a limbo. I've re-titled it to reflect the current state and marked it for roadmapping.

googol commented 2 weeks ago

Thanks Tim! It looks like I forgot to report back as I said I would in my last comment, but unraid released their changes, which fixed my problem as expected. So my original immediate problem (running the Nomad client on unraid) is solved, but of course this could still come up on other systems.

Thanks for the help on this, @shoenig: pointing to the relevant bit of code helped me get this fixed on unraid's side!