rodrigol-chan opened this issue 4 months ago
Hi @rodrigol-chan! Sorry to hear you're running into trouble. The error you're getting here is particularly weird:
write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device
We're writing to the /sys/fs/cgroup mount, which is a virtual file system! The only way I can think of for this to happen is if we've written a ton of inodes to the cgroup and haven't been cleaning them up correctly. What do you get if you cat the /proc/cgroups virtual file?
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 230 1
cpu 0 230 1
cpuacct 0 230 1
blkio 0 230 1
memory 0 230 1
devices 0 230 1
freezer 0 230 1
net_cls 0 230 1
perf_event 0 230 1
net_prio 0 230 1
hugetlb 0 230 1
pids 0 230 1
rdma 0 230 1
misc 0 230 1
It happened again just now, on a different machine.
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 3953 1
cpu 0 3953 1
cpuacct 0 3953 1
blkio 0 3953 1
memory 0 3953 1
devices 0 3953 1
freezer 0 3953 1
net_cls 0 3953 1
perf_event 0 3953 1
net_prio 0 3953 1
hugetlb 0 3953 1
pids 0 3953 1
rdma 0 3953 1
misc 0 3953 1
This Nomad client configuration now looks relevant:
client {
gc_max_allocs = 300
gc_disk_usage_threshold = 80
}
And we currently have over 300 allocations:
$ sudo ls -1 /var/lib/nomad/alloc | wc -l
374
Nomad seems to be keeping a lot of tmpfs mounts around even if the allocations aren't running anymore. I'm not sure if that's by design.
$ df -t tmpfs | wc -l
407
For extra context: issue seems new with the 1.7.x upgrade. We've run this configuration in 1.6.x for about 8 months with no similar issues.
Thanks for that extra info @rodrigol-chan. Even with that large number of allocs, I'd think you'd be ok until you get to 65535 inodes. I'll dig into that a little further to see if there's some more /sys or /proc filesystem spelunking we can do here.
Nomad seems to be keeping a lot of tmpfs mounts around even if the allocations aren't running anymore. I'm not sure if that's by design.
The mounts are left in place until the allocation is GC'd on the client. We do that so that you can debug failed allocations.
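If you want to clear those leftover mounts before the client's own GC thresholds kick in, you should be able to force client-side garbage collection via the client HTTP API. A rough sketch, assuming the default 127.0.0.1:4646 bind address and no ACL token:
# Force the local client to GC all terminal allocations, which also
# tears down their alloc-dir tmpfs mounts.
curl -s -X PUT http://127.0.0.1:4646/v1/client/gc

# Or GC a single terminal allocation by ID (substitute your own alloc ID):
curl -s -X PUT http://127.0.0.1:4646/v1/client/allocation/<alloc_id>/gc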
The issue still happens as of 1.8.3. Is there anything we can do to help troubleshoot this?
Hi @rodrigol-chan, sorry, I haven't been able to circle back to this and I'm currently swamped trying to land some work for our 1.9 beta next week.
I suspect this is platform-specific. I think you'll want to look into whether there's anything in the host configuration that could be limiting the size of those virtual FS directories.
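For example, one quick thing to check would be whether anything has lowered the cgroup limits somewhere in the tree. A sketch of what I mean (both files default to "max"; a finite value would cap how many cgroups can exist underneath):
# Check for host-imposed limits on the Nomad cgroup subtree.
cat /sys/fs/cgroup/nomad.slice/cgroup.max.descendants
cat /sys/fs/cgroup/nomad.slice/cgroup.max.depth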
Hi @rodrigol-chan! Just wanted to check in so you don't think I've forgotten this issue. I re-read through your initial report to see if there were any clues I missed.
Also interesting to observe is that, unlike in our other 1.7.x clients, there's overlap between the CPUs for the reserve and share slices:
Even ignoring the errors you're seeing, that's got to be a bug all by itself. These should never overlap. Even though we can't write to the two files atomically, we always remove from the source first and then write to the destination. So in that tiny race you should see a missing CPU but not one counted twice. So I'll look into seeing if I can find any place where there's potentially another race condition here where that's not correctly handled.
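To make that timing window concrete, the partition hook does the moral equivalent of the following (illustrative shell only; the real implementation is Go inside Nomad, and the CPU ranges here are made up):
CG=/sys/fs/cgroup/nomad.slice

# Step 1: rewrite the source partition WITHOUT the core being moved
# (say core 3, previously in the shared pool).
echo "4-31" > "$CG/share.slice/cpuset.cpus"    # was 3-31

# Step 2: rewrite the destination partition WITH that core added.
echo "0-3" > "$CG/reserve.slice/cpuset.cpus"   # was 0-2

# A reader between step 1 and step 2 sees core 3 in neither slice
# (briefly "missing"), but should never see it in both at once.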
All allocations that failed are from periodic jobs, running on the exec driver with no core constraints.
You have other allocations on the same host that do use core constraints though? If not, we're writing an empty value to the cgroup. In which case, I found this Stack Exchange post which describes that scenario, but has no answer. :facepalm:
I managed to dig up a few old issues that suggest that if cpuset.mem doesn't exist in the cgroup directory, then you can't write to cpuset.cpus either, but I also can't create a scenario where it wouldn't exist. Just creating a new directory with something like mkdir /sys/fs/cgroup/nomad.slice/new.slice makes it show up for me, and you can't remove it.
Also, I wanted to see if I could get this error outside of Nomad by echoing a bad input to the cgroup file, and wasn't able to get that same error.
input | result | error |
---|---|---|
" " | unset | - |
"" | unset | - |
-1 | - | write error: Invalid argument |
2 | - | write error: Numerical result out of range |
2-1 | - | write error: Invalid argument |
a | - | write error: Invalid argument |
,0 | 0 | - |
1, | 1 | - |
0-0 | 0 | - |
0-a | - | write error: Invalid argument |
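For reference, the table above can be reproduced with a loop along these lines (a sketch; run as root against a scratch cgroup so Nomad's slices aren't touched, and note the cpuset controller must be enabled in /sys/fs/cgroup/cgroup.subtree_control for cpuset.cpus to exist):
# Try each candidate value against a scratch cgroup's cpuset.cpus and
# record what the kernel says.
CG=/sys/fs/cgroup/scratch.slice
mkdir -p "$CG"

for input in " " "" "-1" "2" "2-1" "a" ",0" "1," "0-0" "0-a"; do
    printf 'input=[%s] -> ' "$input"
    if echo "$input" > "$CG/cpuset.cpus" 2>/tmp/err; then
        printf 'cpuset.cpus=[%s]\n' "$(cat "$CG/cpuset.cpus")"
    else
        printf '%s\n' "$(cat /tmp/err)"
    fi
done
rmdir "$CG"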
I did get some interesting (but different) errors trying to write to the nomad.slice/cpuset.cpus file:
# cat /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus
1
# cat /sys/fs/cgroup/nomad.slice/cpuset.cpus
0
# cat /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus
1
# echo 0 > /sys/fs/cgroup/nomad.slice/cpuset.cpus
bash: echo: write error: Device or resource busy
One more thing I'd like you to try is the following, to make sure we've counted the cgroups correctly when trying to figure out if it's the inodes issue:
# find /sys/fs/cgroup/ | wc -l
2965
# find /sys/fs/cgroup/nomad.slice | wc -l
147
You have other allocations on the same host that do use core constraints though?
That's correct.
One more thing I'd like you to try is the following, to make sure we've counted the cgroups correctly when trying to figure out if it's the inodes issue.
Just happened again:
# find /sys/fs/cgroup -depth -type d | wc -l
81
# find /sys/fs/cgroup/nomad.slice -depth -type d | wc -l
29
# head /sys/fs/cgroup/nomad.slice/cpuset.cpus /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus
==> /sys/fs/cgroup/nomad.slice/cpuset.cpus <==
0-31
==> /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus <==
0-3
==> /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus <==
4-31
Log output:
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:26:17.441673+02:00","alloc_id":"ff98bb16-7a4e-2b4d-f8d6-584d767dd1bf","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:26:18.425610+02:00","alloc_id":"682bf2a1-d3d9-417e-1325-c4e2ffc56185","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:27:00.871024+02:00","alloc_id":"d6108b42-84de-5da3-0fd7-f902107d6069","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:27:00.871102+02:00","alloc_id":"d6108b42-84de-5da3-0fd7-f902107d6069","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:27:00.873250+02:00","alloc_id":"d6108b42-84de-5da3-0fd7-f902107d6069","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:27:00.875331+02:00","alloc_id":"d6108b42-84de-5da3-0fd7-f902107d6069","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:27:00.891020+02:00","alloc_id":"d6108b42-84de-5da3-0fd7-f902107d6069","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:28:00.461064+02:00","alloc_id":"c4f3dcf4-dfbd-6a10-8a4a-59e7dd2c7cf2","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:00.461112+02:00","alloc_id":"c4f3dcf4-dfbd-6a10-8a4a-59e7dd2c7cf2","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:00.463403+02:00","alloc_id":"c4f3dcf4-dfbd-6a10-8a4a-59e7dd2c7cf2","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:00.465462+02:00","alloc_id":"c4f3dcf4-dfbd-6a10-8a4a-59e7dd2c7cf2","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:28:04.872027+02:00","alloc_id":"c4f3dcf4-dfbd-6a10-8a4a-59e7dd2c7cf2","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:28:51.538329+02:00","alloc_id":"8c0fbbff-6823-c309-e64f-5a107fad5f9b","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:51.538372+02:00","alloc_id":"8c0fbbff-6823-c309-e64f-5a107fad5f9b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"hulppiet-processing","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:51.540765+02:00","alloc_id":"8c0fbbff-6823-c309-e64f-5a107fad5f9b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:28:51.543117+02:00","alloc_id":"8c0fbbff-6823-c309-e64f-5a107fad5f9b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:28:51.559830+02:00","alloc_id":"8c0fbbff-6823-c309-e64f-5a107fad5f9b","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:29:33.656064+02:00","alloc_id":"aafaa858-254c-b7c1-608d-b475eac076df","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.186942+02:00","alloc_id":"fcf42431-4866-956f-0fd6-cfb3bb0bc6f7","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.186998+02:00","alloc_id":"fcf42431-4866-956f-0fd6-cfb3bb0bc6f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.189414+02:00","alloc_id":"fcf42431-4866-956f-0fd6-cfb3bb0bc6f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.191631+02:00","alloc_id":"fcf42431-4866-956f-0fd6-cfb3bb0bc6f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.616165+02:00","alloc_id":"4104ffa6-61d6-bd2b-c4c6-16dc67fc6101","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.616259+02:00","alloc_id":"4104ffa6-61d6-bd2b-c4c6-16dc67fc6101","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"sharkmachine-db-maintenance","type":"Setup Failure"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.618142+02:00","alloc_id":"e5fea182-c573-1165-beef-1bb28b54457b","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.618185+02:00","alloc_id":"e5fea182-c573-1165-beef-1bb28b54457b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.622550+02:00","alloc_id":"4104ffa6-61d6-bd2b-c4c6-16dc67fc6101","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.624860+02:00","alloc_id":"e5fea182-c573-1165-beef-1bb28b54457b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"requestmachine-timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.630967+02:00","alloc_id":"4104ffa6-61d6-bd2b-c4c6-16dc67fc6101","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.633081+02:00","alloc_id":"e5fea182-c573-1165-beef-1bb28b54457b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.666757+02:00","alloc_id":"e5fea182-c573-1165-beef-1bb28b54457b","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.668936+02:00","alloc_id":"4104ffa6-61d6-bd2b-c4c6-16dc67fc6101","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:00.978237+02:00","alloc_id":"b4282dcd-db53-0b92-7460-4948414fdc46","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.978283+02:00","alloc_id":"b4282dcd-db53-0b92-7460-4948414fdc46","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.982879+02:00","alloc_id":"b4282dcd-db53-0b92-7460-4948414fdc46","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:00.989391+02:00","alloc_id":"b4282dcd-db53-0b92-7460-4948414fdc46","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:01.338338+02:00","alloc_id":"fb073ab4-0928-dd31-af16-9d09356df977","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.338406+02:00","alloc_id":"fb073ab4-0928-dd31-af16-9d09356df977","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"imaginator-maintenance-timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.342794+02:00","alloc_id":"fb073ab4-0928-dd31-af16-9d09356df977","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.349456+02:00","alloc_id":"fb073ab4-0928-dd31-af16-9d09356df977","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:01.373888+02:00","alloc_id":"fb073ab4-0928-dd31-af16-9d09356df977","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:01.698021+02:00","alloc_id":"29a30585-27f8-6fd0-710c-4ff80df5f7f7","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.698061+02:00","alloc_id":"29a30585-27f8-6fd0-710c-4ff80df5f7f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.702273+02:00","alloc_id":"29a30585-27f8-6fd0-710c-4ff80df5f7f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:30:01.707199+02:00","alloc_id":"29a30585-27f8-6fd0-710c-4ff80df5f7f7","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:01.722370+02:00","alloc_id":"29a30585-27f8-6fd0-710c-4ff80df5f7f7","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:04.591155+02:00","alloc_id":"fcf42431-4866-956f-0fd6-cfb3bb0bc6f7","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:30:05.415728+02:00","alloc_id":"b4282dcd-db53-0b92-7460-4948414fdc46","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:31:00.682071+02:00","alloc_id":"1acbe5ca-f674-40f5-2eff-9c19dbd388ed","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:31:00.682114+02:00","alloc_id":"1acbe5ca-f674-40f5-2eff-9c19dbd388ed","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:31:00.684033+02:00","alloc_id":"1acbe5ca-f674-40f5-2eff-9c19dbd388ed","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:31:00.686174+02:00","alloc_id":"1acbe5ca-f674-40f5-2eff-9c19dbd388ed","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:31:00.703343+02:00","alloc_id":"1acbe5ca-f674-40f5-2eff-9c19dbd388ed","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"error","@message":"prerun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:32:00.641821+02:00","alloc_id":"9bfa85eb-c32f-d688-33ed-eb2705f66a5b","error":"pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:32:00.641890+02:00","alloc_id":"9bfa85eb-c32f-d688-33ed-eb2705f66a5b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"timer","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:32:00.644153+02:00","alloc_id":"9bfa85eb-c32f-d688-33ed-eb2705f66a5b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"nix-setup-profiles","type":"Setup Failure"}
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2024-09-30T16:32:00.646231+02:00","alloc_id":"9bfa85eb-c32f-d688-33ed-eb2705f66a5b","failed":true,"msg":"failed to setup alloc: pre-run hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device","task":"promtail","type":"Setup Failure"}
{"@level":"error","@message":"postrun failed","@module":"client.alloc_runner","@timestamp":"2024-09-30T16:32:00.663011+02:00","alloc_id":"9bfa85eb-c32f-d688-33ed-eb2705f66a5b","error":"hook \"cpuparts_hook\" failed: write /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus: no space left on device"}
It doesn't look like the CPUs overlapped this time. The number of dying descendants is curious, I wonder if it's related:
# head /sys/fs/cgroup/nomad.slice/cgroup.stat /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat /sys/fs/cgroup/nomad.slice/share.slice/cgroup.stat
==> /sys/fs/cgroup/nomad.slice/cgroup.stat <==
nr_descendants 28
nr_dying_descendants 2356
==> /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat <==
nr_descendants 1
nr_dying_descendants 78
==> /sys/fs/cgroup/nomad.slice/share.slice/cgroup.stat <==
nr_descendants 25
nr_dying_descendants 2278
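If it's useful, the dying-descendant counts for every cgroup under nomad.slice can be dumped in one go, largest first (a sketch):
# Print nr_dying_descendants for each cgroup under nomad.slice,
# sorted so the worst accumulators come first.
find /sys/fs/cgroup/nomad.slice -name cgroup.stat \
  -exec awk '/nr_dying_descendants/ {print $2, FILENAME}' {} + | sort -rn | head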
Can you confirm whether the cpuset.mem file exists in the reserve.slice? And what's cgroup.max.descendants set to? Example:
$ cat /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.max.descendants
max
And what's cgroup.max.descendants set to?
I did look at that at failure time and from memory it was at max.
Can you confirm whether the cpuset.mem file exists in the reserve.slice?
I can't find any cpuset.mem anywhere, did you mean cpuset.mems? This latter one is present and seems empty for all allocations as far as I can see. I somehow missed this, but this is indeed a machine with NUMA. We widely use memory blocks but none with numa configuration.
$ lsmem -o +NODE
RANGE SIZE STATE REMOVABLE BLOCK NODE
0x0000000000000000-0x00000000bfffffff 3G online yes 0-2 0
0x0000000100000000-0x000000103fffffff 61G online yes 4-64 0
0x0000001040000000-0x000000203fffffff 64G online yes 65-128 1
Memory block size: 1G
Total online memory: 128G
Total offline memory: 0B
I'll double-check cgroup.max.descendants and cgroup.mem{s,} as soon as it happens again and update this issue. Thanks again for looking into this!
Just happened again. (It has been happening strangely often lately.) Here are the values requested. cpuset.mems always seems to be present and empty in all cgroups managed by Nomad.
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.max.descendants
max
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cgroup.stat
nr_descendants 1
nr_dying_descendants 11
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus
0-3
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.mems
$
This doesn't look like it should be possible, though:
$ head /sys/fs/cgroup/nomad.slice/cpuset.cpus
0-31
$ head /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus
4-31
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus
0-3
$ head /sys/fs/cgroup/nomad.slice/reserve.slice/bd759fed-5e8d-90b4-2110-94d5f79737a8.realtime-gunicorn.scope/cpuset.cpus
4-7
It might just be an artifact of how the data is collected since I don't think it's possible to do an atomic snapshot of cgroups.
Hi @rodrigol-chan - just to clarify, is this only happening on this one specific node? Are there any tasks still running on this node that were originally created from before the upgrade to Nomad 1.7? Has the node been rebooted since the upgrade to Nomad 1.7?
For some additional context, we've been investigating to figure out the circumstances in which the kernel can return this "no space left on device" error in the first place.
That error is referred to as ENOSPC, and in the kernel you've got for Ubuntu 22.04 there's only one place it can be returned for cgroups v2: validate_change in cpuset.c#L637-L649. I'm pointing to the mirror of Torvalds' tree here, but I've confirmed this function is the same in Ubuntu's tree for my current 22.04 kernel:
$ git remote add jammy git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy
$ git fetch jammy # wait a while...
$ git checkout -b jammy-5.15.0-124.134 Ubuntu-5.15.0-124.134
Here's the relevant section, with a helpful comment:
/*
* Cpusets with tasks - existing or newly being attached - can't
* be changed to have empty cpus_allowed or mems_allowed.
*/
ret = -ENOSPC;
if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
if (!cpumask_empty(cur->cpus_allowed) &&
cpumask_empty(trial->cpus_allowed))
goto out;
if (!nodes_empty(cur->mems_allowed) &&
nodes_empty(trial->mems_allowed))
goto out;
}
So that suggests that we're somehow ending up in a state where the cpuset is being emptied of cpus or mems allowed while the task is still live. That's the source of @shoenig's follow-up questions above.
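For anyone who wants to poke at that path by hand, here's a sketch of the condition the kernel is guarding against. It assumes a cgroup v2 host where you can create a scratch cgroup at the root and where the cpuset controller is enabled in the root's cgroup.subtree_control; on the 5.15 kernel quoted above the last write should be refused with ENOSPC, though newer kernels may behave differently:
# Create a populated cpuset cgroup, then try to empty its cpuset.cpus.
CG=/sys/fs/cgroup/repro.slice
mkdir -p "$CG"
echo 0 > "$CG/cpuset.cpus"        # give it one CPU
sleep 300 &                       # park a task in it so it's "populated"
echo $! > "$CG/cgroup.procs"
echo "" > "$CG/cpuset.cpus"       # expected: write error: No space left on device

# Cleanup: kill the parked task, then remove the scratch cgroup.
kill %1; wait
rmdir "$CG"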
is this only happening on this one specific node?
No, it happens on more nodes, though I just noticed that it only happens on nodes where we allow periodic jobs to run. The nodes where we do not allow periodic jobs have the exact same configuration as the ones where we do, with the difference that they are preemptible instances, i.e. Google will arbitrarily power them off.
Are there any tasks still running on this node that were originally created from before the upgrade to Nomad 1.7? Has the node been rebooted since the upgrade to Nomad 1.7?
The oldest running allocation I see is from 18th October (7 days ago), whose job was submitted on Oct 14th. The oldest current/running job version is from 2024-06-25T14:38:16Z, a few days after the 1.7 upgrade, and the same job also contains the oldest job version that Nomad still remembers, dated 2024-05-02T09:22:13Z. The vast majority of jobs have been submitted this week since we do 20+ releases per day.
All nodes run unattended-upgrades and have rebooted since Oct 15th.
I want to clarify that we're running the linux-gcp kernel since we're on Google Cloud, so we're actually currently running the 6.8 kernel. At the time this started, I believe we were on 6.5.
$ uname -a
Linux nomad-client-camel 6.8.0-1016-gcp #18~22.04.1-Ubuntu SMP Tue Oct 8 14:58:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
that suggests that we're somehow ending up in a state where the cpuset is being emptied of cpus or mems allowed while the task is still live
I'll add some more instrumentation to look at the process tree when the issue happens. Is there any more information I can produce?
we're actually currently running the 6.8 kernel.
Ok, in the 6.8 kernel there's a second place this error can appear (ref cpuset.c#L3250-L3262), which is when the effective_cpus are empty.
I'll add some more instrumentation to look at the process tree when the issue happens. Is there any more information I can produce?
I suspect we want to look at all the cpuset files in the tree. Something like:
for f in /sys/fs/cgroup/cpuset.*; do echo -n "$f :"; cat "$f"; done
for f in /sys/fs/cgroup/nomad.slice/cpuset.*; do echo -n "$f :"; cat "$f"; done
for f in /sys/fs/cgroup/nomad.slice/*.slice/cpuset.*; do echo -n "$f :"; cat "$f"; done
for f in /sys/fs/cgroup/nomad.slice/*.slice/*.scope/cpuset.*; do echo -n "$f :"; cat "$f"; done
Perhaps this (partially) relates to #24304 / #24297
Nomad version
Operating system and Environment details
Running Ubuntu 22.04 on Google Cloud in an n2d-standard-32 instance.
Issue
Alerts fired due to failed allocations. Upon investigation, I noticed the following log line:
Also interesting to observe is that, unlike in our other 1.7.x clients, there's overlap between the CPUs for the reserve and share slices:
Reproduction steps
Not clear how to reproduce. This happened on a single instance. All allocations that failed are from periodic jobs, running on the exec driver with no core constraints.
Expected Result
Allocations spawn successfully.
Actual Result
Allocations failed to spawn.
Nomad Client logs (if appropriate)
Nomad client configuration