andrew-kennedy opened this issue 4 years ago
Do you have the folder /sys/fs/cgroup/cpuset? Did you execute as root?
I did run as root, and the cgroup folder has a different hierarchy, here's what I do have:
$ ls -1 -p /sys/fs/cgroup
cgroup.controllers
cgroup.max.depth
cgroup.max.descendants
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cpu.pressure
cpuset.cpus.effective
cpuset.mems.effective
init.scope/
io.cost.model
io.cost.qos
io.pressure
machine.slice/
memory.pressure
system.slice/
user.slice/
Hmm, you really seem to have cgroup2 mounted directly in /sys/fs/cgroup. I have the same mounted under /sys/fs/cgroup/unified. Right now, I do not have support for cgroup2, nor for "nonstandard paths".
➜ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
➜ cd /sys/fs/cgroup/unified
➜ ls
cgroup.controllers cgroup.max.descendants cgroup.stat cgroup.threads init.scope machine.slice system.slice
cgroup.max.depth cgroup.procs cgroup.subtree_control cpu.pressure io.pressure memory.pressure user.slice
Small addition: I just looked at the documentation for cgroups, and they do not seem to allow setting the CPUs to run the group on.
I actually learned how to do this using systemd instead of vfio-isolate, because apparently it allows for controlling cgroups. Contents of my libvirt hook prepare/begin/isolate.sh:
systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
This restricts all already-running tasks to the listed CPUs. For VFIO, I let libvirt's pinning do the work, because the pinned cores end up in the machine.slice group, which runs on the other CPUs:
# cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d3\\x2dgaming.scope/vcpu{0..11}/cpuset.cpus.effective
6
18
7
19
8
20
9
21
10
22
11
23
The stuff in machine.slice is handled purely by the pinning in my domain XML file:
<vcpu placement='static' cpuset='6-11,18-23'>12</vcpu>
<iothreads>4</iothreads>
<cputune>
<vcpupin vcpu='0' cpuset='6'/>
<vcpupin vcpu='1' cpuset='18'/>
<vcpupin vcpu='2' cpuset='7'/>
<vcpupin vcpu='3' cpuset='19'/>
<vcpupin vcpu='4' cpuset='8'/>
<vcpupin vcpu='5' cpuset='20'/>
<vcpupin vcpu='6' cpuset='9'/>
<vcpupin vcpu='7' cpuset='21'/>
<vcpupin vcpu='8' cpuset='10'/>
<vcpupin vcpu='9' cpuset='22'/>
<vcpupin vcpu='10' cpuset='11'/>
<vcpupin vcpu='11' cpuset='23'/>
<emulatorpin cpuset='0-5,12-17'/>
<iothreadpin iothread='1' cpuset='0-5,12-17'/>
<iothreadpin iothread='2' cpuset='0-5,12-17'/>
<iothreadpin iothread='3' cpuset='0-5,12-17'/>
<iothreadpin iothread='4' cpuset='0-5,12-17'/>
</cputune>
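A matching release hook (something like release/end/unisolate.sh; the script name here is just an example) would simply widen the slices back to the full 0-23 range of this 12-core/24-thread CPU when the VM shuts down:

systemctl set-property --runtime -- user.slice AllowedCPUs=0-23
systemctl set-property --runtime -- system.slice AllowedCPUs=0-23
systemctl set-property --runtime -- init.scope AllowedCPUs=0-23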
Sorry for comment spam, but here's the documentation for cgroups v2 that appears to describe how cpusets allow for core isolation: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpuset-interface-files
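For reference, what that document describes boils down to enabling the cpuset controller for child cgroups and then writing a CPU list into the slice's cpuset.cpus, roughly like this (a sketch assuming the unified layout shown above; systemd may already have delegated the controller):

echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo 0-5,12-17 > /sys/fs/cgroup/user.slice/cpuset.cpus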
> I actually learned how to do this using systemd
Does that migrate existing kernel threads from isolated cores? Actually, does vfio-isolate do it? cset does, afaik.
I'm not entirely sure. I assumed vfio-isolate did, and that this solution did as well, because I thought kernel threads were either in init.scope or system.slice.
@rokups vfio-isolate migrates all threads that can be migrated (with move-tasks / /host.slice). Some kernel threads cannot be moved though, and cset fails on them as well.
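Roughly, a cgroup v1 invocation combining these looks like the following (a sketch; the cpuset-create syntax here is approximate, only the move-tasks / /host.slice part is what I described above):

vfio-isolate cpuset-create --cpus C0-5,12-17 /host.slice move-tasks / /host.slice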
@andrew-kennedy I use the exact same cpuset interface. Problem is, it's mounted in a different location in your case. I'll have to figure out a way to detect how to get to the cpuset.
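A rough way to tell the layouts apart could be to check for the cgroup2 marker file (just a sketch, nothing vfio-isolate specific yet):

if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "unified: cgroup2 mounted directly at /sys/fs/cgroup"
elif [ -f /sys/fs/cgroup/unified/cgroup.controllers ]; then
    echo "hybrid: cgroup2 at /sys/fs/cgroup/unified"
else
    echo "cgroup v1 only"
fi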
Could you check how the change
systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
is actually reflected in your /sys/fs/cgroup/user.slice? On my machine, there is no cpuset.cpus in this folder.
I would assume your machine is using cgroups v1; Fedora 32 defaults to the fully unified hierarchy with cgroups v2, so it has a different structure.
When I run that systemctl command, I can see this change reflected:
# cat /sys/fs/cgroup/user.slice/cpuset.cpus
0-5,12-17
Ok. Found a way to configure my system (Arch) to look like yours by setting a kernel parameter. Will try to come up with a version that supports cgroup2 soon.
@spheenik any progress on supporting cgroup2? It seems more and more stuff is moving towards cgroup2.
Jep, Arch did as well, so I'll cook something up really soon.
It would probably be easiest (and most correct) to configure cgroups via systemd, simply using something like systemctl set-property --runtime on the relevant slice units. You'd also need to explicitly create a slice (host.slice) for the root tasks (which is as simple as dropping a file into /run/systemd/system and starting it).
@intelfx Can you get more specific? I attempted to set up a VFIO hook that runs these commands:
systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
in order to move all non-QEMU system tasks onto the allowed CPUs. But it doesn't appear to actually work, as some tasks are still executing on the VM cores.
@andrew-kennedy That's more or less what is supposed to work. Check systemctl show -p AllowedCPUs user.slice and /sys/fs/cgroup/user.slice/cpuset.cpus.effective (similarly for other slices), and note that this doesn't touch kthreads.
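As for the host.slice I mentioned, a minimal runtime unit could look like this (a sketch; the path follows the /run/systemd/system suggestion above, and the CPU list is just the one used earlier in this thread):

/run/systemd/system/host.slice
[Unit]
Description=Slice for host (non-VM) tasks

[Slice]
AllowedCPUs=0-5,12-17

then systemctl daemon-reload and systemctl start host.slice to activate it.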
Heads up all: sorry for taking so long, but there is now a version 0.4.0 which allows working with cgroups v2.
Using the latest vfio-isolate I get this traceback when I attempt to create any cpuset:
I believe I am using cgroups v2 in general, as Fedora does by default now. Is this not a supported use case?