spheenik / vfio-isolate

CPU and memory isolation for VFIO
MIT License

Cannot create cpusets on Fedora 32 w/ kernel 5.7.17-200.fc32.x86_64 #3

Open · andrew-kennedy opened 4 years ago

andrew-kennedy commented 4 years ago

Using the latest vfio-isolate I get this traceback when I attempt to create any cpuset:

# vfio-isolate cpuset-create --cpus C0-1 /host.slice
Traceback (most recent call last):
  File "/usr/local/bin/vfio-isolate", line 11, in <module>
    load_entry_point('vfio-isolate==0.3.1', 'console_scripts', 'vfio-isolate')()
  File "/usr/local/lib/python3.8/site-packages/vfio_isolate/cli.py", line 182, in run_cli
    executor.run()
  File "/usr/local/lib/python3.8/site-packages/vfio_isolate/cli.py", line 175, in run
    e.action.execute(e.params)
  File "/usr/local/lib/python3.8/site-packages/vfio_isolate/action/cpuset_create.py", line 21, in execute
    cpu_set.create()
  File "/usr/local/lib/python3.8/site-packages/vfio_isolate/cpuset.py", line 43, in create
    os.mkdir(self.__path())
FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpuset/host.slice'

I believe I am generally using cgroups v2, as Fedora now defaults to it. Is this not a supported use case?
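The failing path points at the cause: the tool expects the v1 cpuset hierarchy. A plain shell check (not part of vfio-isolate) makes this quick to confirm:

# If this directory is missing, the mkdir in the traceback cannot succeed.
if [ -d /sys/fs/cgroup/cpuset ]; then
    echo "v1 cpuset hierarchy is mounted"
else
    echo "no v1 cpuset hierarchy (likely a pure cgroup2 mount)"
fi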

spheenik commented 4 years ago

Do you have the folder /sys/fs/cgroup/cpuset? Did you execute as root?

andrew-kennedy commented 4 years ago

I did run as root, but the cgroup folder has a different hierarchy. Here's what I do have:

$ ls  -1 -p /sys/fs/cgroup
cgroup.controllers
cgroup.max.depth
cgroup.max.descendants
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cpu.pressure
cpuset.cpus.effective
cpuset.mems.effective
init.scope/
io.cost.model
io.cost.qos
io.pressure
machine.slice/
memory.pressure
system.slice/
user.slice/
spheenik commented 4 years ago

Hmm, you really seem to have cgroup2 mounted in /sys/fs/cgroup directly. I have the same mounted under /sys/fs/cgroup/unified.

Right now, I do not have support for cgroup2, nor for nonstandard mount paths.

➜  mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
➜  cd  /sys/fs/cgroup/unified
➜  ls
cgroup.controllers  cgroup.max.descendants  cgroup.stat             cgroup.threads  init.scope   machine.slice    system.slice
cgroup.max.depth    cgroup.procs            cgroup.subtree_control  cpu.pressure    io.pressure  memory.pressure  user.slice
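A quicker check than grepping the mount table (a generic shell one-liner, not from this thread) is to ask for the filesystem type of the mount point:

# cgroup2fs means a unified (v2) hierarchy is mounted there;
# tmpfs means the split v1 layout shown above
stat -fc %T /sys/fs/cgroup
stat -fc %T /sys/fs/cgroup/unified   # hybrid setups expose v2 here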
spheenik commented 4 years ago

Small addition: I just looked at the documentation for cgroups, and it seems not to allow setting the CPUs the group runs on.

andrew-kennedy commented 4 years ago

I actually learned how to do this using systemd instead of vfio-isolate, since it apparently allows controlling cgroups:

Contents of libvirt hook for prepare/begin/isolate.sh:

systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17

This restricts all already-running tasks to the listed CPUs. For VFIO, I let libvirt pinning do the work, since the pinned cores live in the machine.slice group, which ends up on the other CPUs:

# cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d3\\x2dgaming.scope/vcpu{0..11}/cpuset.cpus.effective
6
18
7
19
8
20
9
21
10
22
11
23
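For completeness, a matching release hook could undo the restriction once the VM stops. A sketch, assuming a 24-CPU host and the usual libvirt hook layout (the hook path and the full CPU range are assumptions, not from the original setup):

Contents of libvirt hook for release/end/unisolate.sh:

systemctl set-property --runtime -- user.slice AllowedCPUs=0-23
systemctl set-property --runtime -- system.slice AllowedCPUs=0-23
systemctl set-property --runtime -- init.scope AllowedCPUs=0-23

Note that AllowedCPUs accepts range syntax, so the isolate hook above could equally be written as AllowedCPUs=0-5,12-17.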
andrew-kennedy commented 4 years ago

The stuff in machine.slice is handled purely by the pinning in my domain XML file:

<vcpu placement='static' cpuset='6-11,18-23'>12</vcpu>
  <iothreads>4</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='23'/>
    <emulatorpin cpuset='0-5,12-17'/>
    <iothreadpin iothread='1' cpuset='0-5,12-17'/>
    <iothreadpin iothread='2' cpuset='0-5,12-17'/>
    <iothreadpin iothread='3' cpuset='0-5,12-17'/>
    <iothreadpin iothread='4' cpuset='0-5,12-17'/>
  </cputune>
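The effective pinning can also be checked through libvirt itself; a sketch, with the domain name gaming inferred from the machine-qemu\x2d3\x2dgaming.scope path above:

# lists each vCPU and the host CPUs it is pinned to
virsh vcpupin gaming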
andrew-kennedy commented 4 years ago

Sorry for the comment spam, but here's the cgroups v2 documentation that appears to describe how cpusets allow core isolation: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpuset-interface-files
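Those interface files also make a raw, non-systemd equivalent fairly direct. A sketch assuming a pure cgroup2 mount at /sys/fs/cgroup (on a systemd machine, set-property as above is the safer route, since systemd owns this tree):

# enable the cpuset controller for children of the root cgroup
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
# create a group and restrict it to the host CPUs
mkdir -p /sys/fs/cgroup/host.slice
echo "0-5,12-17" > /sys/fs/cgroup/host.slice/cpuset.cpus
# migrate a task into the group (1234 is a placeholder PID)
echo 1234 > /sys/fs/cgroup/host.slice/cgroup.procs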

rokups commented 4 years ago

> I actually learned how to do this using systemd

Does that migrate existing kernel threads off the isolated cores? Does vfio-isolate actually do that? cset does, afaik.

andrew-kennedy commented 4 years ago

I'm not entirely sure. I assumed vfio-isolate did, and that this solution does too, because I thought kernel threads live in either init.scope or system.slice.

spheenik commented 4 years ago

@rokups vfio-isolate migrates all threads that can be migrated (with move-tasks / /host.slice). Some kernel threads cannot be moved though, and cset fails on them as well.
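Spelled out, that sequence is (both commands appear verbatim in this thread; the CPU list is the one from the first comment):

vfio-isolate cpuset-create --cpus C0-1 /host.slice
vfio-isolate move-tasks / /host.slice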

@andrew-kennedy I use the exact same cpuset interface. The problem is that it's mounted in a different location in your case. I'll have to figure out how to detect where the cpuset hierarchy lives.

Could you check how the change

    systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17

is actually reflected in your /sys/fs/cgroup/user.slice?

On my machine, there is no cpuset.cpus in this folder.

andrew-kennedy commented 4 years ago

I would assume your machine is using cgroups v1; Fedora 32 defaults to a fully unified hierarchy with cgroups v2, so it has a different structure.

When I run that systemctl command, I can see this change reflected:

# cat /sys/fs/cgroup/user.slice/cpuset.cpus
0-5,12-17
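That matches how a pure cgroup2 hierarchy behaves (background, not stated in the thread): cpuset.* files only appear in a child group once the cpuset controller is enabled in the parent's cgroup.subtree_control, and systemd enables it on demand when AllowedCPUs is first set. This can be confirmed with:

# after the set-property call, "cpuset" should be listed here
cat /sys/fs/cgroup/cgroup.subtree_control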
spheenik commented 4 years ago

Ok. Found a way to configure my system (Arch) to look like yours by setting a kernel parameter. Will try to come up with a version that supports cgroup2 soon.

inglor commented 3 years ago

@spheenik any progress on supporting cgroup2? It seems like more and more things are moving towards cgroup2.

spheenik commented 3 years ago

Yep, Arch did as well, so I'll cook something up really soon.

intelfx commented 3 years ago

It would probably be easiest (and most correct) to configure cgroups via systemd, simply using something like systemctl set-property --runtime on the relevant slice units. You'd also need to explicitly create a slice (host.slice) for the root tasks, which is as simple as dropping a unit file into /run/systemd/system and starting it.
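A sketch of that approach; the unit contents and CPU range are illustrative, and only the name host.slice comes from this thread:

cat > /run/systemd/system/host.slice <<'EOF'
[Unit]
Description=Slice for host tasks while the VM runs

[Slice]
AllowedCPUs=0-5,12-17
EOF
systemctl daemon-reload
systemctl start host.slice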

andrew-kennedy commented 3 years ago

@intelfx Can you be more specific? I attempted to set up a VFIO hook that runs these commands:

systemctl set-property --runtime -- user.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- system.slice AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17
systemctl set-property --runtime -- init.scope AllowedCPUs=0,1,2,3,4,5,12,13,14,15,16,17

to move all non-QEMU system tasks onto the AllowedCPUs. But it doesn't appear to actually work; some tasks are still executing on the VM cores.

intelfx commented 3 years ago

@andrew-kennedy That's more or less what is supposed to work. Check systemctl show -p AllowedCPUs user.slice and /sys/fs/cgroup/user.slice/cpuset.cpus.effective (and similarly for the other slices), and note that this doesn't touch kthreads.
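One way to see which threads are still landing on the VM cores (a generic procps sketch; the core list matches the 6-11,18-23 layout used earlier in this thread):

# psr is the CPU each thread last ran on; kernel threads show up in brackets
ps -eLo pid,tid,psr,comm --no-headers | awk '$3 ~ /^(6|7|8|9|10|11|18|19|20|21|22|23)$/'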

spheenik commented 3 years ago

Heads-up, all: sorry for taking so long. There's now a version 0.4.0 that supports working with cgroups v2.