spheenik / vfio-isolate

CPU and memory isolation for VFIO
MIT License
89 stars 8 forks source link

[Request] Setup guide update for AMD V-Cache CPUs #14

Open ggeorgo opened 1 year ago

ggeorgo commented 1 year ago

With all the 79X0X3D processors out there, is it possible to update the setup guide with some examples on how to:

spheenik commented 1 year ago

Hi @ggeorgo,

that's a bit difficult, since I do not have access to a 79X0X3D at the moment.

From what I heard, the VCache is always on the first die, which would mean that if there's no way to tell the bios to boot from the second one, you'll end up with core 0 being used by Linux. A description of the problems that brings is here on reddit.

It is still worth a try to just isolate the first die and just live with the flaw that the host schedules some work there. I have no data on how this affects things.

You'd basically do something like this

sudo vfio-isolate \ 
    cpuset-modify --cpus C9-15,25-31 /system.slice \
    cpuset-modify --cpus C9-15,25-31 /user.slice

to isolate host execution to the second die, cores 1-7. Then configure the VM to use the complete first die for the VM, and emulation on second die, core 0.

<cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='19'/>
    <vcpupin vcpu='8' cpuset='4'/>
    <vcpupin vcpu='9' cpuset='20'/>
    <vcpupin vcpu='10' cpuset='5'/>
    <vcpupin vcpu='11' cpuset='21'/>
    <vcpupin vcpu='12' cpuset='6'/>
    <vcpupin vcpu='13' cpuset='22'/>
    <vcpupin vcpu='14' cpuset='7'/>
    <vcpupin vcpu='15' cpuset='23'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8,24'/>
    ...
</cputune>

Good luck, and report back with results!

ggeorgo commented 1 year ago

Thank you Martin for your prompt response. Actually, that thread on Reddit is mine :D Will update both, as I am setting up my new box next week.

Regards, G

On Sat, 1 Apr 2023 at 12:22, Martin Schrodt @.***> wrote:

Hi @ggeorgo https://github.com/ggeorgo,

that's a bit difficult, since I do not have access to a 79X0X3D at the moment.

From what I heard, the VCache is always on the first die, which would mean that if there's no way to tell the bios to boot from the second one, you'll end up with core 0 being used by Linux. A description of the problems that brings is here on reddit https://www.reddit.com/r/VFIO/comments/11ni06e/pinning_and_isolation_of_7950x3d/jbx6a7g/ .

It is still worth a try to just isolate the first die and just live with the flaw that the host schedules some work there. I have no data on how this affects things.

You'd basically do something like this

sudo vfio-isolate \ cpuset-modify --cpus C9-15,25-31 /system.slice \ cpuset-modify --cpus C9-15,25-31 /user.slice

to isolate host execution to the second die, cores 1-7. Then configure the VM to use the complete first die for the VM, and emulation on second die, core 0.

...

Good luck, and report back with results!

— Reply to this email directly, view it on GitHub https://github.com/spheenik/vfio-isolate/issues/14#issuecomment-1492942830, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALVA5R33XY7HCWU7WPCICNLW7AFW5ANCNFSM6AAAAAAWPQYDFY . You are receiving this because you were mentioned.Message ID: @.***>

spheenik commented 1 year ago

Any updates? We're on the edges of our seats here ;-)

ggeorgo commented 1 year ago

Any updates? We're on the edges of our seats here ;-)

Sorry, forgot to update, however this is the 1st weekend I am using it, since I was battling with a bug for 10 days... Updated though!

ggeorgo commented 1 year ago

Hi @spheenik ! After sometime I managed to commit a bit of my time and try it out. My problem is that I am getting errors while trying to run it, either from console, or when preparing the qemu.

` HCPUS=8-15,24-31 MCPUS=0-6,16-22 ACPUS=0-31

enable_isolation () { vfio-isolate \ drop-caches \ cpu-governor performance \ cpuset-modify --cpus C$HCPUS /system.slice \ cpuset-modify --cpus C$HCPUS /user.slice \ move-tasks / /system.slice \ compact-memory \ irq-affinity mask C$MCPUS `

and I am getting below errors: Traceback (most recent call last): File "/usr/bin/vfio-isolate", line 33, in <module> sys.exit(load_entry_point('vfio-isolate==0.5.2', 'console_scripts', 'vfio-isolate')()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/vfio_isolate/cli.py", line 199, in run_cli cli(standalone_mode=False, obj=executor) File "/usr/lib/python3.11/site-packages/click/core.py", line 1157, in __call__ return self.main(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/click/core.py", line 1078, in main rv = self.invoke(ctx) ^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/click/core.py", line 1706, in invoke sub_ctx = cmd.make_context( ^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/click/core.py", line 943, in make_context self.parse_args(ctx, args) File "/usr/lib/python3.11/site-packages/click/core.py", line 1408, in parse_args value, args = param.handle_parse_result(ctx, opts, args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/click/core.py", line 2400, in handle_parse_result value = self.process_value(ctx, value) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/click/core.py", line 2362, in process_value value = self.callback(ctx, self, value) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.11/site-packages/vfio_isolate/cli.py", line 39, in cb_cpu_nodeset raise click.BadParameter('must start with C or N') click.exceptions.BadParameter: must start with C or N

when I omit "cpu-governor performance" I do not get the errors

ggeorgo commented 8 months ago

I am not sure what changed in the last month or so, however, isolation stopped working for me.

I have tested this before, and was working fine, but seems that lately it is not, as I run “stress --cpu 64” while my VM was running and to my surprise, all 32 threads went up to 100%, while previously this was the case for half of them and isolation would prevent VM cores to be used by the host.

My system (I can post more info if needed):

Operating System: Manjaro Linux 
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.12
Kernel Version: 6.6.19-1-MANJARO (64-bit)
Graphics Platform: X11
Processors: 32 × AMD Ryzen 9 7950X3D 16-Core Processor
Memory: 61.9 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: X670E AORUS MASTER

My XML file:

<domain type="kvm">
  <name>win11-games</name>
  <uuid>1e666676-xxxxx</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/11"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">33554432</memory>
  <currentMemory unit="KiB">33554432</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
    <access mode="private"/>
    <allocation mode="immediate"/>
    <discard/>
  </memoryBacking>
  <vcpu placement="static">16</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="16"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="17"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="18"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="19"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="20"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="21"/>
    <vcpupin vcpu="12" cpuset="6"/>
    <vcpupin vcpu="13" cpuset="22"/>
    <vcpupin vcpu="14" cpuset="7"/>
    <vcpupin vcpu="15" cpuset="23"/>
    <emulatorpin cpuset="15,31"/>
    <iothreadpin iothread="1" cpuset="13,29"/>
    <iothreadpin iothread="2" cpuset="14,30"/>
    <emulatorsched scheduler="fifo" priority="10"/>
    <vcpusched vcpus="0" scheduler="rr" priority="1"/>
    <vcpusched vcpus="1" scheduler="rr" priority="1"/>
    <vcpusched vcpus="2" scheduler="rr" priority="1"/>
    <vcpusched vcpus="3" scheduler="rr" priority="1"/>
    <vcpusched vcpus="4" scheduler="rr" priority="1"/>
    <vcpusched vcpus="5" scheduler="rr" priority="1"/>
    <vcpusched vcpus="6" scheduler="rr" priority="1"/>
    <vcpusched vcpus="7" scheduler="rr" priority="1"/>
    <vcpusched vcpus="8" scheduler="rr" priority="1"/>
    <vcpusched vcpus="9" scheduler="rr" priority="1"/>
    <vcpusched vcpus="10" scheduler="rr" priority="1"/>
    <vcpusched vcpus="11" scheduler="rr" priority="1"/>
    <vcpusched vcpus="12" scheduler="rr" priority="1"/>
    <vcpusched vcpus="13" scheduler="rr" priority="1"/>
    <vcpusched vcpus="14" scheduler="rr" priority="1"/>
    <vcpusched vcpus="15" scheduler="rr" priority="1"/>
  </cputune>
  <sysinfo type="smbios">
    <bios>
      <entry name="vendor">American Megatrends International, LLC.</entry>
      <entry name="version">F21</entry>
      <entry name="date">10/01/2024</entry>
    </bios>
    <system>
      <entry name="manufacturer">Gigabyte Technology Co., Ltd.</entry>
      <entry name="product">X670E AORUS MASTER</entry>
      <entry name="version">1.0</entry>
      <entry name="serial">12345678</entry>
      <entry name="uuid">1e666676-xxxx</entry>
      <entry name="sku">GBX670EAM</entry>
      <entry name="family">X670E MB</entry>
    </system>
  </sysinfo>
  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-8.1">hvm</type>
    <firmware>
      <feature enabled="no" name="enrolled-keys"/>
      <feature enabled="no" name="secure-boot"/>
    </firmware>
    <loader readonly="yes" type="pflash">/usr/share/edk2/x64/OVMF_CODE.fd</loader>
    <nvram template="/usr/share/edk2/x64/OVMF_VARS.fd">/var/lib/libvirt/qemu/nvram/win11-games_VARS.fd</nvram>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode="passthrough">
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <synic state="on"/>
      <stimer state="on">
        <direct state="on"/>
      </stimer>
      <reset state="on"/>
      <vendor_id state="on" value="OriginalAMD"/>
      <frequencies state="on"/>
      <reenlightenment state="off"/>
      <tlbflush state="on"/>
      <ipi state="on"/>
      <evmcs state="off"/>
      <avic state="on"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <smm state="on"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="off">
    <topology sockets="1" dies="1" cores="8" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="hypervisor"/>
    <feature policy="disable" name="aes"/>
    <feature policy="require" name="topoext"/>
    <feature policy="disable" name="x2apic"/>
    <feature policy="disable" name="svm"/>
    <feature policy="require" name="amd-stibp"/>
    <feature policy="require" name="ibpb"/>
    <feature policy="require" name="stibp"/>
    <feature policy="require" name="virt-ssbd"/>
    <feature policy="require" name="amd-ssbd"/>
    <feature policy="require" name="pdpe1gb"/>
    <feature policy="require" name="tsc-deadline"/>
    <feature policy="require" name="tsc_adjust"/>
    <feature policy="require" name="arch-capabilities"/>
    <feature policy="require" name="rdctl-no"/>
    <feature policy="require" name="skip-l1dfl-vmentry"/>
    <feature policy="require" name="mds-no"/>
    <feature policy="require" name="pschange-mc-no"/>
    <feature policy="require" name="invtsc"/>
    <feature policy="require" name="cmp_legacy"/>
    <feature policy="require" name="xsaves"/>
    <feature policy="require" name="perfctr_core"/>
    <feature policy="require" name="clzero"/>
    <feature policy="require" name="xsaveerptr"/>
  </cpu>
  <clock offset="timezone" timezone="Europe/Dublin">
    <timer name="rtc" present="no" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="discard"/>
    <timer name="hpet" present="no"/>
    <timer name="kvmclock" present="no"/>
    <timer name="hypervclock" present="yes"/>
    <timer name="tsc" present="yes" mode="native"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none" io="native"/>
      <source dev="/dev/sdb"/>
      <target dev="sdb" bus="sata"/>
      <boot order="1"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/user/Downloads/Linux/virtio-win-0.1.229.iso"/>
      <target dev="sdc" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="2"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x18"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x19"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
    </controller>
    <controller type="pci" index="11" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="11" port="0x1a"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x2"/>
    </controller>
    <controller type="pci" index="12" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="12" port="0x1b"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x3"/>
    </controller>
    <controller type="pci" index="13" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="13" port="0x1c"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x4"/>
    </controller>
    <controller type="pci" index="14" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="14" port="0x1d"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x5"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <interface type="direct">
      <mac address="52:54:00:20:e2:43"/>
      <source dev="enp12s0f0" mode="bridge"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
      <image compression="off"/>
    </graphics>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <audio id="1" type="spice"/>
    <video>
      <model type="virtio" heads="1" primary="yes">
        <acceleration accel3d="no"/>
      </model>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x1e7d"/>
        <product id="0x2cb6"/>
      </source>
      <address type="usb" bus="0" port="3"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x187c"/>
        <product id="0x100e"/>
      </source>
      <address type="usb" bus="0" port="4"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="1"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <watchdog model="itco" action="reset"/>
    <memballoon model="none"/>
  </devices>
</domain>

and my /etc/libvirt/hooks/qemu file (which is marked executable)

#!/bin/bash
#/etc/libvirt/hooks/qemu

HCPUS=8-15,24-31
MCPUS=0-7,16-23
ACPUS=0-31

UNDOFILE=/var/run/libvirt/qemu/vfio-isolate-undo.bin

disable_isolation () {
        vfio-isolate \
                cpuset-modify --cpus C$ACPUS /system.slice \
                cpuset-modify --cpus C$ACPUS /user.slice \
                irq-affinity mask C$ACPUS

        taskset -pc 0-23 2  # kthreadd reset
}

enable_isolation () {
        vfio-isolate \
                drop-caches \
                cpuset-modify --cpus C$HCPUS /system.slice \
                cpuset-modify --cpus C$HCPUS /user.slice \
                move-tasks / /system.slice \
                compact-memory \
                irq-affinity mask C$MCPUS

        taskset -pc $HCPUS 2  # kthreadd only on host cores
}

case "$2" in
"prepare")
        enable_isolation
        echo "prepared" >> /home/username/qemu_hook.log
        ;;
"started")
        echo "started" >> /home/username/qemu_hook.log
        ;;
"release")
        disable_isolation
        echo "released" >> /home/username/qemu_hook.log
        ;;
esac

Maybe an incompatibility with latest Kernels?

spheenik commented 7 months ago

Unsure what the problem is. It complains about the following: click.exceptions.BadParameter: must start with C or N So somehow some CPUSet it was given is missing a C or N at the start. Script looks fine though.