pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

vfio-pci fails to bind to gpu for gpu-passthrough #968

Open BTBlueSkies opened 4 years ago

BTBlueSkies commented 4 years ago

Distribution (run cat /etc/os-release):
NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://system76.com/pop"
SUPPORT_URL="http://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE_NAME):

Issue/Bug Description: Writing vfio-pci to the file /sys/bus/pci/devices/(my GPU's device ID)/driver_override fails to bind the vfio-pci driver to the GPU. This works fine in 19.10.

Steps to reproduce (if you know):
1. Enable the BIOS flags for IOMMU and virtualization.
2. Put iommu=on and amd_iommu=on on the kernel line, using the pattern kernelstub -a 'iommu=on'.
3. Add a script that writes vfio-pci to driver_override as /etc/initramfs-tools/scripts/init-top/bind_vfio.sh.
4. Run update-initramfs -u.

On reboot, I expect to see the driver bound to the GPU as vfio-pci, but this line is missing for the GPU I passed through. The main GPU, which I am not passing through, shows as correctly bound to the nvidia driver.
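For readers who have not seen such a script, here is a minimal sketch of what the core of an init-top hook like bind_vfio.sh might do. It is not the reporter's actual script: the function name and the overridable SYSFS variable are assumptions added so the logic can be dry-run against a fake directory tree.

```shell
# Root of sysfs; overridable so the function can be exercised on a fake tree.
SYSFS=${SYSFS:-/sys}

# Write "vfio-pci" into driver_override for each PCI address given.
# Addresses without a driver_override file are skipped with a warning.
override_with_vfio() {
    for dev in "$@"; do
        f="$SYSFS/bus/pci/devices/$dev/driver_override"
        if [ -e "$f" ]; then
            echo vfio-pci > "$f"
        else
            echo "skipping $dev: $f not found" >&2
        fi
    done
}
```

Inside the hook it would be called with the addresses to pass through, for example override_with_vfio 0000:28:00.0 (the address from this report). A real initramfs-tools script also needs the usual prereqs boilerplate at the top, which is omitted here.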

Expected behavior: I expected 'lspci -vnn' to show vfio-pci bound to the GPU configured in /sys/bus/pci/devices/0000:28:00.0/driver_override.
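As an alternative to scanning the lspci output, the bound driver can be read from sysfs, where each bound device has a driver symlink. The sketch below assumes that layout; the function name and the optional second argument (a sysfs root, for testing) are hypothetical.

```shell
# Print the name of the driver bound to a PCI device, or fail if none is bound.
# $1: PCI address (e.g. 0000:28:00.0)
# $2: sysfs root (optional, defaults to /sys)
bound_driver() {
    root=${2:-/sys}
    link="$root/bus/pci/devices/$1/driver"
    # The driver symlink only exists while a driver is bound.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

For example, bound_driver 0000:28:00.0 || echo "no driver bound" would show the symptom described in this issue.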

Other Notes: I have seen a few others reporting the same issue with 20.04. The vfio-pci modules were moved into the kernel in this release: 'cat /lib/modules/$(uname -r)/modules.builtin' does indeed show the drivers listed, and lsmod does not have any entries for vfio-pci, as I think would be expected.
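The built-in check described above can be narrowed to a single module. This is a small sketch, assuming modules.builtin lists one path per line ending in the module file name (such as .../vfio-pci.ko); the function name and the optional list-file argument are illustrative.

```shell
# Succeed if the named module is listed as built into the kernel.
# $1: module name (e.g. vfio-pci)
# $2: list file (optional, defaults to the running kernel's modules.builtin)
is_builtin() {
    mod=$1
    list=${2:-/lib/modules/$(uname -r)/modules.builtin}
    # Match the module file name at the end of a path.
    grep -q "/${mod}\.ko\$" "$list"
}
```

A built-in driver never shows up in lsmod, so is_builtin vfio-pci && echo "built in" explains the empty lsmod output.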

tchuyev commented 4 years ago

Confirmed for me too

heavyattack commented 4 years ago

Same here. I'm following Heiko's walkthrough for Pop!_OS 19.10: https://heiko-sieger.info/creating-a-windows-10-vm-on-the-amd-ryzen-9-3900x-using-qemu-4-0-and-vga-passthrough/#Bind_Passthrough_GPU_to_VFIO_Driver

vfio-pci is not shown as bound to the passed-through GPU on Pop!_OS 20.04.

ChristianB011 commented 4 years ago

I am also having this issue. When I run 'lspci -vnn', I do not see any driver bound to the device that I am trying to bind vfio-pci to.

ChaOConnor commented 4 years ago

Confirming the same here. I followed the guide above as well as https://mathiashueber.com/windows-virtual-machine-gpu-passthrough-ubuntu/ and both result in the same issue.

mathueb commented 4 years ago

I have updated the guide for Ubuntu 20.04 - https://mathiashueber.com/pci-passthrough-ubuntu-2004-virtual-machine/

tchuyev commented 4 years ago

Thanks VERY much for this detailed information, I’ll give that a try for sure!

I personally do not find Pop!_OS to be unstable, but not having the latest software or kernel is sometimes frustrating, e.g. I have a new laptop with a Ryzen 4000 chipset (Renoir).

Only Ubuntu 20.04.1 supports it OOB as the driver has just been backported. I’m waiting for Pop!_OS to keep up with that new release (expected in Build 12).

Manjaro Gnome crashed on me several times during updates. I may try Arch Linux, but I don't have great expectations (even though its hardware support and the AUR are great).

Generally, Linux on the desktop is not convincing compared to recent versions of Windows or macOS (especially with battery management, function keys, applications, etc.)

What I do love about Linux, however, is VFIO and GPU passthrough.

That’s why I keep using it 😉

On 20 August 2020 at 20:54, hsieger notifications@github.com wrote:

 Other Notes: I have seen a few others reporting the same issue with 20.04. The vfio-pci modules were moved into the kernel in this release: 'cat /lib/modules/$(uname -r)/modules.builtin' does indeed show the drivers listed, and lsmod does not have any entries for vfio-pci, as I think would be expected.

The cat ... command shows the available modules that can be loaded. They would have been loaded if the update-initramfs command had updated the initial ramdisk that is actually booted. So I think my previous post is the answer to it: in short, manually replace the initramfs file under /boot/efi/... with the latest version under /boot.
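Before replacing anything by hand, it may be worth confirming that the two copies really differ. A minimal sketch follows; the function name is hypothetical, and both paths are passed as arguments because the exact location under /boot/efi/... depends on the installation.

```shell
# Compare the initramfs that update-initramfs wrote with the copy that is booted.
# $1: freshly generated file under /boot
# $2: copy on the EFI partition
check_initrd_copy() {
    if cmp -s "$1" "$2"; then
        echo "in sync"
    else
        echo "out of date"
        return 1
    fi
}
```

If it reports "out of date", the booted ramdisk does not contain the latest changes, which matches the explanation above.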


hsieger commented 4 years ago

I've updated my tutorial; note the expandable box for Pop!_OS 20.04 / Ubuntu 20.04 containing the changes: https://heiko-sieger.info/creating-a-windows-10-vm-on-the-amd-ryzen-9-3900x-using-qemu-4-0-and-vga-passthrough

Would be great to receive feedback.

hsieger commented 4 years ago

I've just deleted my older posts as they may be misleading. To make it short: at some point with kernel 5.4, Ubuntu 20.04 moved vfio into the kernel instead of loading it as modules. Unfortunately this changes things: for some reason the driver override feature doesn't work anymore. I don't know why, but it's a shame. What works is the old method of specifying the PCI vendor:model IDs in the grub file. Although my tutorial refers to Pop!_OS 19.10, I've included the Ubuntu/Pop!_OS 20.04 change in a drop-down box; see the link in my previous post. This is not really a Pop!_OS bug, but one that probably affects all Ubuntu-based distros (I've tested with Linux Mint 20).
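To illustrate the vendor:model method: the IDs are passed to the built-in driver as a vfio-pci.ids=... kernel parameter. The sketch below only builds that parameter string from a list of IDs. The function name is hypothetical, and the IDs in the usage note are placeholders; the real ones are the bracketed pairs that lspci -nn prints for the GPU and its audio function.

```shell
# Build a vfio-pci.ids kernel parameter from vendor:model IDs.
# Each argument must look like 10de:13c2 (four lowercase hex digits, colon, four more).
vfio_ids_param() {
    ids=
    for id in "$@"; do
        case $id in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]) ;;
            *) echo "bad ID: $id" >&2; return 1 ;;
        esac
        # Join IDs with commas, without a leading comma.
        ids=${ids:+$ids,}$id
    done
    [ -n "$ids" ] || return 1
    printf 'vfio-pci.ids=%s\n' "$ids"
}
```

On Pop!_OS, which uses kernelstub rather than grub, the result could be added with something like sudo kernelstub -a "$(vfio_ids_param 10de:13c2 10de:0fbb)", with the placeholder IDs replaced by your own.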

kb5rir commented 4 years ago

I can confirm it's happening on Linux Mint 20, kernel 5.4.0-47. My Nvidia GPU and HDMI controller just disappear from the lspci -vvn output.

hsieger commented 3 years ago

Starting with kernel 5.4, Ubuntu has made some changes in the way the driver override feature works. They are now using the driverctl utility (available as a package) to do the driver override. There are other ways to accomplish the same thing; for an overview of several popular methods, see my post here: https://www.heiko-sieger.info/blacklisting-graphics-driver/

In essence, the driverctl script works just like the driver override script. To bind to the vfio-pci driver:

echo vfio-pci > /sys/bus/pci/devices/0000\:01\:00.0/driver_override

echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

To unbind:

echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind

Although the above will do the job, I suppose it's best to use the driverctl utility. See https://gitlab.com/driverctl/driverctl
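For reference, a rough sketch of the driverctl equivalent of the commands above, using its set-override and unset-override subcommands. The wrapper function names and the DRIVERCTL variable are assumptions, added only so the commands can be previewed with DRIVERCTL=echo before running them as root.

```shell
# Command to run; set DRIVERCTL=echo to preview instead of executing.
DRIVERCTL=${DRIVERCTL:-driverctl}

# Bind a device to vfio-pci. $1: PCI address, e.g. 0000:01:00.0
vfio_bind() {
    "$DRIVERCTL" set-override "$1" vfio-pci
}

# Remove the override so the device's default driver can claim it again.
vfio_release() {
    "$DRIVERCTL" unset-override "$1"
}
```

Unlike the raw sysfs writes, driverctl saves overrides by default so they are reapplied on later boots; driverctl list-overrides shows what is currently configured.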

Hope this solves the issue.