jscinoz / optimus-vfio-docs

Optimus (Non-MXM/Muxless/"3D Controller") passthrough testing notes
Very hacky solution for Windows guest #2

Open arne-claeys opened 6 years ago

arne-claeys commented 6 years ago

Dear Mr Coulter

First of all, thanks a lot for your research. In the meantime I have managed to get GPU passthrough (of my muxless NVIDIA GTX 950M) on a Windows guest working as well. At the moment the solution is very hacky, but perhaps it could be useful.

To this end I have hard-coded the VROM dump in the OVMF image by patching OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c (attached: fwcfg_patch.txt).

The VROM is read from a header file and copied to a RuntimePool to make sure it remains persistent after the OS has loaded. In the following part of the code a new ACPI table is defined that first defines an OperationRegion that points to the VROM image. At the end a slightly modified version of your ACPI table, in which I pasted a decompiled version of the ROM call from the SSDT of my laptop, is appended to the rest of the table. The RVBS variable should be set to the actual length of the VROM dump.

ssdt.asl

As currently I don't have sufficient time to figure out a more elegant solution, the following was done to compile this table.

In my case this finally made Error 43 disappear. I hope this is of some help.

Kind regards Arne

jscinoz commented 6 years ago

Hi Arne,

Thank you for this! I didn't realise anyone else was still looking into this. I've not had too much time to do so myself lately sadly. I'm glad to hear this got it working for you.

To try and figure out what's going on here, could you please let me know the following about your setup?

Also, can you confirm that loading the same VROM via the original ASL in this repository (without your OVMF patch) did not work on your hardware? I may well be missing something as this whole area is quite new to me, but I'd have expected them to have the same result, as the interface to the Nvidia driver itself (the _ROM method) remains the same in either case.

Cheers, Jack

arne-claeys commented 6 years ago

Hi Jack, attached you can find the libvirt XML that was used: win10-pci.txt

A Virtio GPU was assigned as the primary graphics adapter for my guest. The NVIDIA card was assigned as the guest's secondary graphics adapter. As in the Misairu tutorial, the guest was first configured using a Spice client and afterwards accessed using RemoteFX.

I can confirm Error 43 still occurred when I tried using the original ASL table and passed the VROM as a PCI ROMBAR. However, it has been a while since I tried this out. I am starting to doubt whether I changed the filename in this critical line of the ACPI table, so I can't rule out that it would have worked in an easier way:

    Local1 = FWGS(Local0, "genroms/10de:139b:4136:1764")

I will check this later on.

As you wrote in this post that Windows clears the ROMBAR image once booted, I quickly switched to the RuntimePool approach.

Kind regards Arne

jscinoz commented 6 years ago

Thanks for the information. It's interesting that it worked for you without a GVT card. I will have to try that scenario again myself, with a fresh VM in case perhaps there is something broken in the one I've used so far.

The filename in the FWGS call is simply whatever filename the ROM ends up as in fw_cfg. I named it according to hardware IDs in my case (vendor, device, subsystem) as I intended to eventually make the ASL generic and to just read the PCI IDs from the device at PCI address 1:0:0 and load the appropriate image.

I'll give things a try on my machine with your method when I have a bit of free time and will reply back here with the result.

jscinoz commented 6 years ago

A few others and I have had a chance to test your patch, and I can confirm it works as far as getting past the Code 43 error :)

Unfortunately, none of us have had any luck getting 3D workloads going in the guest - were you able to do so in your setup?

jscinoz commented 6 years ago

After further testing, I can confirm 3D workloads do in fact work. What currently doesn't work (and I suspect this is the same with any RemoteFX-based setup), is fullscreen mode. I suspect we might need to emulate an Optimus setup in the VM with GVT for this to work, but thus far I haven't been able to get GVT itself to work (even without a Nvidia card involved)

arne-claeys commented 6 years ago

Nice to hear the patch helped you to finally get rid of code 43 :-) So I can conclude that 3D workloads work for you, unless you run your RDP client in full screen mode? At first sight, I find it difficult to imagine why that makes a difference. Hopefully there will be a way to solve this issue without the need to emulate Optimus with GVT in the VM. Solving the error 12 here (What about GVT-g?) doesn't really sound promising.

In my setup some simple 3D rendering tasks seemed to run on the GPU, but I did not test this in detail and never in full screen mode.

It will also take a while before I can try out something new as my own laptop is currently sent back to the manufacturer for repair.

jscinoz commented 6 years ago

> So I can conclude that 3D workloads work for you, unless you run your RDP client in full screen mode?

Not quite. To clarify, it has nothing to do with whether or not the RDP client is fullscreen, but rather, whether the application (within the VM) itself runs in fullscreen. There are a few ways to reproduce this:

  1. Try running a game that defaults to fullscreen (true fullscreen, not borderless windowed). It will likely crash on startup
  2. As above, but with a benchmark; 3DMark is an example of this; it will throw an error relating to enumerating display resolutions (I don't remember the exact name of the throwing method but it was along the lines of ListAllModes)
  3. As an example of how non-fullscreen applications work, try the Unigine Heaven benchmark - it will work fine in windowed mode, but will be unable to enter fullscreen mode.

> Hopefully there will be a way to solve this issue without the need to emulate Optimus with GVT in the VM. Solving the error 12 here (What about GVT-g?) doesn't really sound promising.

I do not get the Code 12 error - I suspect @Misairu-G had something else broken in their setup. I can get GVT working (and even run 3D workloads on the GVT card) if a QXL card remains the primary VGA in the VM.

What I have not been able to get working is GVT as primary VGA in the guest. There's ongoing work by Intel on this (specifically GVT dmabuf and x-display support), but it is still quite raw. Judging by this document, having GVT working as primary VGA will be necessary to trigger the hybrid-graphics behaviour in the Windows graphics stack.

arne-claeys commented 6 years ago

Thanks for the explanation. It gives me a better understanding of the problem now.

jscinoz commented 6 years ago

After a bit of testing, I've found the following things:

Going forward, I think this leaves us with a few options:

jscinoz commented 6 years ago

For anyone else looking at this, an updated OVMF patch generated against current OVMF git master is here

jscinoz commented 6 years ago

After a bit of experimentation, and a patch from upstream OVMF, I got GVT-g local display support working on my machine. Unfortunately, this does not result in a valid hybrid graphics setup, as the emulated display is a regular DisplayPort device, and as per Microsoft documentation, the iGPU needs to expose an embedded display panel of some kind.

At this point, there are two options to potentially get this working, but both are beyond my current knowledge/expertise, and I sadly don't have much free time to get up to speed in these areas:

marcosscriven commented 6 years ago

@jscinoz @arne-claeys - just trying to investigate whether this would allow gaming in a windows guest on a linux host?

I have a Dell Precision 5520 via work, which has a Quadro M1200. Like the XPS’s, I believe this is a muxless setup, and it appears as a 3D controller.

I see you mentioning ‘rendering workloads’, and indeed games based on Unreal Engine, but I'm still unclear on the current state, or what the potential is here on a laptop with this setup.

marcosscriven commented 6 years ago

I found a good guide to the current status here: https://www.reddit.com/r/VFIO/comments/8gv60l/current_state_of_optimus_muxless_laptop_gpu/

Appears to mention @jscinoz’s work.

Ashymad commented 6 years ago

Sadly I didn't have any luck with getting this to work. I did however create a PKGBUILD that compiles OVMF with the vBIOS patched in for people that want to test it out quickly (and are running Arch Linux). Just place your ROM in the same folder, name it vBIOS.bin, and run makepkg -si. EDIT: After copying much of Arne's libvirt XML I was finally able to say goodbye to Code 43 :)

marcosscriven commented 6 years ago

@ashymad - any ideas how to get the VBIOS for something like the Dell XPS or Precision 5520?

pseudolobster commented 5 years ago

@marcosscriven I'd imagine the VBIOS is included in the system BIOS, so you will not be able to use tools which try to dump the VBIOS from the PCIe bus like you'd do for a discrete card.

The easiest way is probably to try booting up windows on bare metal, then grab the vbios from the registry. I found a guide on how to do this here: https://forums.laptopvideo2go.com/topic/32103-how-to-grab-a-notebooks-vbios-that-is-not-supported-by-nvflash/

Another way would be to decompile your system BIOS and grab the VBIOS rom out of that.

On an HP, I was able to go to support.hp.com, search for my model, download the BIOS update, and run it without actually going through with flashing the BIOS. Just allow it to unpack, then look in C:\windows\temp or %appdata% to see where it put everything. Some installers you may be able to unpack with 7zip.

Once you have the system BIOS, you'll need to find a copy of Phoenix BIOS Editor, or some similar tool to decompile the UEFI image into its individual firmware blobs. This gave me a bunch of files with names like 4A640366-5A1D-11E2-8442-47426188709B_1693_updGOP.ROM. From there I was able to grep these ROM files for "Nvidia", and I found a copy of my VBIOS that way.
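
The grep step above can be sketched in Python for anyone without grep to hand (for example on Windows). This is only an illustration: the function name `find_blobs_containing` and the default search string are assumptions of mine, not part of any tool mentioned in this thread, and a match only means the file mentions the string, not that it is a usable VBIOS.

```python
from pathlib import Path


def find_blobs_containing(directory, needle=b"nvidia"):
    """Return the names of files under `directory` whose raw bytes contain
    `needle`, compared case-insensitively (hypothetical helper)."""
    needle = needle.lower()
    matches = []
    for path in sorted(Path(directory).rglob("*")):
        # Firmware blobs are small enough to read into memory in one go.
        if path.is_file() and needle in path.read_bytes().lower():
            matches.append(path.name)
    return matches


# Example: find_blobs_containing("extracted_bios")
```

Any file it reports still needs to be inspected by hand to confirm that it really is the VBIOS image.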

marcosscriven commented 5 years ago

Thanks so much @pseudolobster - extracting via the linked how-to worked a treat on the Dell Precision 5520.

In case that link disappears in future the basic overview is:

marcosscriven commented 5 years ago

@pseudolobster @arne-claeys @jscinoz

I extracted the BIOS from the Windows registry, but it seems to be of type x86 PC-AT rather than UEFI:

    PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 13b6, class: 030200
    PCIR: revision 3, vendor revision: 1
    Last image

According to https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF that means this won't work with passthrough.

Do you know a way around that at all please?

Ashymad commented 5 years ago

@marcosscriven vBIOS being UEFI compatible is only needed with the romfile QEMU method. It is not needed when patching OVMF. I am successfully using PC-AT vBIOS.
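
For anyone who wants to check which image types a dump contains, here is a minimal Python sketch that walks the image chain of a PCI expansion ROM and reports the same fields as the PCIR output quoted above. It assumes the dump starts directly at the 0x55AA signature and follows the standard PCI data structure layout; `list_rom_images` is a name I made up for illustration, and this is not the tool that produced the output above.

```python
import struct

# Code type values defined for the PCI data structure ("PCIR").
CODE_TYPES = {0x00: "x86 PC-AT", 0x01: "Open Firmware", 0x02: "HP PA-RISC", 0x03: "EFI"}


def list_rom_images(rom: bytes):
    """Return one dict per image found in a PCI expansion ROM dump."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        # Every image starts with the 0x55 0xAA signature.
        if rom[offset:offset + 2] != b"\x55\xaa":
            break
        # Offset 0x18 holds a little-endian pointer to the PCI data structure.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x18 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        vendor, device = struct.unpack_from("<HH", rom, pcir + 0x04)
        (length_blocks,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        last = bool(indicator & 0x80)  # bit 7 set marks the last image
        images.append({
            "offset": offset,
            "vendor": vendor,
            "device": device,
            "type": CODE_TYPES.get(code_type, f"unknown ({code_type})"),
            "last": last,
        })
        if last or length_blocks == 0:
            break
        offset += length_blocks * 512  # image length is in 512-byte units
    return images
```

A dump that yields only an "x86 PC-AT" entry marked as the last image matches the situation described above: there is no EFI image in the ROM.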

hardeepd commented 5 years ago

@arne-claeys @jscinoz Thank you both very much for your work here. I've tried your patch and I still get a Code 43 in Windows. I thought I'd try to debug the firmware in a Linux VM, and I can see that the nouveau driver fails to find any vBIOS.

    nouveau: bios: unable to locate usable image
    nouveau: bios: ctor failed, -22

Any ideas how to resolve this or where I should be looking?

Is there a way to verify that the OVMF firmware I've compiled does in fact have the vBIOS embedded?

Edit: I fixed it! Seems that the firmware was fine all along but there was an address problem in the ioh3420 configuration of my qemu script

marcosscriven commented 5 years ago

@arne-claeys @jscinoz

I created a patched OVMF for my Nvidia Quadro M1200 (per https://github.com/marcosscriven/ovmf-with-vbios-patch)

However, I still get error 43. I see this error in the qemu logs:

2018-08-03T12:45:56.397289Z qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:01:00.0

I've ensured those patched versions are in use, and KVM is hidden etc.

<domain type='kvm'>
  <name>win10-2</name>
  <uuid>e7d44285-507b-48da-bfe2-2eba415016bd</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/edk2/Build/OvmfX64/RELEASE_GCC5/FV/OVMF_CODE.fd</loader>
    <nvram>/edk2/Build/OvmfX64/RELEASE_GCC5/FV/OVMF_VARS.fd</nvram>
    <boot dev='hd'/>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='5DIE45JG7EAY'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>

I've also ensured the device is passed through with:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

I tried both with and without the <rom bar> tag.

The IOMMU groups look to be set up correctly:

IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 05)
IOMMU Group 1 01:00.0 3D controller [0302]: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] [10de:13b6] (rev a2)

dmesg shows that vfio_pci was added:

dmesg | grep -i vfio
[    2.358815] VFIO - User Level meta-driver version: 0.3
[    2.380410] vfio_pci: add [10de:13b6[ffff:ffff]] class 0x000000/00000000
[  184.054104] vfio-pci 0000:01:00.0: enabling device (0002 -> 0003)

And finally lspci shows the card is bound to vfio-pci driver:

lspci -nnk -d 10de:13b6                         
01:00.0 3D controller [0302]: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] [10de:13b6] (rev a2)
    Subsystem: Dell GM107GLM [Quadro M1200 Mobile] [1028:07bf]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

Any ideas please?

marcosscriven commented 5 years ago

@hardeepd - can you share how you worked out the ioh3420 settings and your XML config please? I’ve posted my own PCI tree above.

marcosscriven commented 5 years ago

For reference I did finally get this working https://github.com/marcosscriven/ovmf-with-vbios-patch/blob/master/qemu/win-hybrid.xml

The tricky thing is if the GPU is attached via a bridge, you need to specify that connection:

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-pci-sub-vendor-id=4136'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-pci-sub-device-id=1983'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.bus=pci.1'/>
  </qemu:commandline>
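
The values 4136 and 1983 above appear to be the decimal forms of the subsystem ID 1028:07bf that lspci printed earlier in hexadecimal. A small Python sketch of that conversion follows; `subsystem_qemu_args` and the `hostdev0` default are illustrative assumptions, so check which alias libvirt actually assigns to your device.

```python
def subsystem_qemu_args(subsystem: str, device_alias: str = "hostdev0"):
    """Turn an lspci-style hex subsystem ID such as "1028:07bf" into the
    decimal -set values used in the <qemu:commandline> block above."""
    vendor_hex, device_hex = subsystem.split(":")
    return [
        f"device.{device_alias}.x-pci-sub-vendor-id={int(vendor_hex, 16)}",
        f"device.{device_alias}.x-pci-sub-device-id={int(device_hex, 16)}",
    ]


for arg in subsystem_qemu_args("1028:07bf"):
    print(arg)
```

Each returned string goes into its own qemu:arg element, preceded by a `-set` argument as in the XML above.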

kalelkenobi commented 5 years ago

Hey @marcosscriven, I think I'm experiencing a similar problem. I successfully passed my dGPU to the Windows 10 x64 guest, using @Ashymad's PKGBUILD to patch the OVMF with my vBIOS. That got me to the point where I was able to install the NVIDIA drivers, but after that I'm stuck with Code 43. Could you please post your entire XML? The link above did not work for me (404). Thank you very much.

marcosscriven commented 5 years ago

All my config for this is in the same linked repo https://github.com/marcosscriven/ovmf-with-vbios-patch

kalelkenobi commented 5 years ago

Sadly I had no luck, so I turn to you guys :). I'm trying to do this with my MSI GS63VR 6RF, which should be a muxless laptop with a GTX 1060 dGPU. What's interesting is that the dGPU should be directly connected to the HDMI output, so I was hoping to pass the 1060 to a Win10 guest and use an external monitor connected to the HDMI port (I don't know if that's possible). I'm on Arch Linux using qemu-headless 2.12.1 and libvirt 4.5.0.

The relevant IOMMU groups are:

IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] [10de:1c20] (rev a1)

and also here's my full libvirt xml:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>windows10</name>
  <uuid>da3372e1-96a4-4470-8131-6079e178c609</uuid>
  <memory unit='KiB'>15624192</memory>
  <currentMemory unit='KiB'>15624192</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-2.12'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/windows10_VARS.fd</nvram>
    <bootmenu enable='yes'/>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='5DIE45JG7EAY'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact' check='none'>
    <model fallback='allow'>Skylake-Client</model>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/home/kalel/workspace/VirtualMachines/windows10.img'/>
      <target dev='vda' bus='virtio'/>
      <boot order='3'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='8' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='8'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:c4:cb:d0'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0' multifunction='on'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <gl enable='no' rendernode='/dev/dri/by-path/pci-0000:00:02.0-render'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </sound>
    <video>
      <model type='virtio' heads='1' primary='yes'>
        <acceleration accel3d='no'/>
      </model>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <rom bar='off'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-pci-sub-vendor-id=5218'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-pci-sub-device-id=4525'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.bus=pci.1'/>
    <qemu:env name='QEMU_AUDIO_DRV' value='pa'/>
    <qemu:env name='QEMU_PA_SAMPLES' value='4096'/>
    <qemu:env name='QEMU_AUDIO_TIMER_PERIOD' value='200'/>
    <qemu:env name='QEMU_PA_SERVER' value='/run/user/1000/pulse/native'/>
  </qemu:commandline>
</domain>

At this point I've tried a lot of different things: patching the NVIDIA drivers, removing the VIRTIO primary GPU, and booting with the external monitor plugged in, but I'm still getting Code 43 after I install the NVIDIA drivers. I've also checked the vBIOS, and the one I used to patch OVMF seems to be the right one, because it's an exact match for the one I extracted with nvflash from inside the VM. I'm probably missing something stupid at this point. Could you guys please help?

Thank you all for your assistance.

KenMasters20XX commented 5 years ago

@kalelkenobi @marcosscriven

I'm posting to confirm that I have a very similar configuration to @kalelkenobi. I'm using a Gigabyte Aero 15X v8 with a GTX 1070 Max-Q, and I'm also stuck with Code 43.

I've tried every configuration posted in this repo, as well as in the other dGPU repo, and none of them seem to work.

In Windows, the Nvidia driver installs perfectly fine without complaint, and yet reports Code 43. I've patched my OVMF using the provided PKGBUILD up-thread. I've tried passing my vBIOS ROM separately, alongside, or not at all. I've tried with and without ROM BAR enabled. I've tried SeaBIOS and UEFI both, i440fx and Q35.

Perhaps this information can help someone figure this out but as of now, I am at a bit of a loss. What I have learned is that my particular graphics configuration has the following characteristics:

1) The card has its own EEPROM chip, an MXIC MX25U4033E 512KB chip.
2) I can retrieve what I believe to be the complete BIOS via nvflash as well as GPU-Z; however, there is only a non-UEFI BIOS to be found.
3) I've gone so far as to dump the flash from the chip myself using an EEPROM programmer.
4) The dump is 512KB, verified against the chip, but only 169KB is actually used and, again, there is no UEFI image, only the PC Type 0 BIOS; the rest is just zeroed.
5) I've searched far and wide through the Aero 15X and the MSI GS65 BIOS update files for ANYTHING that might be an nVidia UEFI PE file and found nothing. All of this leads me to believe these cards are NOT UEFI-enabled, and they are NOT being shadowed like other Optimus cards that don't have a discrete EEPROM (I could be wrong here).
6) The card shows up in lspci as a "VGA Controller."
7) This is not an MXM device, and it is Optimus-enabled.
8) The GTX 1070 Max-Q controls the HDMI and mini-DP ports. If the GPU's driver is disabled, those ports will not work.
9) If I attach an external display, I see the internal QXL card mirrored across the GTX card when the kernel goes into framebuffer mode during boot, and I can see Ubuntu's logo and status indicator upon booting the VM (I believe this is VESA). After about 3-4 seconds the system seems to hard lock (although I have not tried to SSH in to confirm).
10) I don't see anything from Windows, nor do I see Tianocore's logo upon boot on the external display; this only happens with the Ubuntu splash/status indicator, and this is using the default 18.04 Nouveau drivers.
11) FWIW, all of the card's information shows up in GPU-Z, and the BIOS dump from within the VM is exactly the same as it is from outside the VM in bare-metal Windows and from an EEPROM reader directly from the chip. So the BIOS is being passed through successfully. The only difference is that the GPU shows no clock speed, and I believe it is in a D3 power-down/sleep state. AFAIK, I have no way of getting it out of this state (due to the Code 43).
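
The used-versus-padded observation in point 4 can be checked with a few lines of Python. This is only a sketch: `used_length` is a hypothetical helper, and it assumes the unused tail of the dump is filled with a single padding byte value (zero here, as described above; erased flash often reads back as 0xFF instead).

```python
def used_length(dump: bytes, padding: bytes = b"\x00") -> int:
    """Return the number of bytes left once trailing padding is stripped."""
    return len(dump.rstrip(padding))


# Example with a real dump (the file name is a placeholder):
#     data = open("vbios_dump.bin", "rb").read()
#     print(used_length(data), "of", len(data), "bytes used")
```

Because only trailing bytes are stripped, zero bytes inside the image itself do not affect the result.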

Some suppositions on my part:

I don't believe this card has a UEFI 'BIOS', either on its own discrete EEPROM or in the system firmware. That might be true of all the Max-Q model cards? My guess is that these designs rely completely on the iGPU at boot and operate in CSM mode only, with a legacy BIOS. I don't think any of these laptops can operate with the iGPU off, nor can any of them disable the internal display or remain functional at bootup with the internal display disabled (if done through a BIOS hack).

At this point, I'm left attempting a few other alternatives, but I think I've fully explored the possibility that the VM isn't getting the correct BIOS -- as far as I can tell, it is. I've used @marcosscriven 's configuration as well as many other iterations, and yet, nothing works for me.

Next steps would be to try the ACS patch (because there is a hidden PCIe HDMI audio component at 0000:01:00.1 that I cannot passthrough).

Or.. to try to use a UEFI-enabled GTX 1070M BIOS patched OVMF (assuming compatibility with Max-Q).

Or.. try to patch my own custom Pascal BIOS for the Max-Q, by combining the 1070M UEFI-enabled BIOS with the 1070 Max-Q one and then flashing that to my card (I can flash back with the programmer if it fails, so no worries there), hoping that effectively turning the card into a UEFI-compatible card might help?

Any thoughts or ideas would be greatly appreciated. I would really like to get this working, and it seems I'm very close and maybe missing something trivial? I get the feeling that maybe I'm spending a lot of time on this BIOS issue and the cause is something completely different?

Thanks!

kalelkenobi commented 5 years ago

@KenMasters20XX thank you for your intensive testing. I believe I am in the same situation as you are. My GTX 1060 is NOT a Max-Q design, but I’ve jumped through the same hoops as you have trying to confirm that I had in fact a valid BIOS (short of using an EEPROM programmer) and came to the same result. The guest seems to be getting the right BIOS and there is no way to extract a UEFI-compatible dump from the card or the BIOS updates. I tried all the same setups you did (q35, i440fx, patched OVMF, regular OVMF, etc.) with no luck. Unfortunately there’s little else I can contribute aside from confirming some of your guesses: my laptop cannot in fact operate with the iGPU or internal display disabled (I tried via an unlocked BIOS). I’ve also tried using a downloaded vBIOS that seemed a close match to my own, with no luck. Lately I’ve been focusing my attention on the PCI hierarchy, thinking maybe I missed something there. I hadn’t found the hidden PCI device, although I suspected it existed. Do you guys think that could be it? Maybe the HDMI audio needs to be passed through for the card to work properly. Bare-metal Windows seems to be able to use it, even though it doesn’t show up in Device Manager.

KenMasters20XX commented 5 years ago

@kalelkenobi I think we're the two users in this thread so far with Pascal cards? I think perhaps everyone else is using Maxwell-based cards and that might make the difference. So far, I've not found any instances online of either an integrated or MXM-based Pascal card being successfully passed-through.

What is interesting is that MXM cards like the 1070M do in fact have a UEFI BIOS; however, the integrated cards, even though they show up as VGA Controller, and have control over the HDMI ports, do not have an associated UEFI module. My guess is these cards simply do not have UEFI-functionality by design? I'm going to retrieve my firmware's Gop Driver and take another look, but IIRC, there was no indication of an Nvidia driver there.

Now, if that's true; then perhaps that's what's causing the Code 43? If not, then there's a UEFI module that I'm simply missing...

BTW; the 'hidden' HDMI Audio device does indeed exist, I've seen it "accidentally" exposed by toggling the power-state via ACPI calls. At various times (seemingly at random) the HDMI device will show up in lspci. This is one of the reasons I'm thinking of using an ACS-patched kernel in my next series of tests, simply to isolate this as a possibility.

Lastly, I'm thinking that perhaps the ACPI tables might be a difference-maker here. I've taken a cursory look at the Aero 15X's SSDT table and I'm guessing that, perhaps like a Hackintosh, there is some incompatibility here between what's been posted and what the Nvidia driver is expecting to see for an integrated Pascal GPU.

Hard to really say with any certainty since this amounts to shooting in the dark.

kalelkenobi commented 5 years ago

@KenMasters20XX I'm nowhere near as versed as I'm guessing you are in BIOS and ACPI inner workings, which is why I got stumped on the no-UEFI dump front and essentially gave up on that angle. This is simply way over my head, so I'll try and look at the HDMI audio angle. I was able to find a very old bug about this, https://bugs.freedesktop.org/show_bug.cgi?id=75985, which seems to be an NVIDIA proprietary driver issue that they simply choose to ignore. There is a workaround, described in the bug, to make HDMI audio show up reliably, and the device is indeed in the same IOMMU group as my 1060:

IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] [10de:1c20] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
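
For anyone wanting to reproduce a listing like the one above, here is a minimal sketch that walks sysfs and prints each PCI address with its IOMMU group. The function name and the optional root argument are made up for illustration; the default path assumes a Linux host with the IOMMU enabled.

```shell
#!/usr/bin/env bash
# list_iommu_groups [ROOT]
# Print "IOMMU Group <n> <pci-address>" for every device found under ROOT.
# ROOT defaults to the usual sysfs location; the parameter exists only so
# the function can be pointed at another directory tree.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group addr
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU off or no groups
        group="${dev%/devices/*}"      # drop the trailing "/devices/<addr>"
        group="${group##*/}"           # keep only the group number
        addr="${dev##*/}"              # e.g. 0000:01:00.0
        printf 'IOMMU Group %s %s\n' "$group" "$addr"
    done
}
```

Calling `list_iommu_groups` with no argument lists the host's groups; feeding each address to `lspci -nns <address>` adds the device description and the [vendor:device] IDs shown in the listing above.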

I'll try passing it over to the guest and see if that makes any difference, but I'm not holding my breath. As you pointed out, very few people have tried this on Pascal Optimus laptops and I have not been able to find anyone who actually succeeded.

KenMasters20XX commented 5 years ago

@kalelkenobi That's an interesting find... I can only add that I noticed my HDMI audio not working when attempting to use HDMI out in Arch. I didn't spend much time on it since it was entirely for just movie-watching. But I definitely think either getting the device to show up or using an ACS patched kernel is worth trying.

spacepluk commented 5 years ago

I have the same GPU on a Razer Blade (late 2016) and I've also been fighting with this:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] [10de:1c20] (rev ff)

I think I've tried every known workaround (hiding kvm, passing subsystem ids, PCI hierarchy, Audio device, patched OVMF, you name it...) and I'm still getting the code 43. The interesting thing about it is that in my case it worked with very little effort with both Linux and OSX guests. So at least we know it should be possible...

Before I gave up, I concluded that the only way I would get this to work is getting some insider info from nvidia or spending some serious time reverse engineering the driver :(

kalelkenobi commented 5 years ago

@spacepluk so there is no benefit in passing on the HDMI audio device? too bad :( . I really wanted to make this work. Thank you for your feedback though.

spacepluk commented 5 years ago

@kalelkenobi it didn't make a difference when I tried it :(

kalelkenobi commented 5 years ago

@spacepluk very sad. Unfortunately I lack the skill to look into this any deeper. Too bad...Oh well, on the flip side I learned a good deal and even found a workaround for HDMI audio should I ever need it :D

Thank you guys for your hard work and feedback, I'd still be pulling my hair out trying to figure this out if it wasn't for you. I'll let you know if I get any updates on this.

KenMasters20XX commented 5 years ago

@spacepluk @kalelkenobi

Had a question for you guys since you're both using Pascal mobile, non-MXM cards.

Can you load your vBIOS dump (from your board, using nvflash, GPU-Z, Windows registry, etc) into the Mobile Pascal TDP Tweaker application; and do you see any thermal limit data or just zeros when you load your ROM? You can find the application here: https://github.com/LaneLyng/MobilePascalTDPTweaker/releases

I find it interesting that, out of all the Pascal ROMs I have, the 1070 Max-Q is the only one that shows zeros. Want to confirm this isn't just an Aero 15X vBIOS issue.

Thanks!

spacepluk commented 5 years ago

@KenMasters20XX I can send you my vbios so you can play with it. You can reach me here telegram

kalelkenobi commented 5 years ago

@KenMasters20XX I don't see zeroes. Here's a screenshot bios

ghost commented 5 years ago

Patching OVMF as described by @arne-claeys also got rid of error 43 on my laptop. Side-effect: I can now access the Nvidia configuration apps (did not have to reinstall the drivers).

The machine is MUXed, has an Nvidia Quadro GPU (GK106 chipset): "HP Zbook 15 Mobile Workstation" (see https://support.hp.com/lv-en/document/c03934138 ). Its dGPU VBIOS has no UEFI support, but I could bring some through https://www.win-raid.com/t892f16-AMD-and-Nvidia-GOP-update-No-requests-DIY.html . Host is an Ubuntu 18.04, I rebuilt the patched "ovmf" package from the distro repository sources.

I haven't tried anything 3D yet, but both the LookingGlass server and client are set up and ready to go.

Verequies commented 5 years ago

@KenMasters20XX I've got the Aero 15 W, GTX 1060 version. Been down your road and steps back when I originally got the laptop, and then some. I have come to the conclusion that the ACPI tables need patching. However... the GPU works 100% under a Linux VM, no issues there. This is an issue with Windows initializing the GPU. I've just finished my Uni semester and plan to do some debugging during my holidays to determine why it doesn't want to initialize.

T-vK commented 5 years ago

Can anyone assist me getting this to work on Fedora?

I'm currently writing a script that automatically checks whether a notebook is GPU-passthrough compatible and if so automatically sets everything up and creates a Windows VM with the GPU passed through. My script currently only works with Fedora.

So far it works pretty well, but I get error 43 using the GTX 1060 Max-Q notebook I currently have.

Until now I had pretty good luck with Fedora, not needing a single patch or hack. But I guess this one will be necessary until there is an upstream solution.

My efforts to script the ovmf patching are looking like this so far:

## Download the package sources:
$ sudo dnf download --source edk2-ovmf

## Extract the package sources to ~/rpmbuild
$ rpm -ivh edk2-*.src.rpm

## Download the patch:
$ wget https://github.com/jscinoz/optimus-vfio-docs/files/1842587/fwcfg_patch.txt \
       -O ~/rpmbuild/SOURCES/0023-OvmfPkg-fwcfg.patch

## Add the patch to the spec file (-i edits the spec in place; the file name must match the download above):
$ sed -i '/# non-upstream patches/a Patch1337: 0023-OvmfPkg-fwcfg.patch' ~/rpmbuild/SPECS/edk2.spec

## Download the GTX 1060 Max-Q vBIOS:
$ wget https://www.techpowerup.com/vgabios/205874/205874.rom \
       -O ~/rpmbuild/SOURCES/vBIOS.bin

## Download ssdt.txt
$ wget https://github.com/jscinoz/optimus-vfio-docs/files/1842788/ssdt.txt \
       -O ~/rpmbuild/SOURCES/ssdt.txt

## Download buildtable.txt
$ wget https://github.com/jscinoz/optimus-vfio-docs/files/1842710/buildtable.txt \
       -O ~/rpmbuild/SOURCES/buildtable.txt

## ?

## rebuild the package
$ rpmbuild -ba ~/rpmbuild/SPECS/edk2.spec

But I don't really know what I'm doing. I have never modified and rebuilt a source package and it appears you are supposed to only use patches for your modifications. Any ideas how this could be done?
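
One way the "add the patch to the spec file" step could be made repeatable is sketched below. The function name is hypothetical, it relies on GNU sed, and it assumes the spec contains the "# non-upstream patches" marker comment used in the sed command above. Whether the spec applies every listed patch automatically or needs a matching line in its %prep section is something to check in edk2.spec itself.

```shell
#!/usr/bin/env bash
# add_spec_patch SPEC NUMBER FILE
# Register "Patch<NUMBER>: <FILE>" after the marker comment, but only once.
# Returns non-zero if the marker is missing and nothing could be added.
add_spec_patch() {
    local spec="$1" num="$2" file="$3"
    # Already registered: nothing to do, so repeated runs do not stack entries.
    if grep -q "^Patch${num}:" "$spec"; then
        return 0
    fi
    # GNU sed one-line "a" command: append the text after the marker line.
    sed -i "/# non-upstream patches/a Patch${num}: ${file}" "$spec"
    # Confirm the line really landed in the spec.
    grep -q "^Patch${num}: ${file}\$" "$spec"
}
```

For example, `add_spec_patch ~/rpmbuild/SPECS/edk2.spec 1337 0023-OvmfPkg-fwcfg.patch` followed by `rpmbuild -bp ~/rpmbuild/SPECS/edk2.spec` runs only the %prep stage, which should show fairly quickly whether the patch applies to the unpacked sources before committing to a full build.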

Here is what the file structure of the extracted rpm package looks like:

tree ~/rpmbuild
/home/fedora/rpmbuild
├── SOURCES
│   ├── 0001-OvmfPkg-silence-EFI_D_VERBOSE-0x00400000-in-NvmExpre.patch
│   ├── 0002-OvmfPkg-silence-EFI_D_VERBOSE-0x00400000-in-the-DXE-.patch
│   ├── 0003-OvmfPkg-enable-DEBUG_VERBOSE.patch
│   ├── 0004-OvmfPkg-increase-max-debug-message-length-to-512.patch
│   ├── 0005-advertise-OpenSSL-on-TianoCore-splash-screen-boot-lo.patch
│   ├── 0006-OvmfPkg-QemuVideoDxe-enable-debug-messages-in-VbeShi.patch
│   ├── 0007-MdeModulePkg-TerminalDxe-add-other-text-resolutions.patch
│   ├── 0008-MdeModulePkg-TerminalDxe-set-xterm-resolution-on-mod.patch
│   ├── 0009-OvmfPkg-take-PcdResizeXterm-from-the-QEMU-command-li.patch
│   ├── 0010-ArmVirtPkg-QemuFwCfgLib-allow-UEFI_DRIVER-client-mod.patch
│   ├── 0011-ArmVirtPkg-take-PcdResizeXterm-from-the-QEMU-command.patch
│   ├── 0012-OvmfPkg-allow-exclusion-of-the-shell-from-the-firmwa.patch
│   ├── 0013-OvmfPkg-EnrollDefaultKeys-application-for-enrolling-.patch
│   ├── 0014-ArmPlatformPkg-introduce-fixed-PCD-for-early-hello-m.patch
│   ├── 0015-ArmPlatformPkg-PrePeiCore-write-early-hello-message-.patch
│   ├── 0016-ArmVirtPkg-set-early-hello-message-RH-only.patch
│   ├── 0017-BaseTools-footer.makefile-expand-BUILD_CFLAGS-last-f.patch
│   ├── 0018-BaseTools-header.makefile-remove-c-from-BUILD_CFLAGS.patch
│   ├── 0019-BaseTools-Source-C-split-O2-to-BUILD_OPTFLAGS.patch
│   ├── 0020-BaseTools-Source-C-take-EXTRA_OPTFLAGS-from-the-call.patch
│   ├── 0021-BaseTools-Source-C-take-EXTRA_LDFLAGS-from-the-calle.patch
│   ├── 0022-BaseTools-VfrCompile-honor-EXTRA_LDFLAGS.patch
│   ├── 0099-Tweak-the-tools_def-to-support-cross-compiling.patch
│   ├── build-iso.sh
│   ├── edk2-20180815-cb5f4f45ce.tar.xz
│   ├── hobble-openssl
│   ├── openssl-1.1.0-bio-fd-preserve-nl.patch
│   ├── openssl-1.1.0-cc-reqs.patch
│   ├── openssl-1.1.0-disable-ssl3.patch
│   ├── openssl-1.1.0h-hobbled.tar.xz
│   ├── openssl-1.1.0-issuer-hash.patch
│   ├── openssl-patch-to-tarball.sh
│   ├── ovmf-whitepaper-c770f8c.txt
│   ├── qemu-ovmf-secureboot-1.1.3.tar.gz
│   └── update-tarball.sh
└── SPECS
    └── edk2.spec

2 directories, 36 files

The actual relevant source code seems to be in the edk2 tar.xz file, which I think you shouldn't modify directly. I think you're supposed to use patches, but I really don't know.

## Show where QemuFwCfgAcpi.c is in the edk2-*.tar.xz
$ tar -tf ~/rpmbuild/SOURCES/edk2-*.tar.xz | grep /QemuFwCfgAcpi.c
tianocore-edk2-cb5f4f45ce/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c

T-vK commented 5 years ago

I found two projects that could be of major interest for us:

This project is about patching the NVIDIA driver installer within Windows to bypass error 43: https://github.com/sk1080/nvidia-kvm-patcher

This project is about patching your vBIOS rom in order to get rid of error 43. Not sure what exactly it does: https://github.com/Matoking/NVIDIA-vBIOS-VFIO-Patcher

Unfortunately I ran into issues with both of them. Until I manage to resolve those, maybe you'll have more luck. If you do, please report back.

KenMasters20XX commented 5 years ago

I found two projects that could be of major interest for us:

This project is about patching the NVIDIA driver installer within Windows to bypass error 43: https://github.com/sk1080/nvidia-kvm-patcher

This project is about patching your vBIOS rom in order to get rid of error 43. Not sure what exactly it does: https://github.com/Matoking/NVIDIA-vBIOS-VFIO-Patcher

Unfortunately I ran into issues with both of them. Until I manage to resolve those, maybe you'll have more luck. If you do, please report back.

I've tried both methods.

The first method (patching the driver) simply didn't work for me. I thought it sounded quite promising, but I also think there is a more fundamental problem with mobile Pascal cards, particularly the Max-Q series that show up as "VGA Controllers" in Linux (similar to MXM cards).

The second method failed for me because the NPDE markers in the vBIOS ROM don't pass the patcher's sanity checks. I've pored over quite a few vBIOS dumps at this point and I can say with certainty that (1) I have a complete copy of the vBIOS for my system, since I pulled it from the chip using an EEPROM reader, (2) it does not contain an EFI ROM section and (3) there is not a uniform "Pascal" vBIOS. Meaning, some cards have EFI sections and others don't, the headers don't appear to be uniform, and as far as I can tell there seem to be two different layouts, with the mobile Max-Q's having a very different layout (missing sections) than, say, a 1060M MXM or a desktop 1080.
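
A rough way to check a dump for an EFI image without a hex editor is sketched below. It relies on the standard PCI expansion ROM layout, where each image carries a "PCIR" data structure whose code type byte sits at offset 0x14 (0x00 for legacy x86, 0x03 for EFI). The function name is made up, GNU grep is assumed, and the NPDE markers are not looked at, so this is only a first pass and not a replacement for the patcher's own checks.

```shell
#!/usr/bin/env bash
# list_rom_images ROM
# Print the byte offset and code type of every "PCIR" structure in ROM.
list_rom_images() {
    local rom="$1" off type
    # grep -obUa prints "<byte offset>:PCIR" for each match in a binary file.
    LC_ALL=C grep -obUa 'PCIR' "$rom" | cut -d: -f1 | while read -r off; do
        # The code type is a single byte, 0x14 (20) bytes into the structure.
        type=$(od -An -tx1 -j $((off + 20)) -N1 "$rom" | tr -d ' \n')
        case "$type" in
            00) echo "$off legacy-x86" ;;
            03) echo "$off efi" ;;
            *)  echo "$off other(0x$type)" ;;
        esac
    done
}
```

Running `list_rom_images vBIOS.bin` on a dump that prints no line ending in "efi" would be consistent with point (2) above. A stray "PCIR" string elsewhere in the file would show up as a false positive, so treat the output as a hint.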

Now, I know my desktop PC's 1080 doesn't have an EFI ROM either, but Nvidia offers a compatible ROM that does, so flashing is simple. For a notebook though, I'm not sure if it matters for one, and secondly, I'm not sure why it's actually failing at all.

I think it would be curious to see if anyone can get Linux running using Nouveau drivers, in a guest VM, with graphics acceleration on the dGPU. To me, that should be a much easier to diagnose problem.

I've tried doing it myself and I've gotten only as far as the Ubuntu boot-up logo on a secondary display until the kernel panics and the guest crashes. So passthrough is technically working up to the point that the driver seems to want to take control of the device beyond just operating a basic VESA framebuffer.

To give you an idea of what that looks like for me: I'm using an Aero 15X (1070 Max-Q) and I had an external display connected over mini-DP. When firing up the guest, the external display powers on, syncs and shows the Ubuntu (guest) boot screen animation. This lasts for about 3-5 seconds of animation until it crashes.

Considering Intel will not (or is not) supporting Coffee Lake chips for GVT-g, I'm probably going to start working on a solution to this again for us Pascal users.

T-vK commented 5 years ago

Very interesting. Thanks for sharing. Have you tried using a recent kernel in your guest vm? It might also be worth taking a closer look at the kernel panic. Maybe using a serial device or by dumping it. Maybe if you're lucky there may have been enough time for stuff to get written into your syslog.

If I can find the time, I'll try to do it with a Fedora guest and see if it makes any difference. But I'm really short on time. I have to send my notebook back in a week if I don't want to keep it. For now my priorities are getting rid of error 43 in any way no matter how hacky so that I know it can be done on my device.

From what other people reported the GPU works fine in a Linux guest vm on the Aero 15X. Maybe @Verequies can share some details like which distro and kernel version he used and if he actually used nouveau drivers and tested GPU acceleration. Kernel parameters would also be interesting.

Verequies commented 5 years ago

Yeah can provide some details. Also those patches never worked for me either. As far as I can tell, the VBIOS is already being properly read under the VM. I've compared a dump in the VM to a dump on the host using nvflash.
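
The host-versus-guest comparison mentioned above can be sketched as a byte-for-byte check once both dumps are on the same machine. The function name and the file names in the example are placeholders; how the dumps were produced does not matter to the check.

```shell
#!/usr/bin/env bash
# compare_roms A B
# Report whether two vBIOS dumps are byte-identical.
# Prints "identical" and returns 0, or prints "different" and returns 1.
compare_roms() {
    local a="$1" b="$2"
    if cmp -s "$a" "$b"; then      # -s: no output, result is the exit status
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

For example, `compare_roms host_dump.rom guest_dump.rom`. If the result is "different", running plain `cmp` on the two files reports the first differing byte, which helps tell a truncated dump from a genuinely different image.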

As for the Linux tests, I use Arch Linux, latest kernel and all. I don't think it will work with the Nouveau drivers, but I can give it a shot. Using the NVIDIA drivers though, it works 100% like I said earlier, and I've played some games on it. Also, the GPU doesn't work under a macOS VM. I'm currently dumping and comparing all the info I can from a native Windows install to the VM install, and so far I think it might be the fact that the PCIe Root Port is coming back as hot-pluggable, whereas natively it's not. I tried editing some QEMU code but can't get rid of that feature. Might need some help on that. Oh, I've also been comparing stuff to my Precision M4800, on which I have GPU passthrough fully working.

Anyhow, gimme an hour or two and I'll try getting it working with the Nouveau drivers.

Verequies commented 5 years ago

Alright, had some other stuff to figure out. But yeah, nouveau works just as well as the NVIDIA proprietary driver. Didn't even set up the config, just blacklisted the other drivers, and restarted the display manager. Obviously the nouveau driver has graphical performance issues compared to the NVIDIA proprietary driver.

Again, this is using an up to date Arch Linux guest, with kernel 4.19.5.

EDIT: And yes, I've tested playing games like 'Human: Fall Flat' on both drivers. So GPU acceleration does indeed work.

EDIT2: This required no special configuration to get working. Not even any xorg.confs. I might mention that I did try a crazy idea the other day. I attempted to get it working in a nested VM with PCI passthrough. So I ran a Windows VM in the Linux VM to see if that'd do it, but as I thought, it did not work and I got the blasted error 43 again.

T-vK commented 5 years ago

@Verequies Thank you very much for testing this. The nested VM idea is pretty cool. I wasn't even aware that was possible, too bad it doesn't work. :/

Verequies commented 5 years ago

No worries, I just tested the latest Ubuntu 18.04 ISO and it works fine as well. So I'm going to say there are no issues running the GPU under Linux. At least none I can see.

If anyone wants screenshots for proof, I'm happy to provide them. Now back to work on trying to get it working on Windows... It would be good if someone could help figure out how to disable the PCIe Hotplug and Hotplug Surprise capabilities. If you use HWiNFO64 you can see the features enabled on the PCIe Root Ports. When booting natively, these features are actually disabled, so I reckon Windows is trying to set up those features on the GPU but is failing because the GPU doesn't support them. If this is the case, Linux probably has a workaround, or just continues if it can't set them up.
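
On a Linux host or guest, the same capability bits can be read without HWiNFO64. Here is a small sketch that pulls the HotPlug and Surprise flags out of the slot capabilities line of `lspci -vv` output. The function name is made up, and it assumes the usual pciutils format where the `SltCap:` line lists each flag followed by `+` (present) or `-` (absent).

```shell
#!/usr/bin/env bash
# slot_hotplug_flags
# Read `lspci -vv` output on stdin and print the HotPlug and Surprise
# flags found on SltCap lines, joined by spaces.
slot_hotplug_flags() {
    grep 'SltCap:' \
        | grep -oE '(HotPlug|Surprise)[+-]' \
        | paste -sd' ' -
}
```

For example, `sudo lspci -vv -s 00:01.0 | slot_hotplug_flags`, where 00:01.0 is the root port from the IOMMU listing earlier in the thread; inside a guest the address will most likely be different. Output such as `HotPlug+ Surprise+` in the guest against `HotPlug- Surprise-` on the host would match the difference described above.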

T-vK commented 5 years ago

That's good to hear. I don't know anything about PCIe hotplugging unfortunately. I do have an eGPU enclosure with a GTX 970 that uses Thunderbolt 3 which is kind of related to PCIe and is probably hotpluggable I would assume. I've never tested that enclosure, but I could try to pass it through to a windows vm and see if that works.

Verequies commented 5 years ago

Yeah if you could try that, that'd be sweet. I'm pretty sure it should work, however I don't have any eGPU setup here so can't test.