strongtz / i915-sriov-dkms

dkms module of Linux i915 driver with SR-IOV support

Unable to build under pve kernel 5.19.17-1-pve x86_64 #17

Closed 1582130940 closed 1 year ago

1582130940 commented 1 year ago

I removed PXP from the Makefile beforehand and added intel-gtt.h, which resolved the previous error. The build now fails with:

In file included from ./arch/x86/include/asm/bug.h:87,
                 from ./include/linux/bug.h:5,
                 from ./arch/x86/include/asm/paravirt.h:15,
                 from ./arch/x86/include/asm/irqflags.h:63,
                 from ./include/linux/irqflags.h:16,
                 from ./include/linux/rcupdate.h:26,
                 from ./include/linux/rculist.h:11,
                 from ./include/linux/sched/signal.h:5,
                 from ./include/linux/oom.h:6,
                 from /var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:7:
/var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c: In function ‘i915_gem_driver_register__shrinker’:
/var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:429:26: error: too many arguments to function ‘register_shrinker’
  429 | drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker, "drm-i915_gem"));
      | ^~~~~
./include/asm-generic/bug.h:131:25: note: in definition of macro ‘WARN’
  131 | int __ret_warn_on = !!(condition); \
      | ^~~~~
./include/drm/drm_print.h:593:2: note: in expansion of macro ‘drm_WARN’
  593 | drm_WARN((drm), (x), "%s", \
      | ^~~~
/var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:429:2: note: in expansion of macro ‘drm_WARN_ON’
  429 | drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker, "drm-i915_gem"));
      | ^~~
In file included from ./include/linux/mm.h:20,
                 from ./include/linux/oom.h:11,
                 from /var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:7:
./include/linux/shrinker.h:93:12: note: declared here
   93 | extern int register_shrinker(struct shrinker *shrinker);
      | ^~~~~
  CC [M]  /var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.o
make[1]: *** [scripts/Makefile.build:257: /var/lib/dkms/i915-sriov-dkms/5.15.49/build/drivers/gpu/drm/i915/gem/i915_gem_shrinker.o] Error 1
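
For anyone reading this log later: the module source calls register_shrinker() with an extra name argument, while the shrinker.h of the kernel being built against declares only the one-argument form (the "declared here" note). As far as I know the named form only appeared in newer kernels. A minimal sketch for checking which form a given header declares; the function name shrinker_takes_name and the header path in the example are assumptions for illustration, not part of this repository:

```bash
# Succeeds if the given shrinker.h declares register_shrinker() with
# further arguments after the shrinker pointer (the named form).
# Fails if only the one-argument form is declared.
shrinker_takes_name() (
    header=$1
    # A comma right after the first parameter means more parameters follow.
    grep -Eq 'register_shrinker\(struct shrinker \*[A-Za-z_]+,' "$header"
)

# Example (the path is an assumption; adjust it to where your headers live):
#   hdr=/lib/modules/5.19.17-1-pve/build/include/linux/shrinker.h
#   if shrinker_takes_name "$hdr"; then
#       echo "named form"
#   else
#       echo "one-argument form"
#   fi
```

If the check reports the one-argument form, a source tree that passes a name to register_shrinker() will not build against that kernel without changes.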

1582130940 commented 1 year ago

The problem I'm running into now: after updating the upstream source to 5.15.71 (from https://github.com/intel/linux-intel-lts, since the current source can't be compiled anyway), the source files compile, but they cannot be linked into a module, and modpost reports these errors:

ERROR: modpost: "intel_gmch_enable_gtt" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_gvt_resume" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_mst_atomic_setup_commit" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_mst_root_conn_atomic_check" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_gvt_driver_remove" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_gmch_gtt_get" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_add_payload_part2" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_add_payload_part1" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_gmch_gtt_insert_page" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_atomic_get_new_mst_topology_state" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
WARNING: modpost: suppressed 12 unresolved symbol warnings because there were too many)
make[1]: *** [scripts/Makefile.modpost:128: /var/lib/dkms/i915-sriov-dkms/5.15.71/build/Module.symvers] Error 1

My project is here: https://github.com/1582130940/i915-sriov-dkms. I don't know whether this happens because I didn't enable some necessary DCONFIG option, or because Debian/PVE doesn't enable something.
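
One way to narrow down modpost "undefined" errors like these is to check whether the target kernel exports the symbols at all. The sketch below extracts the symbol names from a saved build log and looks each one up in a Module.symvers file. The function name unexported_symbols and both file paths in the example are assumptions for illustration:

```bash
# Print every symbol that modpost reported as undefined in the build log
# ($1) and that does not appear as an exported symbol in Module.symvers ($2).
unexported_symbols() (
    log=$1
    symvers=$2
    grep -oE 'modpost: "[A-Za-z0-9_]+"' "$log" | cut -d'"' -f2 | sort -u |
    while read -r sym; do
        # Module.symvers is whitespace-separated: CRC, symbol, module, export type.
        awk -v s="$sym" '$2 == s { found = 1 } END { exit !found }' "$symvers" ||
            echo "$sym"
    done
)

# Example (both paths are assumptions; adjust them to your system):
#   unexported_symbols /var/lib/dkms/i915-sriov-dkms/5.15.71/build/make.log \
#       "/lib/modules/$(uname -r)/build/Module.symvers"
```

Symbols printed by this check are not provided by that kernel, so enabling extra options in the module's own Makefile would not make them appear.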

zhtengw commented 1 year ago

i915-sriov can be built with 5.19.17-1-pve. Check out a commit before 856f86b (for example 4d5e511), then install the module with dkms install -m i915-sriov -v dkms -k 5.19.17-1-pve. It is not necessary to modify the Makefile or the source directories.

$ git checkout 4d5e511
HEAD           master         origin/HEAD    origin/master
bb44262  -- [HEAD]    Merge pull request #11 from x0wllaar/kernel6 (8 weeks ago)
856f86b  -- [856f86b] Module now can be compiled for kernel 6.0.2 (8 weeks ago)
4d5e511  -- [HEAD^]   Update README.md (9 weeks ago)
de763d3  -- [HEAD^^]  Update README (3 months ago)
ddcd0c1  -- [HEAD~3]  No need for blacklist (3 months ago)
e7ca4f6  -- [HEAD~4]  Add README.md (3 months ago)
01bec0e  -- [HEAD~5]  Fix code (3 months ago)

1582130940 commented 1 year ago

Then can I ask why the new source code of intel lts cannot be applied on 5.19?

zhtengw commented 1 year ago

You should copy the source code from the upstream branch that includes the SR-IOV patches. You could try the code in the adl-linux tag.

1582130940 commented 1 year ago

Have you tried it? If you copy the files from that branch, the first failure is that the PCI ID list (pciid) in PVE 5.19 is too old, and this error is reported: error: implicit declaration of function 'INTEL_ATS_M150_IDS';. This is no different from directly copying the intel linux/5.15 branch (which has the same SR-IOV support).

1582130940 commented 1 year ago

Even after the missing file is fixed, these errors are still reported at the end:

  MODPOST /var/lib/dkms/i915-sriov-dkms/5.15.71/build/Module.symvers
ERROR: modpost: "intel_gvt_resume" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_ggtt_gmch_enable_hw" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_modeset_verify_crtc" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_mst_atomic_setup_commit" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_mst_root_conn_atomic_check" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_gvt_driver_remove" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "intel_ggtt_gmch_probe" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_add_payload_part2" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_dp_add_payload_part1" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!
ERROR: modpost: "drm_atomic_get_new_mst_topology_state" [/var/lib/dkms/i915-sriov-dkms/5.15.71/build/i915.ko] undefined!

1582130940 commented 1 year ago

I have submitted commit: https://github.com/1582130940/i915-sriov-dkms/commit/91068d0c3aa8aadf7d017c8564abe2c59c97e318, you can check it out.

zhtengw commented 1 year ago

Sorry, my fault. These functions were only merged into mainline in linux-6.0. For building with pve-kernel-5.19, the latest i915-sriov version I have tested is 5.15.49-adl-linux-221011T100001Z.

1582130940 commented 1 year ago

After adding the missing trace file, it compiles. Some of the existing changes still have to be kept rather than overwritten by the copied files, otherwise problems occur. https://github.com/1582130940/i915-sriov-dkms/commit/b32cbf4c1bb58fe639e72a2b5d74af1fe68c8e16

zhtengw commented 1 year ago

Yes, and then you will find it's not very different from the files in commit 4d5e511.

zhtengw commented 1 year ago

To come back to why the new intel lts source code cannot be applied on 5.19: if you want to try the latest SR-IOV source code from intel lts, try building against kernel 6.1, because many of the required features were only merged into mainline in linux-6.1. I have tested with kernel 6.1 on PVE and Gentoo.
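
As a rough summary of the advice in this thread, the choice of source can be keyed on the kernel version. The sketch below is only that, a summary of what was said here; the function names kernel_at_least and suggest_source are hypothetical, and kernels not discussed in the thread are reported as such:

```bash
# Succeeds if version $1 is at least version $2, comparing only the
# major and minor numbers. Expects the form major.minor[...].
kernel_at_least() (
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%[!0-9]*}
    want_major=${2%%.*}
    rest=${2#*.}
    want_minor=${rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "${have_minor:-0}" -ge "${want_minor:-0}" ]; }
)

# Print which source this thread suggests for a given kernel release.
suggest_source() (
    kernel=$1
    if kernel_at_least "$kernel" 6.1; then
        echo "latest SR-IOV source from linux-intel-lts"
    elif kernel_at_least "$kernel" 5.19 && ! kernel_at_least "$kernel" 6.0; then
        echo "commit 4d5e511"
    else
        echo "not covered in this thread"
    fi
)

# Example:
#   suggest_source "$(uname -r)"
```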

1582130940 commented 1 year ago

Yes, compared with the files in commit 4d5e511 there is just one more RC6 update. However, after I compiled it, it did not work properly, and I don't know why: the VFs could not be created normally (7 PCI devices without model names appear, but they have no effect).

1582130940 commented 1 year ago

Received, I will try it on 6.1 if possible, thank you.

1582130940 commented 1 year ago

I tested the 6.1 kernel and the compilation passed, but the VFs are still not allocated properly after rebooting. Even though virtual devices are generated, they are unusable.

root@pve:~# dmesg | grep vf
[ 2.829773] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[ 23.833760] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 24.889788] vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 25.945758] vfio-pci 0000:0a:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 27.001740] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 28.093689] vfio-pci 0000:0f:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 29.177682] vfio-pci 0000:10:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 30.233643] vfio-pci 0000:12:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[ 31.289622] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c

root@pve:~# dmesg | grep -i guc
[ 3.183146] Setting dangerous option enable_guc - tainting kernel
[ 3.279647] i915 0000:00:02.0: [drm] GuC error state capture buffer maybe too small: 2097152 < 2163708 (min = 721236)
[ 3.281917] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[ 3.284366] i915 0000:00:02.0: [drm] GuC submission enabled
[ 3.284368] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 3.284682] i915 0000:00:02.0: [drm] GuC RC: enabled
[ 5.142585] i915 0000:00:02.1: GuC interface version 0.1.0.0
[ 5.143639] i915 0000:00:02.1: GuC interface version 0.1.0.0
[ 5.143792] i915 0000:00:02.1: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.146466] i915 0000:00:02.2: GuC interface version 0.1.0.0
[ 5.147296] i915 0000:00:02.2: GuC interface version 0.1.0.0
[ 5.147453] i915 0000:00:02.2: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.149581] i915 0000:00:02.3: GuC interface version 0.1.0.0
[ 5.150268] i915 0000:00:02.3: GuC interface version 0.1.0.0
[ 5.150412] i915 0000:00:02.3: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.152621] i915 0000:00:02.4: GuC interface version 0.1.0.0
[ 5.153366] i915 0000:00:02.4: GuC interface version 0.1.0.0
[ 5.153518] i915 0000:00:02.4: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.155645] i915 0000:00:02.5: GuC interface version 0.1.0.0
[ 5.156260] i915 0000:00:02.5: GuC interface version 0.1.0.0
[ 5.156384] i915 0000:00:02.5: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.158145] i915 0000:00:02.6: GuC interface version 0.1.0.0
[ 5.158651] i915 0000:00:02.6: GuC interface version 0.1.0.0
[ 5.158774] i915 0000:00:02.6: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[ 5.160579] i915 0000:00:02.7: GuC interface version 0.1.0.0
[ 5.161191] i915 0000:00:02.7: GuC interface version 0.1.0.0
[ 5.161317] i915 0000:00:02.7: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
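
A small sketch for pulling the VF addresses out of output like the above, so they can be compared with what lspci shows. It only relies on the "submission:SR-IOV VF" text seen in this log; the function name list_sriov_vfs is an assumption for illustration:

```bash
# Read kernel log text on stdin and print the PCI address of every
# device whose GuC line reports "submission:SR-IOV VF", one per line.
list_sriov_vfs() {
    grep -F 'submission:SR-IOV VF' |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
        sort -u
}

# Example:
#   dmesg | list_sriov_vfs
# The number of VFs currently enabled on the PF can also be read from the
# standard SR-IOV sysfs attribute, for example:
#   cat /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
```

For the log in this comment, that gives the seven functions 0000:00:02.1 through 0000:00:02.7.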

zhtengw commented 1 year ago

Does lspci list the VF devices?

1582130940 commented 1 year ago

lspci lists 7 virtual devices, but they have no effect after being passed through to the virtual machine. Please refer to the attached screenshot (094256yisqs92f0er592wc).

zhtengw commented 1 year ago

Looks like it works. The remaining steps have to be done in the VM guests.

For Linux Guest

  1. Pass through a VF device to the guest, for example 0000:00:02.1.
  2. Build this module for the guest Linux kernel.
  3. Pass "intel_iommu=on i915.enable_guc=3" to the guest kernel command line.
  4. Regenerate the initramfs and reboot.
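
Steps 3 and 4 could look roughly like this on a guest that boots with GRUB. This is a sketch under that assumption; the function name add_guest_params is hypothetical, and the file path and the update commands in the example are the usual Debian/Ubuntu ones rather than anything specific to this module:

```bash
# Append the two parameters from step 3 to GRUB_CMDLINE_LINUX_DEFAULT in
# the given GRUB defaults file. Does nothing if they are already present.
# Uses GNU sed's in-place editing.
add_guest_params() (
    grub_file=$1
    if grep -qF 'i915.enable_guc=3' "$grub_file"; then
        exit 0
    fi
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 intel_iommu=on i915.enable_guc=3"/' "$grub_file"
)

# Example on a Debian/Ubuntu guest (run as root):
#   add_guest_params /etc/default/grub
#   update-grub
#   update-initramfs -u
#   reboot
```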

For Windows Guest

  1. Pass through a VF device to the guest, for example 0000:00:02.2, and make it the primary GPU.
  2. Install the latest Intel Graphics driver.
  3. Reboot and enjoy.

Just FYI, screenshots from a Linux guest and a Windows guest were attached here.

1582130940 commented 1 year ago

No, it doesn't work. As in the example picture where the device name is not displayed, even after the device is passed through to the virtual machine, vainfo does not show an iHD device, and ll /dev/dri shows that /dev/dri does not exist.

zhtengw commented 1 year ago

In my case, the device name is not displayed either. It doesn't matter.

You should also install this module for the guest Linux kernel, and pass "intel_iommu=on i915.enable_guc=3" to the guest kernel command line.

1582130940 commented 1 year ago

Thanks, I'll try it. Unfortunately, Ubuntu's official mainline kernel has failed to compile since 6.0.11, so maybe I can only try 5.15.71 with the 6.0 kernel or 5.15.49 with the 5.19 kernel.

After testing: 5.15.49 compiles against Ubuntu 5.19.17; 6.0 cannot be used, and the Ubuntu 6.1 kernel is required.