strongtz / i915-sriov-dkms

dkms module of Linux i915 driver with SR-IOV support

Does not compile against Kernel 6.8+ #169

Open ich777 opened 1 month ago

ich777 commented 1 month ago

As a follow-up to #168: the module does not compile against Kernel 6.8+.

/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:36:8: error: type defaults to 'int' in declaration of 'DEFINE_STATIC_KEY_FALSE' [-Werror=implicit-int]
   36 | static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
      |        ^~~~~~~~~~~~~~~~~~~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:36:1: warning: parameter names (without types) in function declaration
   36 | static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
      | ^~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c: In function 'i915_memcpy_from_wc':
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:118:13: error: implicit declaration of function 'static_branch_likely' [-Werror=implicit-function-declaration]
  118 |         if (static_branch_likely(&has_movntdqa)) {
      |             ^~~~~~~~~~~~~~~~~~~~
  CC [M]  /usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_sw_fence_work.o
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:118:35: error: 'has_movntdqa' undeclared (first use in this function)
  118 |         if (static_branch_likely(&has_movntdqa)) {
      |                                   ^~~~~~~~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:118:35: note: each undeclared identifier is reported only once for each function it appears in
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c: In function 'i915_unaligned_memcpy_from_wc':
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:148:17: error: implicit declaration of function 'memcpy' [-Werror=implicit-function-declaration]
  148 |                 memcpy(dst, src, x);
      |                 ^~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:29:1: note: include '<string.h>' or provide a declaration of 'memcpy'
   28 | #include "i915_memcpy.h"
  +++ |+#include <string.h>
   29 | 
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:148:17: warning: incompatible implicit declaration of built-in function 'memcpy' [-Wbuiltin-declaration-mismatch]
  148 |                 memcpy(dst, src, x);
      |                 ^~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:148:17: note: include '<string.h>' or provide a declaration of 'memcpy'
  CC [M]  /usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_syncmap.o
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c: In function 'i915_memcpy_init_early':
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:165:13: error: implicit declaration of function 'static_cpu_has' [-Werror=implicit-function-declaration]
  165 |         if (static_cpu_has(X86_FEATURE_XMM4_1) &&
      |             ^~~~~~~~~~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:166:14: error: implicit declaration of function 'boot_cpu_has' [-Werror=implicit-function-declaration]
  166 |             !boot_cpu_has(X86_FEATURE_HYPERVISOR))
      |              ^~~~~~~~~~~~
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:167:17: error: implicit declaration of function 'static_branch_enable' [-Werror=implicit-function-declaration]
  167 |                 static_branch_enable(&has_movntdqa);
      |                 ^~~~~~~~~~~~~~~~~~~~
  CC [M]  /usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_user_extensions.o
/usr/src/i915-sriov-dkms/drivers/gpu/drm/i915/i915_memcpy.c:167:39: error: 'has_movntdqa' undeclared (first use in this function)
  167 |                 static_branch_enable(&has_movntdqa);
      |                                       ^~~~~~~~~~~~
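
One way to triage a log like the one above is to collect the identifiers that GCC reports as implicitly declared or undeclared. The following is only a minimal sketch in Python: the function name `missing_symbols` is made up for illustration, and the sample lines are copied from the log above with the paths shortened.

```python
import re
from typing import List

# GCC diagnostics of interest, e.g.:
#   error: implicit declaration of function 'memcpy' [-Werror=...]
#   error: 'has_movntdqa' undeclared (first use in this function)
IMPLICIT = re.compile(r"error: implicit declaration of function '([^']+)'")
UNDECLARED = re.compile(r"error: '([^']+)' undeclared")


def missing_symbols(log: str) -> List[str]:
    """Return the identifiers GCC could not resolve, in first-seen order."""
    seen: List[str] = []
    for line in log.splitlines():
        for pattern in (IMPLICIT, UNDECLARED):
            match = pattern.search(line)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


# Sample lines from the build log above (paths shortened for readability).
SAMPLE = """\
i915_memcpy.c:118:13: error: implicit declaration of function 'static_branch_likely' [-Werror=implicit-function-declaration]
i915_memcpy.c:118:35: error: 'has_movntdqa' undeclared (first use in this function)
i915_memcpy.c:148:17: error: implicit declaration of function 'memcpy' [-Werror=implicit-function-declaration]
i915_memcpy.c:167:39: error: 'has_movntdqa' undeclared (first use in this function)
"""

print(missing_symbols(SAMPLE))
```

Every name in the full log (static keys, CPU feature tests, `memcpy`) is something kernel headers normally declare, which is consistent with an include problem rather than with removed APIs. That reading is an assumption; the log alone does not confirm which header is missing.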

mm2293 commented 1 month ago

+1

I can't use Kernel 6.5, as I need the driver support for Meteor Lake.

illnesse commented 1 month ago

+1

I have it running perfectly on a PVE server, but it would be really great to be able to use this for local VMs with 6.9.2+.

BenSchweikert commented 3 weeks ago

Hello guys, what's your strategy at the moment? With Kernels > 6.7 the module does not build, and strongtz no longer needs or maintains it. Do you move your "apps" from the VM to the host, stay on an old kernel version forever, or do you plan something else?

Best regards, Ben

ich777 commented 3 weeks ago

@BenSchweikert this comment may not help you much: I personally don't use this driver either, but I compile the driver packages for Unraid so that Unraid users can easily make use of it.

My current recommendation is that users stay on the latest stable Unraid version, which currently uses Kernel 6.1.79, and don't upgrade to the upcoming Unraid version (which will use Kernel 6.8+) if they still want to be able to use this driver.

Hopefully someone familiar with this repository will take a look at it and issue a PR.

However, Intel is already working on an upstream implementation for the Kernel which should be very similar to this repository, but they keep delaying it from Kernel release to Kernel release; you can read more about that here. I hope the official support for i915 SR-IOV will be released soon...

haohetao commented 1 week ago

+1

bolzerrr commented 1 week ago

I did some bug fixing for 6.8.4-2 (https://github.com/bolzerrr/i915-sriov-dkms/tree/fix/6.8.4-2). I am not a kernel dev, and the changes were mostly made using ChatGPT. However, for me it builds and also works (e.g. on a Windows 11 VM: https://browser.geekbench.com/v6/compute/compare/2342702?baseline=2376633), but I get some strange messages:

[   34.276185] i915 0000:00:02.0: Running in SR-IOV PF mode
[   34.277260] i915 0000:00:02.0: [drm] VT-d active for gfx access
[   34.281402] Console: switching to colour dummy device 80x25
[   34.285643] i915 0000:00:02.0: vgaarb: deactivate vga console
[   34.285730] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[   34.286264] intel_tcc_cooling: TCC Offset locked
[   34.286607] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   34.287690] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[   34.296334] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
[   34.300006] intel_rapl_msr: PL4 support detected.
[   34.300070] intel_rapl_common: Found RAPL domain package
[   34.300075] intel_rapl_common: Found RAPL domain core
[   34.300078] intel_rapl_common: Found RAPL domain uncore
[   34.313163] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[   34.313177] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[   34.318143] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads!
[   34.318579] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[   34.318585] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[   34.318930] i915 0000:00:02.0: [drm] GuC RC: enabled

All of this looks fine, but the created VF devices report an error:

[   54.245331] pci 0000:00:02.1: [8086:46d1] type 00 class 0x030000 PCIe Root Complex Integrated Endpoint
[   54.245898] pci 0000:00:02.1: DMAR: Skip IOMMU disabling for graphics
[   54.246526] pci 0000:00:02.1: Adding to iommu group 13
[   54.247127] pci 0000:00:02.1: vgaarb: bridge control possible
[   54.247713] pci 0000:00:02.1: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[   54.248309] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[   54.248982] i915 0000:00:02.1: enabling device (0000 -> 0002)
[   54.249622] i915 0000:00:02.1: Running in SR-IOV VF mode
[   54.250790] i915 0000:00:02.1: [drm] *ERROR* GT0: IOV: Unable to confirm version 1.9 (0000000000000000) 70 20
[   54.251562] i915 0000:00:02.1: [drm] *ERROR* GT0: IOV: Found interface version 0.1.9.0
[   54.252761] i915 0000:00:02.1: [drm] VT-d active for gfx access
[   54.253396] i915 0000:00:02.1: [drm] Using Transparent Hugepages
[   54.254573] i915 0000:00:02.1: [drm] *ERROR* GT0: IOV: Unable to confirm version 1.9 (0000000000000000) 70 20
[   54.255307] i915 0000:00:02.1: [drm] *ERROR* GT0: IOV: Found interface version 0.1.9.0
[   54.256411] i915 0000:00:02.1: GuC firmware PRELOADED version 0.0 submission:SR-IOV VF
[   54.257020] i915 0000:00:02.1: HuC firmware PRELOADED
[   54.260128] i915 0000:00:02.1: [drm] Protected Xe Path (PXP) protected content support initialized
[   54.260753] i915 0000:00:02.1: [drm] PMU not supported for this GPU.
[   54.261538] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.1 on minor 0

Even though 1.9 is not configured anymore, the error still mentions it. You can see that "70.20" is what is configured, because I added it to the output: Unable to confirm version 1.9 (0000000000000000) 70 20

Maybe someone else understands what's happening?

Second finding: tons of the following message, although it stops after a while:


[ 2357.330137] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.330215] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.330547] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.330571] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.331825] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.331949] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.332239] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.332270] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.333534] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.333664] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.333964] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.335184] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.335326] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.345919] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
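
When a message floods the log like this, it can help to strip the timestamps and count how often each distinct message occurs. A minimal sketch in Python, assuming the usual `[ seconds.micros]` dmesg prefix; the function name `summarize` is made up for illustration, and the sample lines are copied from the logs in this comment.

```python
import re
from collections import Counter

# dmesg prefix such as "[ 2357.330137] " or "[   54.249622] "
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize(dmesg: str) -> Counter:
    """Count identical kernel messages, ignoring their timestamps."""
    counts: Counter = Counter()
    for line in dmesg.splitlines():
        message = TIMESTAMP.sub("", line).strip()
        if message:
            counts[message] += 1
    return counts


SAMPLE = """\
[ 2357.330137] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[ 2357.330215] Purging GPU memory, 0 pages freed, 0 pages still pinned, 227 pages left available.
[   54.249622] i915 0000:00:02.1: Running in SR-IOV VF mode
"""

# Print the most frequent messages first.
for message, count in summarize(SAMPLE).most_common():
    print(f"{count:>4}x {message}")
```

The same function can be fed the full output of `dmesg` to see how dominant the "Purging GPU memory" message is compared with everything else.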

sshaikh commented 1 week ago

Perhaps naïve, but at this point might it not be easier to DKMS-fy the relevant i915 codebase from Intel? Similar to the work that was carried out at the start of this repo?

fanshu93 commented 1 week ago

+1. Looking forward to 6.8 kernel support!

Daniel15 commented 1 week ago

So I was looking into this, and it turns out Intel are adding a new graphics driver that has native SR-IOV support.

This driver is for Intel Xe graphics, which includes the integrated graphics in Tiger Lake (11th gen) and newer, as well as their discrete GPUs.

The pull request says it's experimental for Tiger Lake (11th gen) to Meteor Lake (1st gen of Core Ultra mobile CPUs, released December 2023), and will be used as the primary driver for the next generation onwards.

What's unclear to me is:

  1. Has this actually been merged yet? Is this driver ready to test in 6.9, or are more improvements needed?
  2. Will it support SR-IOV in integrated GPUs or only discrete GPUs?
  3. When will the driver be considered production-ready? (i.e. no longer "experimental")

Daniel15 commented 1 week ago

Also, to add to the above, it looks like Intel's fork of Linux does have support for SR-IOV in the i915 driver in Linux v6.9 (there's a commit from two weeks ago: https://github.com/intel/mainline-tracking/commit/ec2aa96813012243cf62765d42da51d880eb4b3c). Could that be merged into this repo?

ich777 commented 1 week ago

@Daniel15 they delayed it multiple times already, please follow this issue: https://github.com/intel/linux-intel-lts/issues/33

It's simply not done yet and not fully supported.

BTW, please don't post exactly the same thing in multiple places; you already got an answer on the Unraid forums.

Daniel15 commented 1 week ago

> @Daniel15 they delayed it multiple times already, please follow this issue: intel/linux-intel-lts#33
>
> It's simply not done yet and not fully supported.

Makes sense. What about the SR-IOV code in the i915 driver in 6.9? https://github.com/intel/mainline-tracking/commit/ec2aa96813012243cf62765d42da51d880eb4b3c

> BTW, please don't do multiple posts (exactly the same) in different places, you already got an answer on the Unraid forums.

This GitHub repo is separate from Unraid, and discussion about this driver makes sense in both places, just like how the same questions and updates about Unraid are discussed both on Reddit and in the forum.

It doesn't make sense for me to just post in the Unraid forum as this repo is broader than that. I'm using it on a non-Unraid server too. I'd guess that Unraid users are in the minority given how many more users Proxmox and regular Linux distros have.

ich777 commented 1 week ago

> Makes sense. What about the SR-IOV code in the i915 driver in 6.9? intel/mainline-tracking@ec2aa96

IIRC it's not fully functional in Kernel 6.9. I compiled a custom Unraid version with that Kernel for a user to test, because I have no compatible hardware, and ultimately it did not work.

johntdavis84 commented 1 week ago

This is a confusing process to follow because Intel is rolling this driver out slowly, in phases. Kernels 6.8, 6.9, and 6.10 have all added various bits of software infrastructure that need to be in place before the Xe driver is even in shape to be used by regular users for testing.

Compare that to other peripherals, where support usually just shows up all at once in a kernel update or in an official DKMS from the manufacturer.

It’s definitely a good idea to keep an eye on the mainline/lts GitHub linked above. My current understanding from reading Phoronix posts is that we won’t see a complete implementation of the Xe driver in the kernel until at least 6.11. And that won’t arrive until at least October 2024.

For now, I’ve pinned my Proxmox host that uses this DKMS to 6.5.13-5, and when I want acceleration in a VM, I stick to an Ubuntu or Debian version for which the process of using the DKMS is well documented (so far, some version of 6.5.x). It’s all been perfectly stable.

We’ll have much more freedom to use newer kernels once Xe is fully mainlined and SR-IOV support just works when you install Proxmox (well, after you’ve set the appropriate UEFI and kernel options).
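
The "stay on a known-good kernel" approach can be guarded with a small check before attempting a DKMS build. This is only a sketch in Python under the assumption, taken from this thread, that the module builds on kernels before 6.8 and fails on 6.8+; the names `FIRST_BROKEN`, `kernel_version` and `dkms_expected_to_build` are hypothetical.

```python
import re
from typing import Tuple

# Assumption from this thread: builds fail starting with kernel 6.8.
FIRST_BROKEN = (6, 8)


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.5.13-5'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def dkms_expected_to_build(release: str) -> bool:
    """True if the release is older than the first kernel reported as broken."""
    return kernel_version(release) < FIRST_BROKEN


# Kernel versions mentioned in this thread.
for release in ("6.1.79", "6.5.13-5", "6.8.4-2", "6.9.2"):
    status = "expected to build" if dkms_expected_to_build(release) else "reported broken"
    print(f"{release}: {status}")
```

On a running system, the release string can be obtained with `platform.release()` from the Python standard library and passed to `dkms_expected_to_build`.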

Daniel15 commented 1 week ago

> IIRC it's not fully functional in Kernel 6.9. I compiled a custom Unraid version with that Kernel for a user to test, because I have no compatible hardware, and ultimately it did not work.

@ich777 You compiled the Intel kernel fork specifically? Would you be interested in testing if someone were to send you compatible hardware?

> This is a confusing process to follow because Intel is rolling this driver out slowly, in phases. Kernels 6.8, 6.9, and 6.10 have all added various bits of software infrastructure that need to be in place before the Xe driver is even in shape to be used by regular users for testing.
>
> Compare that to other peripherals, where support usually just shows up all at once in a kernel update or in an official DKMS from the manufacturer.

@johntdavis84 Writing a driver for something as complex as a graphics card is hard, it sometimes takes a long time to get code merged into the kernel, and nobody wants to review giant pull requests, so it makes sense that they're breaking up the work into smaller, more manageable chunks.

ich777 commented 1 week ago

> @ich777 You compiled the Intel kernel fork specifically?

I compiled the Kernel that includes those changes, and it doesn't work.

> Would you be interested in testing if someone were to send you compatible hardware?

Sorry, but I don't have the capacity to do more than I'm already doing. I have about 180 applications/plugins on the CA App, and if something is not working or broken, I fix it. In the case of this plugin (for which I only compile the driver), I create issues, or sometimes PRs if it's a minor fix and I have time to look at it. I hope you understand.

Daniel15 commented 1 week ago

Thank you @ich777. Makes sense. The whole Unraid community appreciates your work :)