strongtz / i915-sriov-dkms

dkms module of Linux i915 driver with SR-IOV support

Should dkms require the --force flag? #47

Closed FallingSnow closed 1 year ago

FallingSnow commented 1 year ago

Tried to install on a fully updated Arch Linux system, but it fails because an i915 kernel module already exists.

:: Proceed with installation? [Y/n]
(1/1) checking keys in keyring                                   [####################################] 100%
(1/1) checking package integrity                                 [####################################] 100%
(1/1) loading package files                                      [####################################] 100%
(1/1) checking for file conflicts                                [####################################] 100%
(1/1) checking available disk space                              [####################################] 100%
:: Processing package changes...
(1/1) installing i915-sriov-dkms-git                             [####################################] 100%
The i915 kernel module will be available on reboot.
:: Running post-transaction hooks...
(1/2) Arming ConditionNeedsUpdate...
(2/2) Install DKMS modules
==> dkms install --no-depmod i915-sriov-dkms/5.15.71 -k 6.2.2-arch2-1
Module version  for i915.ko.zst
exactly matches what is already found in kernel 6.2.2-arch2-1.
DKMS will not replace this module.
You may override by specifying --force.
Error! Installation aborted.
==> WARNING: `dkms install --no-depmod i915-sriov-dkms/5.15.71 -k 6.2.2-arch2-1' exited 6

dcarrion87 commented 1 year ago

@FallingSnow just curious what you ended up doing to make this work with 6.2.X kernel versions on Arch? I'm on 6.2.8 at the moment and struggling to get it going.

FallingSnow commented 1 year ago

To get it to install, I think all you have to do (given my example above) is run dkms install --no-depmod i915-sriov-dkms/5.15.71 -k 6.2.2-arch2-1 --force. It's the same command as in the warning, just with the --force flag added.
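
A minimal sketch of that command for the running kernel, assuming the module/version string from the log above. The helper name `dkms_force_cmd` is made up for illustration, and it only prints the command so it can be checked before being run as root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the forced dkms install command for a
# given module/version and kernel release, without running it.
dkms_force_cmd() {
    local module_version="$1"   # e.g. i915-sriov-dkms/5.15.71
    local kernel_release="$2"   # e.g. the output of `uname -r`
    printf 'dkms install --no-depmod %s -k %s --force\n' \
        "$module_version" "$kernel_release"
}

# Print the command for the currently running kernel.
dkms_force_cmd "i915-sriov-dkms/5.15.71" "$(uname -r)"
```

Since --no-depmod skips the depmod step, a `depmod -a` afterwards is probably needed when running this by hand rather than through the pacman hook.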

I think there was another issue with where it got installed. DKMS was putting the built module somewhere Arch wasn't looking, so I just copied the DKMS-built i915.ko to the directory where the original was. Search for the i915.ko file before and after running dkms and you should see the different locations. Then reboot and it should work. This makes for a fairly brittle install, as I assume any update will break it.
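
Something like the following may help with the search. It is only a sketch: `list_i915_modules` is a made-up name, and the two search roots (/usr/lib/modules and /var/lib/dkms) are assumptions about a typical Arch layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every i915 module file (compressed or not)
# under the given directories, so the DKMS build and the in-tree copy
# can be compared. Directories that do not exist are skipped.
list_i915_modules() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f -name 'i915.ko*' 2>/dev/null
    done | sort
}

list_i915_modules /usr/lib/modules /var/lib/dkms

# modinfo -n prints the file that would be loaded for the running kernel.
modinfo -n i915 2>/dev/null || true
```

Running it once before and once after the dkms install and comparing the two listings should show where the new build ended up.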

However, I recommend just installing https://aur.archlinux.org/packages/linux-intel-lts-sriov instead. I believe that is a far more reliable solution.

dcarrion87 commented 1 year ago

Aah that's the one. Had to:

cp /var/lib/dkms/i915-sriov-dkms/5.15.71/6.2.8-arch1-1/x86_64/module/i915.ko.zst /usr/lib/modules/6.2.8-arch1-1/kernel/drivers/gpu/drm/i915/i915.ko.zst

Different error now. Is this expected?

echo "1" > /sys/devices/pci0000\:00/0000\:00\:02.0/sriov_numvfs
-bash: echo: write error: Numerical result out of range

FallingSnow commented 1 year ago

Different error now. Is this expected?

No. That is strange. Just tried your command and it works as expected for me. Can you cat it?

dcarrion87 commented 1 year ago

Results:

# cat /sys/devices/pci0000\:00/0000\:00\:02.0/sriov_numvfs
0

PCI info:

# lspci -s 00:02.0 -vvv
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c) (prog-if 00 [VGA controller])
    DeviceName: Onboard - Video
    Subsystem: Gigabyte Technology Co., Ltd Device d000
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 177
    IOMMU group: 0
    Region 0: Memory at 41000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at 50000000 (64-bit, prefetchable) [size=256M]
    Region 4: I/O ports at 5000 [size=64]
    Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
    Capabilities: [40] Vendor Specific Information: Len=0c <?>
    Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE+ FLReset+
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
    Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
        Address: fee00018  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [d0] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Process Address Space ID (PASID)
        PASIDCap: Exec- Priv-, Max PASID Width: 14
        PASIDCtl: Enable- Exec- Priv-
    Capabilities: [200 v1] Address Translation Service (ATS)
        ATSCap: Invalidate Queue Depth: 00
        ATSCtl: Enable+, Smallest Translation Unit: 00
    Capabilities: [300 v1] Page Request Interface (PRI)
        PRICtl: Enable- Reset-
        PRISta: RF- UPRGI- Stopped+
        Page Request Capacity: 00008000, Page Request Allocation: 00000000
    Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
        IOVSta: Migration-
        Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 1, stride: 1, Device ID: 4680
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 0000000044000000 (64-bit, non-prefetchable)
        Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Kernel driver in use: i915
    Kernel modules: i915

Guessing this is the reason for the out-of-range error:

cat /sys/devices/pci0000\:00/0000\:00\:02.0/sriov_totalvfs
0
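
A small guard around the write, as a sketch only. `set_numvfs` is a hypothetical name; it takes the device's sysfs directory as an argument and relies only on the sriov_totalvfs and sriov_numvfs files shown above. It refuses up front when the requested count exceeds what the driver currently reports, which appears to be the condition behind the "Numerical result out of range" error:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a VF count to sriov_numvfs only if it does
# not exceed sriov_totalvfs. Takes the device's sysfs directory.
set_numvfs() {
    local dev_dir="$1" want="$2" total
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but sriov_totalvfs is $total" >&2
        return 2
    fi
    echo "$want" > "$dev_dir/sriov_numvfs"
}

# Example (as root), using the device path from this thread:
# set_numvfs /sys/devices/pci0000:00/0000:00:02.0 1
```
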

dcarrion87 commented 1 year ago

And to compare boot:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=6ef6c074-f253-4893-8df7-2e21dba1e9f6 rw loglevel=3 quiet intel_iommu=on iommu=pt i915.enable_guc=7 vfio-pci.ids=1b4b:9215,10de:2507,10de:228e
[    0.058921] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=6ef6c074-f253-4893-8df7-2e21dba1e9f6 rw loglevel=3 quiet intel_iommu=on iommu=pt i915.enable_guc=7 vfio-pci.ids=1b4b:9215,10de:2507,10de:228e
[    8.205684] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    8.205687] i915 0000:00:02.0: vgaarb: deactivate vga console
[    8.205735] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    8.206348] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    8.206887] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[    8.215919] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[    8.333721] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    8.333725] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    8.335307] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    8.335308] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    8.335712] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[    8.335713] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[    8.336146] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    8.336146] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    8.337925] i915 0000:00:02.0: [drm] HuC authenticated
[    8.338378] i915 0000:00:02.0: [drm] GuC submission enabled
[    8.338379] i915 0000:00:02.0: [drm] GuC SLPC enabled
[    8.338730] i915 0000:00:02.0: [drm] GuC RC: enabled
[    8.339499] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
[    8.339589] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[    8.359923] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[    8.438070] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    9.136565] fbcon: i915drmfb (fb0) is primary device
[    9.204839] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device

FallingSnow commented 1 year ago

Hmm, I have a 13600K and am on the intel-lts kernel, but from what I've read 12th gen is supposed to work.

$ sudo dmesg | grep i915
[    0.000000] Command line: root=UUID=e454dd40-3ef8-4a9d-bb18-f3accf0a3596 rootflags=atgc rw video=HDMI-A-1:1920x1080@60me console=tty0 console=ttyS4,115200 default_hugepagesz=1G hugepagesz=1G hugepages=24 intel_iommu=on iommu=pt i915.enable_guc=7
[    0.045893] Kernel command line: root=UUID=e454dd40-3ef8-4a9d-bb18-f3accf0a3596 rootflags=atgc rw video=HDMI-A-1:1920x1080@60me console=tty0 console=ttyS4,115200 default_hugepagesz=1G hugepagesz=1G hugepages=24 intel_iommu=on iommu=pt i915.enable_guc=7
[    0.985546] i915 0000:00:02.0: Running in SR-IOV PF mode
[    0.986190] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    0.986237] i915 0000:00:02.0: vgaarb: deactivate vga console
[    0.986264] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    0.986840] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    0.987867] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
[    1.077438] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.077441] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.078539] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.078540] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.078942] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.bin version 70.5.1
[    1.078943] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
[    1.079375] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.079375] i915 0000:00:02.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.081786] i915 0000:00:02.0: [drm] HuC authenticated
[    1.082236] i915 0000:00:02.0: [drm] GuC submission enabled
[    1.082236] i915 0000:00:02.0: [drm] GuC SLPC enabled
[    1.082673] i915 0000:00:02.0: [drm] GuC RC: enabled
[    1.083477] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[    1.112651] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[    1.140763] i915 0000:00:02.0: 7 VFs could be associated with this PF
[    1.144693] i915 0000:00:02.0: [drm] User-defined mode not supported: "1920x1080": 60 185936 1920 2048 2264 2678 1080 1081 1084 1157 0x20 0x6
[    1.144760] fbcon: i915drmfb (fb0) is primary device
[    1.144762] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[    1.167549] i915 0000:00:02.0: [drm] User-defined mode not supported: "1920x1080": 60 185936 1920 2048 2264 2678 1080 1081 1084 1157 0x20 0x6
[    1.180147] i915 0000:00:02.0: [drm] User-defined mode not supported: "1920x1080": 60 185936 1920 2048 2264 2678 1080 1081 1084 1157 0x20 0x6
[    7.203557] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[23151.381883] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[23151.381942] i915 0000:00:02.1: enabling device (0000 -> 0002)
[23151.381989] i915 0000:00:02.1: Running in SR-IOV VF mode
[23151.382810] i915 0000:00:02.1: GuC interface version 0.1.0.0
[23151.383294] i915 0000:00:02.1: [drm] VT-d active for gfx access
[23151.383319] i915 0000:00:02.1: [drm] Using Transparent Hugepages
[23151.383788] i915 0000:00:02.1: GuC interface version 0.1.0.0
[23151.384071] i915 0000:00:02.1: GuC firmware PRELOADED version 1.0 submission:SR-IOV VF
[23151.384073] i915 0000:00:02.1: HuC firmware PRELOADED
[23151.386864] i915 0000:00:02.1: [drm] Protected Xe Path (PXP) protected content support initialized
[23151.386869] i915 0000:00:02.1: [drm] PMU not supported for this GPU.
[23151.387027] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.1 on minor 1
[23151.387144] i915 0000:00:02.0: Enabled 1 VFs
$ cat /sys/devices/pci0000\:00/0000\:00\:02.0/sriov_totalvfs
7
$ sudo lspci -s 00:02.0 -vvv
0000:00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04) (prog-if 00 [VGA controller])
        DeviceName: Onboard - Video
        Subsystem: ASUSTeK Computer Inc. Device 8694
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 127
        IOMMU group: 0
        Region 0: Memory at 60eb000000 (64-bit, non-prefetchable) [size=16M]
        Region 2: Memory at 4000000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 4000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag- RBE+ FLReset+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
                Address: fee00018  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv-, Max PASID Width: 14
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [200 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [300 v1] Page Request Interface (PRI)
                PRICtl: Enable- Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00008000, Page Request Allocation: 00000000
        Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy- 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 7, Total VFs: 7, Number of VFs: 1, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: a780
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000060e4000000 (64-bit, non-prefetchable)
                Region 2: Memory at 0000006000000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: i915
        Kernel modules: i915

I think you should try the AUR kernel I linked earlier. That way you can find out whether it's supported on your system at all.
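
Comparing the two dmesg outputs in this thread, the working one contains a "Running in SR-IOV PF mode" line and a "7 VFs could be associated with this PF" line, and the other contains neither. A rough way to check for that, as a sketch (the function name `pf_mode_seen` is made up, and the string is copied from the working log above):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel log on stdin contains the
# PF-mode message that appears in the working log in this thread.
pf_mode_seen() {
    grep -q 'Running in SR-IOV PF mode'
}

# Reading dmesg usually needs root; an unreadable log counts as "not found".
if dmesg 2>/dev/null | pf_mode_seen; then
    echo "i915 reports SR-IOV PF mode"
else
    echo "no PF mode message found in dmesg"
fi
```

If the message is missing, the driver that is loaded did not enter PF mode, so the VF count cannot be raised regardless of what lspci shows.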

FallingSnow commented 1 year ago

Just saw this in your lspci. So I guess it should work.

Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
        IOVSta: Migration-
        Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00

FallingSnow commented 1 year ago

Also, maybe downgrade your kernel to something like 6.2.1. Maybe a recent change broke this?

dcarrion87 commented 1 year ago

Nope, no go with that version or with 6.1.12. I think I'll give it a miss until it goes mainstream. This was purely to play around and I don't really have a strong use case. Thanks for sending all that through.

thegawin commented 1 year ago

In my case on LTS I needed to use i915.enable_guc=7.
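
A quick way to confirm the parameter actually reached the kernel, as a sketch. `guc_param_value` is a hypothetical helper name, and it only looks at the kernel command line; a value set through a modprobe.d option would not show up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of i915.enable_guc from a kernel
# command line string, or return non-zero if the parameter is absent.
guc_param_value() {
    local value
    value=$(printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^i915\.enable_guc=//p' | head -n 1)
    [ -n "$value" ] && echo "$value"
}

guc_param_value "$(cat /proc/cmdline 2>/dev/null)" \
    || echo "i915.enable_guc is not set on the kernel command line"
```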