Open lamw opened 2 months ago
Could you please try loading the NPU kernel driver with the force_snoop=1 module parameter set? (That is: rmmod intel_vpu; modprobe intel_vpu force_snoop=1)
Using the 1.6.0 instructions, it looks like force_snoop=1 isn't working?
root@ubuntu:~# rmmod intel_vpu; modprobe intel_vpu force_snoop=1
root@ubuntu:~# dmesg|grep vpu
[ 1.911597] intel_vpu 0000:02:05.0: enabling device (0000 -> 0002)
[ 1.921169] intel_vpu 0000:02:05.0: [drm] Firmware: intel/vpu/vpu_37xx_v0.0.bin, version: 20240726*MTL_CLIENT_SILICON-release*0004*ci_tag_ud202428_vpu_rc_20240726_0004*e4a99ed6b3e
[ 2.980059] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_boot(): Failed to boot the firmware: -110
[ 2.980328] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803000, fetch addr: 0x0
[ 2.980786] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803010, fetch addr: 0x0
[ 2.987821] intel_vpu 0000:02:05.0: [drm] ivpu_hw_37xx_power_down(): VPU not idle during power down
[ 2.987995] intel_vpu: probe of 0000:02:05.0 failed with error -110
[ 160.695061] intel_vpu: unknown parameter 'force_snoop' ignored
[ 160.697978] intel_vpu 0000:02:05.0: [drm] Firmware: intel/vpu/vpu_37xx_v0.0.bin, version: 20240726*MTL_CLIENT_SILICON-release*0004*ci_tag_ud202428_vpu_rc_20240726_0004*e4a99ed6b3e
[ 161.722348] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_boot(): Failed to boot the firmware: -110
[ 161.722367] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803000, fetch addr: 0x0
[ 161.722387] intel_vpu 0000:02:05.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803010, fetch addr: 0x0
[ 161.728714] intel_vpu 0000:02:05.0: [drm] ivpu_hw_37xx_power_down(): VPU not idle during power down
[ 161.728995] intel_vpu: probe of 0000:02:05.0 failed with error -110
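Note the line at 160.695061 in the log above: the kernel reports force_snoop as an unknown parameter and ignores it, so the second probe ran without it. A minimal sketch for spotting that case in a kernel log follows; param_ignored is a hypothetical helper name, and the matched text is copied from the log line above rather than from any documented format.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and exits 0 if the
# given module reported the given parameter as unknown and ignored it.
# The message format is taken from the dmesg output quoted in this thread.
param_ignored() {
    module=$1
    param=$2
    grep -qF "${module}: unknown parameter '${param}' ignored"
}

# Example use on a live system (dmesg may require root):
#   dmesg | param_ignored intel_vpu force_snoop && echo "force_snoop was ignored"
# To list the parameters the installed module build accepts:
#   modinfo -p intel_vpu
```

If the check succeeds, the installed module does not know the parameter, and retrying modprobe with it will not change the outcome.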
It is possible that the issue you are observing might be related to the hypervisor cache configuration.
There is a patch that enables the force_snoop module parameter for the intel_vpu driver.
You could try applying this patch, or updating the kernel to 6.11, which already contains it, and then retry with this parameter set.
hm ... so I just installed 6.11 kernel
# uname -r
6.11.0-061100rc6-generic
When I run rmmod intel_vpu; modprobe intel_vpu force_snoop=1, I see the following in dmesg:
[ 4.187932] intel_vpu 0000:13:00.0: [drm] *ERROR* ivpu_boot(): Failed to boot the firmware: -110
[ 4.188097] intel_vpu 0000:13:00.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803000, fetch addr: 0x0
[ 4.188361] intel_vpu 0000:13:00.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x84803010, fetch addr: 0x0
[ 4.198809] intel_vpu 0000:13:00.0: [drm] ivpu_hw_power_down(): NPU not idle during power down
[ 4.199002] intel_vpu 0000:13:00.0: probe with driver intel_vpu failed with error -110
[ 386.779470] intel_vpu 0000:13:00.0: [drm] Firmware: intel/vpu/vpu_37xx_v0.0.bin, version: 20240726*MTL_CLIENT_SILICON-release*0004*ci_tag_ud202428_vpu_rc_20240726_0004*e4a99ed6b3e
[ 386.904593] [drm] Initialized intel_vpu 1.0.0 for 0000:13:00.0 on minor 0
Interestingly, even though there are some issues, I see the VPU is now initialized. Does this mean it's good?
I'm able to see the accel0 device :D
# ls /dev/accel/accel0
/dev/accel/accel0
If I reboot the system w/o using force_snoop=1, then it fails as before.
The force_snoop=1 parameter is only activated when explicitly specified on the modprobe command line. To have this parameter enabled by default when the module is loaded, you can create a configuration file in the /etc/modprobe.d/ directory. Please follow these steps:
1. Create the file /etc/modprobe.d/intel_vpu.conf
2. Add the following line to it:
options intel_vpu force_snoop=1
After you reboot your system, the force_snoop=1 parameter should be automatically applied when the intel_vpu module is loaded.
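The same configuration file can be written from a shell. This is a minimal sketch: persist_module_option is a hypothetical helper name, and the optional fourth argument (target directory) exists only so the function can be tried outside /etc/modprobe.d without root.

```shell
#!/bin/sh
# Hypothetical helper: writes "options <module> <param>=<value>" into
# <conf_dir>/<module>.conf, overwriting any existing file of that name.
# conf_dir defaults to /etc/modprobe.d, which requires root to write.
persist_module_option() {
    module=$1
    param=$2
    value=$3
    conf_dir=${4:-/etc/modprobe.d}
    printf 'options %s %s=%s\n' "$module" "$param" "$value" \
        > "${conf_dir}/${module}.conf"
}

# Example use, from a root shell:
#   persist_module_option intel_vpu force_snoop 1
#   cat /etc/modprobe.d/intel_vpu.conf
```

Because the function overwrites the file, any other options already stored in /etc/modprobe.d/intel_vpu.conf would need to be merged by hand.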
As for the log message [ 386.904593] [drm] Initialized intel_vpu 1.0.0 for 0000:13:00.0 on minor 0 and the presence of /dev/accel/accel0: these are indeed indications that the driver has been successfully initialized and the device is correctly set up.
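Both indications can be checked together. The sketch below assumes the "Initialized" message format shown in the log above; npu_ready is a hypothetical helper name, and the device node path is a parameter defaulting to the /dev/accel/accel0 path seen in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and exits 0 only if
# it contains the DRM "Initialized <module>" line AND the device node exists.
npu_ready() {
    module=$1
    node=${2:-/dev/accel/accel0}
    grep -qF "[drm] Initialized ${module} " && [ -e "$node" ]
}

# Example use on a live system (dmesg may require root):
#   dmesg | npu_ready intel_vpu && echo "NPU driver initialized"
```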
Thanks for the commands to persist the parameter. I guess I'm trying to understand why this is needed. Is there something missing in how the device is presented to the guest that prevents initialization by default?
I'm also seeing an issue with device passthrough to a Windows system, which throws the classic Error Code 43. Is there a similar parameter for the Windows driver?
Are there additional debug/verbose logs from the NPU Linux driver? I've been able to successfully do PCIe passthrough of the NPU from an Intel 14th Gen system, but it fails to load the firmware (-110) with no further details. I'm trying to understand what the cause could be: whether this is down to the ESXi hypervisor and passthrough, or something else.
Here's a snippet from dmesg (this is after installing the required drivers on Ubuntu 24.04):