intel-gpu / intel-gpu-i915-backports

backport/main GPU support broken after latest commit on Kernel 5.14.0-427.40.1.el9_4.x86_64 #199

Closed upsj closed 4 weeks ago

upsj commented 4 weeks ago

After a kernel update, we rebuilt our GPU drivers. They still attach successfully to the PCIe devices, but the GPUs are no longer recognized by xpu-smi.

As a test, I built two versions of the drivers with Kernel 5.14.0-427.40.1.el9_4.x86_64 on Rocky 9.4:

[   14.911185] i915 0000:9a:00.0: SPI access overridden by jumper
[   14.938235] i915 0000:ca:00.0: Using 112 cores (56-111,168-223) for kthreads
[   14.939894] i915 0000:ca:00.0: IAF available
[   14.940834] i915 0000:ca:00.0: Attaching to 262144MiB of system memory on node 1
[   14.940866] i915 0000:ca:00.0: Using Transparent Hugepages
[   14.940907] i915 0000:ca:00.0: GT0: Local memory { size: 0x0000000c00000000, available: 0x0000000bff000000 }
[   14.941043] i915 0000:ca:00.0: GT0: GuC firmware i915/pvc_guc_70.26.4.bin: fetch failed -ENOENT
[   14.941045] i915 0000:ca:00.0: GT0: GuC firmware(s) can be downloaded from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[   14.942672] i915 0000:ca:00.0: GT0: Can't run without GuC if GuC has previously been enabled
[   14.942794] i915 0000:ca:00.0: GT0: GuC initialization failed -ENOENT
[   14.942795] i915 0000:ca:00.0: GT0: Enabling uc failed (-5)
[   14.942796] i915 0000:ca:00.0: GT0: Failed to initialize GPU, declaring it wedged!

smuqthya commented 4 weeks ago

Looks like the firmware needs an update. Did you check whether this firmware is present on the device?
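
One way to run that check is sketched below. It is a minimal sketch, not part of the driver or its tooling: it pulls the firmware name out of the "fetch failed" lines in the log above and looks for that file in the kernel's usual firmware directories. The function names, the directory list (`/lib/firmware/updates`, `/lib/firmware`), and the compressed suffixes (`.xz`, `.zst`) are assumptions for illustration; the locations that actually apply depend on the distribution and kernel configuration.

```python
import re
from pathlib import Path

# Matches lines such as:
#   "... GT0: GuC firmware i915/pvc_guc_70.26.4.bin: fetch failed -ENOENT"
FETCH_FAILED = re.compile(r"firmware (\S+): fetch failed")


def missing_firmware(log_text):
    """Return firmware paths named in 'fetch failed' lines, without duplicates."""
    names = []
    for match in FETCH_FAILED.finditer(log_text):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


def find_on_disk(rel_path,
                 roots=("/lib/firmware/updates", "/lib/firmware"),
                 suffixes=("", ".xz", ".zst")):
    """Return the first existing candidate file, or None if none is found.

    The default roots and suffixes are assumptions; adjust them to the system.
    """
    for root in roots:
        for suffix in suffixes:
            candidate = Path(root) / (rel_path + suffix)
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # Sample line copied from the log in this thread; on a real system the
    # text could come from the kernel log instead.
    sample = (
        "[   14.941043] i915 0000:ca:00.0: GT0: GuC firmware "
        "i915/pvc_guc_70.26.4.bin: fetch failed -ENOENT\n"
    )
    for name in missing_firmware(sample):
        found = find_on_disk(name)
        status = f"found at {found}" if found else "not found"
        print(f"{name}: {status}")
```

If the file is reported as not found, the -ENOENT in the log is consistent with the firmware simply not being installed for the rebuilt driver.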

upsj commented 4 weeks ago

There is no such firmware file in the listed git repository. Where can I find it, and what are the necessary steps to upgrade the firmware?

smuqthya commented 4 weeks ago

Please check the Dependencies section of the README: https://github.com/intel-gpu/intel-gpu-i915-backports?tab=readme-ov-file#dependencies