Open huiwangnick opened 2 months ago
Hi @huiwangnick
According to the log in https://github.com/intel-gpu/intel-gpu-i915-backports/issues/194, you are using an out-of-tree driver (version 1.24.4.12.240603.18.6.8.0.40+i1-1, package name intel-dmabuf-drm-i915-dkms) rather than the upstream driver.
That driver also differs from the one we recommend (i.e., intel-i915-dkms, see https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#for-linux-kernel-65).
Can you share your driver installation steps and commands?
Hi @qiyuangong
I apologize for not being clearer earlier. In the issue discussed in intel-gpu/intel-gpu-i915-backports#194, I am using the out-of-tree driver; however, in this case, I am utilizing the upstream driver from the Linux kernel. I have also attempted your recommendation of using intel-i915-dkms, but all the drivers exhibit the same GUC load failure issue that I mentioned earlier.
That's fine. :)
First of all, please check the GPU installation (PCIe and power), the BIOS configuration, and the firmware of these GPUs. It seems they cannot be initialized correctly. We have previously encountered GuC-related errors when a GPU was not installed correctly.
[ 18.422225] i915 0000:0d:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[ 18.430827] i915 0000:0d:00.0: [drm] GT0: GUC: ADS capture alloc size changed from 32768 to 36864
[ 18.431927] i915 0000:0d:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.20.0
[ 18.431930] i915 0000:0d:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.15
[ 18.432046] i915 0000:0d:00.0: [drm] GT0: GUC: ADS capture alloc size changed from 32768 to 36864
[ 18.432614] i915 0000:0d:00.0: [drm] GT0: GUC: load failed: status = 0x40000056, time = 0ms, freq = 2400MHz, ret = 0
[ 18.432617] i915 0000:0d:00.0: [drm] GT0: GUC: load failed: status: Reset = 0, BootROM = 0x2B, UKernel = 0x00, MIA = 0x00, Auth = 0x01
[ 18.432619] i915 0000:0d:00.0: [drm] GT0: GUC: firmware production part check failure
[ 18.432684] i915 0000:0d:00.0: [drm] *ERROR* GT0: GuC initialization failed -ENOEXEC
[ 18.432688] i915 0000:0d:00.0: [drm] *ERROR* GT0: Enabling uc failed (-5)
[ 18.432690] i915 0000:0d:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
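A side note on reading this log: the second "load failed" line looks like a field-by-field breakdown of the raw `status = 0x40000056` value on the first. Here is a minimal Python sketch of that decoding. The bit layout is an assumption (it is not stated anywhere in this thread), and `decode_guc_status` is a hypothetical helper name. With this layout the result agrees with the values printed in the log (BootROM = 0x2B, Auth = 0x01, everything else 0).

```python
# Decode the raw GuC status value that i915 prints as "status = 0x...".
# ASSUMED bit layout (not taken from this thread):
#   bit 0       -> MIA in reset
#   bits 1-7    -> BootROM status
#   bits 8-15   -> uKernel status
#   bits 16-18  -> MIA core status
#   bits 30-31  -> authentication status

def decode_guc_status(status: int) -> dict:
    """Split a 32-bit GuC status value into its assumed fields."""
    return {
        "reset": status & 0x1,
        "bootrom": (status >> 1) & 0x7F,
        "ukernel": (status >> 8) & 0xFF,
        "mia": (status >> 16) & 0x07,
        "auth": (status >> 30) & 0x03,
    }


# Value copied from the failing log above.
fields = decode_guc_status(0x40000056)
for name, value in fields.items():
    print(f"{name} = {value:#04x}")
```

A non-zero BootROM field together with a zero uKernel field would suggest that the failure happens before the GuC microkernel starts running, which is consistent with the "firmware production part check failure" message.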
Second, the 6.8.0 kernel (with its upstream driver version) is not recommended by ipex-llm. It requires newer Level Zero and oneAPI versions, so using 6.8 leads to a Level Zero mismatch with our recommended packages. Please switch to the recommended kernel version, i.e., 6.5.0.
Thank you. I will try 6.5.0 kernel to see if I can get it to work.
In the meantime, I’d like to add that when the BAR is not resized to 16,384 MB and remains at 256 MB, the driver loads successfully. This suggests that the issue is not related to GPU installation or GPU firmware. It seems possible that the Resizable BAR feature could be causing this problem.
[ 18.110784] i915 0000:0d:00.0: [drm] Failed to resize BAR2 to 16384M (-ENOSPC)
[ 18.110792] i915 0000:0d:00.0: BAR 2 [mem 0x13ffe0000000-0x13ffefffffff 64bit pref]: assigned
[ 18.110938] i915 0000:0d:00.0: [drm] Local memory IO size: 0x0000000010000000
[ 18.110943] i915 0000:0d:00.0: [drm] Local memory available: 0x00000003fa000000
[ 18.110946] i915 0000:0d:00.0: [drm] Using a reduced BAR size of 256MiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.
[ 18.132145] i915 0000:0d:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[ 18.141268] i915 0000:0d:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.20.0
[ 18.141273] i915 0000:0d:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.3
[ 18.149102] i915 0000:0d:00.0: [drm] GT0: GUC: submission enabled
[ 18.149104] i915 0000:0d:00.0: [drm] GT0: GUC: SLPC enabled
[ 18.149347] i915 0000:0d:00.0: [drm] GT0: GUC: RC enabled
[ 18.177938] [drm] Initialized i915 1.6.0 20230929 for 0000:0d:00.0 on minor 1
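For reference, the sizes in this log can be checked with a few lines of arithmetic. The Python sketch below uses only the addresses printed above; `range_size` is a hypothetical helper name. It shows that the assigned BAR 2 window is 256 MiB, while the reported local memory is about 16 GiB, so only a small part of local memory is directly addressable through the BAR in this configuration.

```python
MIB = 1024 * 1024


def range_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive address range [start, end]."""
    return end - start + 1


# "BAR 2 [mem 0x13ffe0000000-0x13ffefffffff 64bit pref]: assigned"
bar2_bytes = range_size(0x13FFE0000000, 0x13FFEFFFFFFF)

# "Local memory available: 0x00000003fa000000"
lmem_bytes = 0x00000003FA000000

print(f"BAR 2 window : {bar2_bytes // MIB} MiB")
print(f"Local memory : {lmem_bytes // MIB} MiB")
print(f"Visible share: {bar2_bytes / lmem_bytes:.1%}")
```

The 256 MiB result matches the "Local memory IO size: 0x0000000010000000" line, which is the same value in hexadecimal.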
OK. If this is a Resizable BAR (Base Address Register) related issue, please check the BIOS configuration. Enabling this feature is recommended for Arc GPUs.
https://www.intel.com/content/www/us/en/support/articles/000090831/graphics.html
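To see which BAR sizes the kernel actually assigned after changing the BIOS setting, one option is to read the device's `resource` file in sysfs. This is a rough Python sketch, not an official tool. It assumes the usual sysfs format of one `start end flags` line per resource in hexadecimal, and it takes the PCI address `0000:0d:00.0` from the logs in this thread. `bar_sizes` and `read_bar_sizes` are hypothetical helper names.

```python
from pathlib import Path


def bar_sizes(resource_text: str) -> list:
    """Return the size in bytes of each resource line ("start end flags")."""
    sizes = []
    for line in resource_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        start, end = int(parts[0], 16), int(parts[1], 16)
        # Unused resources are reported as all zeros.
        sizes.append(end - start + 1 if end > start else 0)
    return sizes


def read_bar_sizes(pci_address: str = "0000:0d:00.0") -> list:
    """Read BAR sizes for one PCI device from sysfs (Linux only)."""
    path = Path("/sys/bus/pci/devices") / pci_address / "resource"
    return bar_sizes(path.read_text())
```

On the affected machine, `read_bar_sizes()[2] // (1024 * 1024)` would give the BAR 2 size in MiB: 256 in the reduced configuration, or 16384 if the resize succeeded.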
OS: Ubuntu 24.04
Kernel: 6.8.0-31-generic
Error message
intel-gpu/intel-gpu-i915-backports#194