intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

clinfo crashes in intel-compute-runtime #657

Open mysteryx93 opened 1 year ago

mysteryx93 commented 1 year ago

Running clinfo on Linux aborts with this error:

Abort was called at 36 line in file:
/usr/src/debug/intel-compute-runtime/compute-runtime-23.09.25812.14/shared/source/built_ins/built_ins.cpp
fish: Job 1, 'clinfo' terminated by signal SIGABRT (Abort)

OS: Garuda Linux (Arch-based). Laptop with dual graphics (Intel and NVIDIA).

JablonskiMateusz commented 1 year ago

please share strace log and igc library versions

mysteryx93 commented 1 year ago

strace log clinfo.txt

libsigc++ v2.12.0-1.1

JablonskiMateusz commented 1 year ago

> please share strace log and igc library versions

IGC means Intel Graphics Compiler: https://github.com/intel/intel-graphics-compiler. Each of our releases on GitHub references a specific IGC release, and that referenced version is the one that should be installed.

mysteryx93 commented 1 year ago

intel-graphics-compiler 1:1.0.13822.6-1.1

from ALHP x86-64-v3 repo

JablonskiMateusz commented 1 year ago

Please run:
ldd /usr/lib/libigdfcl.so.1
ldd /usr/lib/libigc.so.1
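
For anyone following along, ldd output can be checked mechanically rather than by eye. Below is a minimal Python sketch, not part of compute-runtime: `parse_ldd` and `missing_libraries` are hypothetical helpers that map each library name in ldd output to its resolved path, or to None when ldd prints "not found".

```python
def parse_ldd(ldd_output):
    """Map each 'name => target' entry of ldd output to its resolved path.

    Entries that ldd reports as 'not found' map to None. Lines without
    '=>' (linux-vdso, the dynamic loader itself) are skipped.
    """
    resolved = {}
    for raw in ldd_output.splitlines():
        line = raw.strip()
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        target = target.strip()
        if target == "not found":
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. "(0x00007fd987800000)".
            resolved[name.strip()] = target.split(" (")[0]
    return resolved


def missing_libraries(ldd_output):
    """Return the names of libraries the loader could not resolve."""
    return [name for name, path in parse_ldd(ldd_output).items() if path is None]
```

An empty list from `missing_libraries` only says that every dependency resolves to some file. It does not say that the resolved versions are the ones a given release was built against.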

mysteryx93 commented 1 year ago

ldd /usr/lib/libigdfcl.so.1

linux-vdso.so.1 (0x00007ffdd13ef000)
liblldELF.so.15 => /usr/lib/liblldELF.so.15 (0x00007fd98f400000)
libopencl-clang.so.15 => /usr/lib/libopencl-clang.so.15 (0x00007fd98f303000)
liblldCommon.so.15 => /usr/lib/liblldCommon.so.15 (0x00007fd98f6ce000)
libLLVM-15.so => /usr/lib/libLLVM-15.so (0x00007fd987800000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fd987400000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fd98f2de000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fd987000000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007fd98f2be000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007fd98f7ee000)
libLLVMSPIRVLib.so.15 => /usr/lib/../lib/libLLVMSPIRVLib.so.15 (0x00007fd986c00000)
libclang-cpp.so.15 => /usr/lib/../lib/libclang-cpp.so.15 (0x00007fd983a00000)
libffi.so.8 => /usr/lib/libffi.so.8 (0x00007fd98f2b1000)
libedit.so.0 => /usr/lib/libedit.so.0 (0x00007fd98f269000)
libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007fd9876f6000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fd98767d000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007fd987275000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fd98390e000)
liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007fd987237000)
libicuuc.so.73 => /usr/lib/libicuuc.so.73 (0x00007fd983600000)
libicudata.so.73 => /usr/lib/libicudata.so.73 (0x00007fd981600000)

ldd /usr/lib/libigc.so.1

linux-vdso.so.1 (0x00007ffec81be000)
libSPIRV-Tools.so => /usr/lib/libSPIRV-Tools.so (0x00007fa4d67d8000)
liblldELF.so.15 => /usr/lib/liblldELF.so.15 (0x00007fa4d3e00000)
libLLVMSPIRVLib.so.15 => /usr/lib/libLLVMSPIRVLib.so.15 (0x00007fa4d3a00000)
liblldCommon.so.15 => /usr/lib/liblldCommon.so.15 (0x00007fa4d41c9000)
libLLVM-15.so => /usr/lib/libLLVM-15.so (0x00007fa4cbe00000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fa4cba00000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fa4d40d7000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fa4d3ddb000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fa4cb600000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007fa4d6984000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007fa4d67b6000)
libffi.so.8 => /usr/lib/libffi.so.8 (0x00007fa4d3dce000)
libedit.so.0 => /usr/lib/libedit.so.0 (0x00007fa4d39ba000)
libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007fa4d38b0000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fa4cbd87000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007fa4cb875000)
liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007fa4d3d90000)
libicuuc.so.73 => /usr/lib/libicuuc.so.73 (0x00007fa4cb200000)
libicudata.so.73 => /usr/lib/libicudata.so.73 (0x00007fa4c9200000)

JablonskiMateusz commented 1 year ago

Please align your IGC packages with your NEO packages. Your IGC packages come from the version referenced by the WW17 release, while your NEO packages come from the WW09 release. Either update NEO to the WW17 release or downgrade IGC to the version referenced by the WW09 release (1.0.13463.18).

mysteryx93 commented 12 months ago

What are Neo packages? What is WW17?

sudo downgrade intel-graphics-compiler 1.0.13463.18

clinfo
Abort was called at 37 line in file:
/usr/src/debug/intel-compute-runtime/compute-runtime-23.17.26241.22/shared/source/built_ins/built_ins.cpp
fish: Job 1, 'clinfo' terminated by signal SIGABRT (Abort)
ldd /usr/lib/libigdfcl.so.1
linux-vdso.so.1 (0x00007fffe67c9000)
liblldELF.so.15 => /usr/lib/liblldELF.so.15 (0x00007fe3df000000)
libopencl-clang.so.15 => /usr/lib/libopencl-clang.so.15 (0x00007fe3df304000)
liblldCommon.so.15 => /usr/lib/liblldCommon.so.15 (0x00007fe3df2cd000)
libLLVM-15.so => /usr/lib/libLLVM-15.so (0x00007fe3d7400000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fe3d7000000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fe3defdb000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fe3d6c00000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007fe3defbb000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007fe3df4e8000)
libLLVMSPIRVLib.so.15 => /usr/lib/../lib/libLLVMSPIRVLib.so.15 (0x00007fe3d6800000)
libclang-cpp.so.15 => /usr/lib/../lib/libclang-cpp.so.15 (0x00007fe3d3600000)
libffi.so.8 => /usr/lib/libffi.so.8 (0x00007fe3defae000)
libedit.so.0 => /usr/lib/libedit.so.0 (0x00007fe3def66000)
libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007fe3dee5c000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fe3d7387000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007fe3d6e72000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fe3d7295000)
liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007fe3d6e34000)
libicuuc.so.73 => /usr/lib/libicuuc.so.73 (0x00007fe3d3200000)
libicudata.so.73 => /usr/lib/libicudata.so.73 (0x00007fe3d1200000)

ldd /usr/lib/libigc.so.1
linux-vdso.so.1 (0x00007fff075f1000)
libSPIRV-Tools.so => /usr/lib/libSPIRV-Tools.so (0x00007f38ca292000)
liblldELF.so.15 => /usr/lib/liblldELF.so.15 (0x00007f38c9e00000)
libLLVMSPIRVLib.so.15 => /usr/lib/libLLVMSPIRVLib.so.15 (0x00007f38c9a00000)
liblldCommon.so.15 => /usr/lib/liblldCommon.so.15 (0x00007f38ccace000)
libLLVM-15.so => /usr/lib/libLLVM-15.so (0x00007f38c1e00000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f38c1a00000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f38ca1a0000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f38ccaa7000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f38c1600000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007f38ccb42000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f38cca87000)
libffi.so.8 => /usr/lib/libffi.so.8 (0x00007f38cca7a000)
libedit.so.0 => /usr/lib/libedit.so.0 (0x00007f38cca34000)
libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007f38c98f6000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007f38ca127000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f38c1872000)
liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007f38ca0e9000)
libicuuc.so.73 => /usr/lib/libicuuc.so.73 (0x00007f38c1200000)
libicudata.so.73 => /usr/lib/libicudata.so.73 (0x00007f38bf200000)

JablonskiMateusz commented 12 months ago

I mean it looks like you are mixing packages from two different releases.

The WW09 release (work week 9 of 2023), https://github.com/intel/compute-runtime/releases/tag/23.09.25812.14, contains NEO version 23.09.25812.14 and IGC version 1.0.13463.18.

The WW17 release, https://github.com/intel/compute-runtime/releases/tag/23.17.26241.22, contains NEO version 23.17.26241.22 and IGC version 1.0.13822.6.
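
Judging only from the two releases named here, the first two fields of a NEO version appear to encode the two-digit year and the work week (23.09 for WW09 of 2023, 23.17 for WW17). A small sketch under that assumption follows. `neo_release_week` is a hypothetical helper, and the scheme is inferred from these two examples, not taken from official documentation.

```python
def neo_release_week(version):
    """Split a NEO version such as '23.09.25812.14' into (year, work week).

    Assumption: field 1 is the two-digit year and field 2 is the work week,
    as suggested by the WW09 and WW17 releases mentioned in this thread.
    """
    year, week = version.split(".")[:2]
    return 2000 + int(year), int(week)


for tag in ("23.09.25812.14", "23.17.26241.22"):
    year, week = neo_release_week(tag)
    print(f"{tag}: WW{week:02d} of {year}")
```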

We cannot guarantee that NEO 23.09.25812.14 works with IGC 1.0.13822.6, or that NEO 23.17.26241.22 works with IGC 1.0.13463.18.

Please make sure you are using corresponding versions of NEO and IGC. For other releases, see https://github.com/intel/compute-runtime/releases

mysteryx93 commented 12 months ago

What does NEO stand for?

So is this a packaging issue from the Linux distribution side? I'm curious why I seem to be the only one having that issue then.

mysteryx93 commented 12 months ago

NEO seems to stand for intel-compute-runtime. Why is it called NEO?

Ran a system update.

intel-compute-runtime: 23.17.26241.22-1.1
intel-graphics-compiler: 1:1.0.13822.6-1.1

That corresponds to the numbers on the release page, so I don't see any mismatch here.
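
The comparison against the release page can also be written down as a lookup. This is a sketch only: the table holds just the two NEO/IGC pairs quoted in this thread, and `upstream_version` and `is_referenced_pair` are hypothetical helpers that assume pacman-style version strings (an optional `epoch:` prefix and a `-pkgrel` suffix).

```python
# NEO -> IGC pairs quoted in this thread from the compute-runtime releases.
# Any other release would have to be added from the releases page.
REFERENCED_IGC = {
    "23.09.25812.14": "1.0.13463.18",
    "23.17.26241.22": "1.0.13822.6",
}


def upstream_version(pkgver):
    """Strip a pacman-style epoch ('1:') and package release ('-1.1')."""
    without_epoch = pkgver.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def is_referenced_pair(neo_pkgver, igc_pkgver):
    """True if the IGC version is the one listed for that NEO release."""
    neo = upstream_version(neo_pkgver)
    igc = upstream_version(igc_pkgver)
    return REFERENCED_IGC.get(neo) == igc


print(is_referenced_pair("23.17.26241.22-1.1", "1:1.0.13822.6-1.1"))
```

A matching pair only rules out a version mismatch. It says nothing about how the packages were built.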

mysteryx93 commented 12 months ago

If I uninstall those packages, clinfo works, and I can use OpenCL on my NVidia again. Not the ideal solution though.

JablonskiMateusz commented 12 months ago

> What does NEO stand for?

https://github.com/intel/compute-runtime#what-is-neo

> So is this a packaging issue from the Linux distribution side? I'm curious why I seem to be the only one having that issue then.

Our package depends on the IGC package, and that dependency should be marked as required. Did you install all packages from the distro, or did you build something from source?

Please try the experiments below:

  1. Install the packages below and run clinfo:
     a. intel-compute-runtime: 23.09.25812.14-1.1
     b. intel-graphics-compiler: 1:1.0.13822.6-1.1
  2. Install the packages below and run clinfo:
     a. intel-compute-runtime: 23.17.26241.22-1.1
     b. intel-graphics-compiler: 1:1.0.13463.18-1.1

mysteryx93 commented 12 months ago

In both cases it crashes.

I'm only installing from distro repos.

mysteryx93 commented 12 months ago

I have another problem with the laptop, and now I'm starting to think it may be related. There was an overlapping logging screen issue, but after unplugging the TV I realized the problem is worse than that: the internal display is detected completely wrong.

The internal display is 1080p at 144 Hz.

Output of xrandr:

Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
DP-0 disconnected (normal left inverted right x axis y axis)
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
eDP-1-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 344mm x 194mm
15360x8640    15.83    28.85
7680x4320     15.83    59.99    59.99    59.99
5120x2880     59.99    59.99
4096x2304     59.99    59.98
3840x2160     60.00    60.01    59.98    59.97
3200x1800     59.96    59.94
2880x1620     59.96    59.97
2560x1600     59.99    59.97
2560x1440     59.99    59.99    59.96    59.95
2048x1536     60.00
1920x1440     60.00
1856x1392     60.01
1792x1344     60.01
2048x1152     59.99    59.98    59.90    59.91
1920x1200     59.88    59.95
1920x1080     60.01*   59.97    59.96    59.93
1600x1200     60.00
1680x1050     59.95    59.88
1400x1050     59.98
1600x900      59.99    59.94    59.95    59.82
1280x1024     60.02
1400x900      59.96    59.88
1280x960      60.00
1440x810      60.00    59.97
1368x768      59.88    59.85
1280x800      59.99    59.97    59.81    59.91
1280x720      60.00    59.99    59.86    59.74
1024x768      60.04    60.00
960x720       60.00
928x696       60.05
896x672       60.01
1024x576      59.95    59.96    59.90    59.82
960x600       59.93    60.00
960x540       59.96    59.99    59.63    59.82
800x600       60.00    60.32    56.25
840x525       60.01    59.88
864x486       59.92    59.57
700x525       59.98
800x450       59.95    59.82
640x512       60.02
700x450       59.96    59.88
640x480       60.00    59.94
720x405       59.51    58.99
684x384       59.88    59.85
640x400       59.88    59.98
640x360       59.86    59.83    59.84    59.32
512x384       60.00
512x288       60.00    59.92
480x270       59.63    59.82
400x300       60.32    56.34
432x243       59.92    59.57
320x240       60.05
360x202       59.51    59.13
320x180       59.84    59.32
HDMI-1-1 disconnected (normal left inverted right x axis y axis)

It seems something is totally wrong with the Intel display driver. It's possible that it causes the OpenCL problem too. Does anything come to mind?
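
To check the mode list against the panel's expected 144 Hz, the xrandr output can be scanned for the highest refresh rate per resolution. This is a sketch, with `max_refresh_rates` as a hypothetical helper that only understands mode lines of the form shown above (a `WIDTHxHEIGHT` token followed by refresh rates, optionally marked with `*` or `+`).

```python
import re

# A mode line starts with WIDTHxHEIGHT followed by whitespace and rates.
MODE_RE = re.compile(r"^\s*(\d+)x(\d+)\s+(.+)$")


def max_refresh_rates(xrandr_output):
    """Return {(width, height): highest refresh rate listed for that mode}."""
    rates = {}
    for line in xrandr_output.splitlines():
        match = MODE_RE.match(line)
        if match is None:
            continue  # screen, connector, or other non-mode lines
        key = (int(match[1]), int(match[2]))
        # Pull out the numbers and ignore the '*' (current) and '+' markers.
        values = [float(v) for v in re.findall(r"\d+(?:\.\d+)?", match[3])]
        if values:
            rates[key] = max(values + [rates.get(key, 0.0)])
    return rates
```

For the listing above, the highest rate reported for 1920x1080 is 60.01, well below 144.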

JablonskiMateusz commented 11 months ago

> In both cases it crashes.
>
> I'm only installing from distro repos.

Does it crash in the same place? (Abort was called at 36 line in file: shared/source/built_ins/built_ins.cpp)

When upgrading or downgrading IGC, do you change both the libigc and libigdfcl packages?
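
Because the abort message embeds the source path, the NEO version and line number can be pulled out of each log. This is a minimal sketch with a hypothetical `parse_abort` helper. The regular expressions are written against the two messages posted in this thread and may not cover other message formats.

```python
import re

ABORT_RE = re.compile(
    r"Abort was called at (?P<line>\d+) line in file:\s*(?P<path>\S+)"
)
# The debug source path in the posted logs contains "compute-runtime-<version>/".
NEO_VERSION_RE = re.compile(r"compute-runtime-(?P<version>\d+(?:\.\d+)+)")


def parse_abort(message):
    """Extract line number, source path, and NEO version from an abort log.

    Returns None if the message does not look like the abort text above.
    """
    match = ABORT_RE.search(message)
    if match is None:
        return None
    version = NEO_VERSION_RE.search(match["path"])
    return {
        "line": int(match["line"]),
        "path": match["path"],
        "neo_version": version["version"] if version else None,
    }
```

Applied to the two logs posted earlier, this gives line 36 for 23.09.25812.14 and line 37 for 23.17.26241.22, so the differing line numbers may simply reflect different source revisions rather than a different failure.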

JablonskiMateusz commented 11 months ago

There is no explicit connection between OpenCL and display on Linux. The OpenCL issue looks like misconfigured dependencies. As for the display issue, if it is related to the Intel device, I would recommend checking i915.

mysteryx93 commented 11 months ago

With the other version it crashes at line 37 instead of 36.

What is i915? There is no i915 package installed. AFAIK Intel drivers are managed by the kernel.

mysteryx93 commented 11 months ago

Removed ALHP packages and it fixed the Intel OpenCL issue.

The other display problem remains, so it is unrelated.

This means it's either an ALHP packaging issue, or a v3 compilation issue.

Indeed, after re-installing the ALHP packages the problem came back. You can reproduce it by installing those v3-compiled packages.