intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License
1.1k stars, 229 forks

Abort at L0 initialization #700

Closed ye-luo closed 5 months ago

ye-luo commented 5 months ago

I tried to build the L0 runtime 23.30.26918.50 with:

cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DIGC_DIR=/soft/libraries/intel-gpu-umd/stable_736_25_20231031/compiler -DGMM_DIR=/soft/libraries/intel-gpu-umd/stable_736_25_20231031/driver ..

where /soft/libraries/intel-gpu-umd/stable_736_25_20231031 is a full installation of the runtime and the IGC compiler. This way, I rebuild only the L0 runtime and then use LD_LIBRARY_PATH to load libze_intel_gpu.so.1 at run time.

However, I got a not very helpful message:

Abort was called at 210 line in file:
/home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/engine_info.cpp

https://github.com/intel/compute-runtime/blob/363312afd1389cb7d28f2043e34d849068f2b99c/shared/source/os_interface/linux/engine_info.cpp#L210

Full backtrace:

#0  0x000015049b9cbc6b in raise () from /lib64/libc.so.6
#1  0x000015049b9cd305 in abort () from /lib64/libc.so.6
#2  0x000015049196f23d in NEO::abortExecution () at /home/yeluo/opt/compute-runtime/shared/source/helpers/abort.cpp:14
#3  0x00001504919cbed0 in NEO::abortUnrecoverable (line=line@entry=210, file=file@entry=0x150491f3f950 "/home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/engine_info.cpp")
    at /home/yeluo/opt/compute-runtime/shared/source/helpers/debug_helpers.cpp:27
#4  0x0000150491ea9712 in NEO::EngineInfo::assignCopyEngine (this=this@entry=0x4693280, baseEngineType=<optimized out>, tileId=tileId@entry=0, engine=..., bcsInfoMask=..., 
    numHostLinkCopyEngines=@0x7ffcf53c6028: 0, numScaleUpLinkCopyEngines=@0x7ffcf53c602c: 0) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/engine_info.cpp:210
#5  0x0000150491ea9b28 in NEO::EngineInfo::EngineInfo (this=this@entry=0x4693280, drm=drm@entry=0x4692f30, engineInfos=...) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/engine_info.cpp:68
#6  0x0000150491ea8319 in std::make_unique<NEO::EngineInfo, NEO::Drm*, std::vector<NEO::EngineCapabilities, std::allocator<NEO::EngineCapabilities> >&> ()
    at /opt/cray/pe/gcc/11.2.0/snos/include/g++/bits/unique_ptr.h:962
#7  NEO::IoctlHelper::createEngineInfo (this=0x4693140, isSysmanEnabled=false) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/ioctl_helper.cpp:551
#8  0x0000150491e9c55f in NEO::Drm::queryEngineInfo (this=this@entry=0x4692f30, isSysmanEnabled=isSysmanEnabled@entry=false) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/drm_neo.cpp:954
#9  0x0000150491e9c717 in NEO::Drm::queryEngineInfo (this=this@entry=0x4692f30) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/drm_neo.cpp:759
#10 0x00001504919ccd0a in NEO::Drm::create (hwDeviceId=..., rootDeviceEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/dll/linux/drm_neo_create.cpp:97
#11 0x0000150491eac52d in NEO::initDrmOsInterface (hwDeviceId=..., rootDeviceIndex=0, rootDeviceEnv=0x468f750) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/linux/os_interface_linux.cpp:40
#12 0x0000150491e0c380 in NEO::RootDeviceEnvironment::initOsInterface (this=<optimized out>, hwDeviceId=..., rootDeviceIndex=rootDeviceIndex@entry=0)
    at /home/yeluo/opt/compute-runtime/shared/source/os_interface/init_os_interface_drm_or_wddm.cpp:17
#13 0x0000150491e0d4fa in NEO::initHwDeviceIdResources (rootDeviceIndex=0, hwDeviceId=..., executionEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/device_factory.cpp:140
#14 NEO::DeviceFactory::prepareDeviceEnvironments (executionEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/device_factory.cpp:172
#15 0x0000150491d61cd6 in NEO::prepareDeviceEnvironmentsImpl (executionEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/command_stream/create_command_stream_impl.cpp:59
#16 0x00001504919cc09b in NEO::prepareDeviceEnvironments (executionEnvironment=..., osPciPath=..., rootDeviceIndex=rootDeviceIndex@entry=0)
    at /home/yeluo/opt/compute-runtime/shared/source/dll/get_devices.cpp:20
#17 0x00001504919cc228 in NEO::prepareDeviceEnvironments (executionEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/dll/get_devices.cpp:53
#18 0x0000150491e0d819 in NEO::DeviceFactory::createDevices (executionEnvironment=...) at /home/yeluo/opt/compute-runtime/shared/source/os_interface/device_factory.cpp:225
#19 0x0000150491a171d5 in L0::DriverImp::initialize (this=<optimized out>, result=0x7ffcf53c6524) at /home/yeluo/opt/compute-runtime/level_zero/core/source/driver/driver.cpp:68
#20 0x0000150491a16a8e in operator() (__closure=<optimized out>) at /home/yeluo/opt/compute-runtime/level_zero/core/source/driver/driver.cpp:102
#21 std::__invoke_impl<void, L0::DriverImp::driverInit(ze_init_flags_t)::<lambda()> > (__f=...) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/bits/invoke.h:61
#22 std::__invoke<L0::DriverImp::driverInit(ze_init_flags_t)::<lambda()> > (__fn=...) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/bits/invoke.h:96
#23 operator() (__closure=<optimized out>) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/mutex:776
#24 operator() (__closure=0x0) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/mutex:712
#25 _FUN () at /opt/cray/pe/gcc/11.2.0/snos/include/g++/mutex:712
#26 0x000015049bb895a7 in __pthread_once_slow () from /lib64/libpthread.so.0
#27 0x0000150491a16d53 in __gthread_once (__func=<optimized out>, __once=0x15049334f088 <L0::driverImp+8>) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/x86_64-suse-linux/bits/gthr-default.h:700
#28 std::call_once<L0::DriverImp::driverInit(ze_init_flags_t)::<lambda()> > (__f=..., __once=...) at /opt/cray/pe/gcc/11.2.0/snos/include/g++/mutex:783
#29 L0::DriverImp::driverInit (flags=<optimized out>, this=0x15049334f080 <L0::driverImp>) at /home/yeluo/opt/compute-runtime/level_zero/core/source/driver/driver.cpp:100
#30 L0::init (flags=<optimized out>) at /home/yeluo/opt/compute-runtime/level_zero/core/source/driver/driver.cpp:137
#31 0x000015049a3303fe in loader::context_t::init_driver(loader::driver_t, unsigned int) () from /home/yeluo/opt/packages/intel_compute_runtime/27191/lib64/libze_loader.so.1
#32 0x000015049a330487 in loader::context_t::check_drivers(unsigned int) () from /home/yeluo/opt/packages/intel_compute_runtime/27191/lib64/libze_loader.so.1
#33 0x000015049a32c589 in ?? () from /home/yeluo/opt/packages/intel_compute_runtime/27191/lib64/libze_loader.so.1
#34 0x000015049a3253dd in ?? () from /home/yeluo/opt/packages/intel_compute_runtime/27191/lib64/libze_loader.so.1
#35 0x000015049bb895a7 in __pthread_once_slow () from /lib64/libpthread.so.0
#36 0x000015049a32545d in zeInit () from /home/yeluo/opt/packages/intel_compute_runtime/27191/lib64/libze_loader.so.1
#37 0x00001504936fb9b7 in RTLDeviceInfoTy::findDevices (this=0x337c1f0)
    at /netbatch/donb00014_00/dir/workspace/NIT/xmain/LX/xmainefi2linux_release/ws/icsws/llvm/openmp/libomptarget/plugins/level0/src/rtl.cpp:6004
#38 0x0000150493711d38 in __tgt_rtl_number_of_devices () at /netbatch/donb00014_00/dir/workspace/NIT/xmain/LX/xmainefi2linux_release/ws/icsws/llvm/openmp/libomptarget/plugins/level0/src/rtl.cpp:7276
#39 0x000015049bebcf91 in RTLsTy::attemptLoadRTL (RTLName=..., RTL=...) at /netbatch/donb00014_00/dir/workspace/NIT/xmain/LX/xmainefi2linux_release/ws/icsws/llvm/openmp/libomptarget/src/rtl.cpp:442
#40 0x000015049bebc897 in RTLsTy::loadRTLs (this=0x1cac670) at /netbatch/donb00014_00/dir/workspace/NIT/xmain/LX/xmainefi2linux_release/ws/icsws/llvm/openmp/libomptarget/src/rtl.cpp:359
#41 0x000015049bb895a7 in __pthread_once_slow () from /lib64/libpthread.so.0
#42 0x000015049be9d12a in __gthread_once (__once=0x1cac6d0, __func=<optimized out>)
    at /rdrive/ref/gcc/7.5.0/rhel70/efi2/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/x86_64-linux-gnu/bits/gthr-default.h:699
#43 std::call_once<void (RTLsTy::*)(), RTLsTy*> (__once=..., __f=@0x7ffcf53c76b0: (void (RTLsTy::*)(RTLsTy * const)) 0x15049bebc4e0 <RTLsTy::loadRTLs()>, __args=@0x7ffcf53c76a8: 0x1cac670)
    at /rdrive/ref/gcc/7.5.0/rhel70/efi2/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/mutex:684
#44 __tgt_register_lib (Desc=0xcfeb50 <openmp_offloading[descriptor]>) at /netbatch/donb00014_00/dir/workspace/NIT/xmain/LX/xmainefi2linux_release/ws/icsws/llvm/openmp/libomptarget/src/interface.cpp:89
#45 0x0000000000c4b35d in __libc_csu_init (argc=3, argv=0x7ffcf53c7838, envp=0x7ffcf53c7858) at elf-init.c:88
#46 0x000015049b9b61dc in __libc_start_main () from /lib64/libc.so.6
#47 0x000000000045059a in _start () at ../sysdeps/x86_64/start.S:120

Any idea what I did wrong when building L0?

JablonskiMateusz commented 5 months ago

Could you share the output of the uname -a command?

ye-luo commented 5 months ago
uname -a
Linux aurora-uan-0012 5.14.21-150400.24.55-default #1 SMP PREEMPT_DYNAMIC Mon Mar 27 15:25:48 UTC 2023 (cc75cf8) x86_64 x86_64 x86_64 GNU/Linux

The KMD is 736.25.

JablonskiMateusz commented 5 months ago

Could you build the runtime with the custom CMake flag NEO_ENABLE_i915_PRELIM_DETECTION=1, the same way we build the release packages?

ye-luo commented 5 months ago

It works. Many thanks.
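
For reference, here is a minimal sketch of the configure line from the original report with the suggested flag appended. It assumes that every other option stays unchanged and that the command is run from the same build directory as before. The variable names UMD_ROOT and CONFIGURE_CMD are only illustrative. The snippet prints the command rather than running it, so it can be checked first.

```shell
# Installation prefix from the original report; adjust for your system.
UMD_ROOT=/soft/libraries/intel-gpu-umd/stable_736_25_20231031

# Same options as the original configure line, plus the prelim-detection flag.
CONFIGURE_CMD="cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DIGC_DIR=${UMD_ROOT}/compiler -DGMM_DIR=${UMD_ROOT}/driver \
-DNEO_ENABLE_i915_PRELIM_DETECTION=1 .."

# Print the command for review; run it from the build directory once it looks right.
echo "${CONFIGURE_CMD}"
```

After reconfiguring, rebuild the L0 runtime and load libze_intel_gpu.so.1 through LD_LIBRARY_PATH as before.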