intel / linux-npu-driver

Intel® NPU (Neural Processing Unit) Driver
MIT License

Driver crash if no VPU devices are present or the VPU device cannot be accessed #26

Open yipengqu opened 3 months ago

yipengqu commented 3 months ago

Logs without debug, using the hello_query_device app to check the device:

./hello_query_device
[ INFO ] Build ................................. 2024.0.0-14509-34caeefd078-releases/2024/0
[ INFO ]
Segmentation fault (core dumped)

gdb logs with debug enabled:

Thread 1 "hello_query_dev" received signal SIGSEGV, Segmentation fault.
0x00007ffff05eb2fd in loader::zeDriverGetProperties (hDriver=0x0, pDriverProperties=0x7fffffffb820) at /home/bld/work/third_party/level-zero/source/loader/ze_ldrddi.cpp:161
161 /home/bld/work/third_party/level-zero/source/loader/ze_ldrddi.cpp: No such file or directory.
(gdb) bt
#0 0x00007ffff05eb2fd in loader::zeDriverGetProperties (hDriver=0x0, pDriverProperties=0x7fffffffb820) at /home/bld/work/third_party/level-zero/source/loader/ze_ldrddi.cpp:161
#1 0x00007ffff05cd155 in zeDriverGetProperties (hDriver=0x0, pDriverProperties=0x7fffffffb820) at /home/bld/work/third_party/level-zero/source/lib/ze_libapi.cpp:209
#2 0x00007ffff0714427 in ?? () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino_intel_npu_plugin.so
#3 0x00007ffff07abc0a in ?? () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino_intel_npu_plugin.so
#4 0x00007ffff07adb20 in create_plugin_engine () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino_intel_npu_plugin.so
#5 0x00007ffff7633ee0 in ?? () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino.so.2400
#6 0x00007ffff763c023 in ?? () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino.so.2400
#7 0x00007ffff7642ca8 in ?? () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino.so.2400
#8 0x00007ffff761db66 in ov::Core::get_available_devices[abi:cxx11]() const () from /home/sos/ov/openvino_2024.0/runtime/lib/intel64/libopenvino.so.2400
#9 0x0000555555558ae1 in main ()

strace_log.txt
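The backtrace shows zeDriverGetProperties being entered with hDriver=0x0, so the crash is a query made through a null driver handle. As a rough illustration of a caller-side guard, here is a minimal sketch. It deliberately avoids the real Level Zero headers: DriverHandle, query_driver_name and self_check are hypothetical stand-ins, not names from the loader or the OpenVINO plugin.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a driver handle; NOT the real ze_driver_handle_t.
struct DriverHandle {
    std::string name;
};

// Hypothetical query helper: refuses to touch a null handle and reports
// failure to the caller instead of dereferencing it.
bool query_driver_name(const DriverHandle* handle, std::string& out) {
    if (handle == nullptr) {
        return false;  // no driver was enumerated; nothing to query
    }
    out = handle->name;
    return true;
}

// Small usage example exercising both paths.
void self_check() {
    std::string name;
    assert(!query_driver_name(nullptr, name));  // null handle is rejected

    DriverHandle npu{"npu"};
    assert(query_driver_name(&npu, name));      // valid handle succeeds
    assert(name == "npu");
}
```

With a check like this, a missing device would surface as an ordinary "no driver" result rather than a segmentation fault.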

jwludzik commented 3 months ago

There is an issue in the driver: zeInit returns success even when no devices have been found. The GPU driver returns an error in that case. The code responsible for this: https://github.com/intel/linux-npu-driver/blob/4bcbf2abe94eb4d9c083bd616b58e309a82d008a/umd/level_zero_driver/core/source/driver/driver.cpp#L68-L76
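To make the intended behaviour concrete, here is a minimal sketch of an initialization routine that fails when device discovery comes back empty. It is not the code from driver.cpp and is not the actual fix: Result, Device, init_driver and self_check are hypothetical names chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result codes; NOT the real ze_result_t values.
enum class Result { Success, ErrorUninitialized };

// Hypothetical device record produced by discovery.
struct Device {
    int id;
};

// Hypothetical init routine: succeed only if at least one device was found,
// so a caller never goes on to query a driver that does not exist.
Result init_driver(const std::vector<Device>& discovered) {
    if (discovered.empty()) {
        return Result::ErrorUninitialized;
    }
    return Result::Success;
}

// Small usage example exercising both paths.
void self_check() {
    assert(init_driver({}) == Result::ErrorUninitialized);  // no devices
    assert(init_driver({Device{0}}) == Result::Success);    // one device
}
```

Returning an error at initialization lets the application stop early, instead of continuing with a driver handle it never received.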

I will try to prepare a fix for it. @yipengqu, is this a blocker for you?

yipengqu commented 3 months ago

It's not a blocker, thanks.

jwludzik commented 1 month ago

The issue should be fixed in https://github.com/intel/linux-npu-driver/releases/tag/v1.5.0