intel / linux-npu-driver

Intel® NPU (Neural Processing Unit) Driver
MIT License

NPU UMD driver causes oneAPI SYCL runtime crash #29

Open wistal opened 4 months ago

wistal commented 4 months ago

Without the NPU UMD driver, oneAPI works fine. After installing the NPU UMD driver, oneAPI crashes.

Test system: Ubuntu 22.04 on MTL 165H. Confirmed that the NPU KMD driver loaded OK:

[ 2.150613] intel_vpu 0000:00:0b.0: enabling device (0000 -> 0002)
[ 2.166532] intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v0.0.bin, version: 20240221MTL_CLIENT_SILICON-release2101ci_tag_ud202408_vpu_rc_20240221_2101845c105994a
[ 2.290232] [drm] Initialized intel_vpu 1.0.0 20230117 for 0000:00:0b.0 on minor 0
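The kernel-side check above can be scripted. The following is a minimal sketch, assuming the kernel log contains the same `Initialized intel_vpu` line shown in this report; the function name `npu_kmd_ready` is hypothetical and not part of any Intel tool.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds (exit 0)
# only if the intel_vpu driver reported a successful initialization.
npu_kmd_ready() {
  grep -q 'Initialized intel_vpu'
}

# Example (assumes dmesg is readable, possibly via sudo):
#   sudo dmesg | npu_kmd_ready && ls -l /dev/accel/
```

If the function succeeds, listing `/dev/accel/` should show the `accel0` node that the user-mode driver later opens.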

===============================
Installed oneAPI 2024.0 with the iGPU driver, without the NPU UMD driver:

. intel/oneapi/setvars.sh

:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

~$ sycl-ls

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]


Everything OK

After installing the NPU UMD driver:

~/npu/1.20$ ls
intel-driver-compiler-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb
intel-fw-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb
intel-level-zero-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb
npu.sh
~/npu/1.20$ sudo dpkg -i *.deb
Selecting previously unselected package intel-driver-compiler-npu.
(Reading database ... 209200 files and directories currently installed.)
Preparing to unpack intel-driver-compiler-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-driver-compiler-npu (1.2.0.20240404-8553879914) ...
Selecting previously unselected package intel-fw-npu.
Preparing to unpack intel-fw-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-fw-npu (1.2.0.20240404-8553879914) ...
Selecting previously unselected package intel-level-zero-npu.
Preparing to unpack intel-level-zero-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-level-zero-npu (1.2.0.20240404-8553879914) ...
Setting up intel-driver-compiler-npu (1.2.0.20240404-8553879914) ...
Setting up intel-fw-npu (1.2.0.20240404-8553879914) ...
Setting up intel-level-zero-npu (1.2.0.20240404-8553879914) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1)


Check the oneAPI SYCL devices again:

~$ sycl-ls
SYCL Exception encountered: Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)


An error occurred: installing the NPU driver causes sycl-ls to crash.

Checking with strace:

~$ strace sycl-ls

... skipped logs...
futex(0x746c17666178, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/dev/accel/accel0", O_RDWR|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFCHR|0660, st_rdev=makedev(0x105, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(4, DRM_IOCTL_VERSION, 0x7ffc772356c0) = 0
ioctl(4, DRM_IOCTL_VERSION, 0x7ffc772356c0) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc772355a0) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or ...skipped logs... DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc77235560) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc77235560) = 0
close(4) = 0
openat(AT_FDCWD, "/dev/accel/accel1", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
... skipped logs ...
openat(AT_FDCWD, "/dev/accel/accel61", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/accel/accel62", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/accel/accel63", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mpi/2021.11/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mkl/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ippcp/2021.9/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ipp/2021.10/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dpl/2022.3/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dnnl/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/debugger/2024.0/opt/debugger/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dal/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/opt/compiler/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ccl/2021.11/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=56923, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 56923, PROT_READ, MAP_PRIVATE, 4, 0) = 0x746c34043000
close(4) = 0
openat(AT_FDCWD, "/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=72440896, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 73455568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x746bf7800000
mprotect(0x746bf7a59000, 69115904, PROT_NONE) = 0
mmap(0x746bf7a59000, 51245056, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x259000) = 0x746bf7a59000
mmap(0x746bfab38000, 17866752, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x3338000) = 0x746bfab38000
mmap(0x746bfbc43000, 868352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x4442000) = 0x746bfbc43000
mmap(0x746bfbd17000, 1009616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x746bfbd17000
close(4) = 0
mprotect(0x746bfbc43000, 815104, PROT_READ) = 0
brk(0x5b437800a000) = 0x5b437800a000
brk(0x5b437802b000) = 0x5b437802b000

...skipped logs...

futex(0x5b43789d5b90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x746c34f73210, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "SYCL Exception encountered: ", 28SYCL Exception encountered: ) = 28
write(2, "Native API failed. Native API re"..., 96Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)) = 96
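Most of the strace output above is failed path lookups. As a minimal sketch for narrowing such a log down to the NPU-related opens that actually succeeded, the filter below keeps `openat` calls on `/dev/accel/` nodes or `libvpux_driver_compiler.so` and drops the `ENOENT` results. The function name `npu_opens` is hypothetical.

```shell
# Hypothetical helper: reads strace output on stdin and prints only the
# openat() calls that touch an accel device node or the NPU compiler
# library and did not fail with ENOENT.
npu_opens() {
  grep -E 'openat\(.*(/dev/accel/|libvpux_driver_compiler\.so)' | grep -v 'ENOENT'
}

# Example (assumes strace and sycl-ls are installed; strace writes to stderr):
#   strace -f -e trace=openat sycl-ls 2>&1 | npu_opens
```

Applied to the log in this report, the filter would leave the `accel0` open and the `/lib/libvpux_driver_compiler.so` open, which are the two points where the NPU driver is pulled into the process.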

=====================================================
Check with OpenVINO:

~$ benchmark_app -h | grep Avail
Available target devices: CPU GPU NPU

The NPU works normally with OpenVINO.

kpradzyn commented 4 months ago

Hey @wistal

This is a known issue with the L0 API design. In general, fixing it requires:
1) An L0 API change. Here is the proposal: https://github.com/oneapi-src/level-zero-spec/issues/298 (still open)
2) A sycl-ls update to match the L0 API changes above.
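Until both changes land, it may be worth checking which SYCL backend triggers the failure. The sketch below is an assumption and has not been confirmed in this thread: it relies on the `ONEAPI_DEVICE_SELECTOR` environment variable of the oneAPI DPC++ runtime to restrict the runtime to one backend at a time, and whether that avoids the error on this setup is untested. The function name `try_backends` is hypothetical.

```shell
# Hypothetical helper: runs the given command once per backend selector and
# reports whether it exited successfully. ONEAPI_DEVICE_SELECTOR is set only
# for the child command, not for the calling shell.
try_backends() {
  for sel in 'opencl:*' 'level_zero:*'; do
    if ONEAPI_DEVICE_SELECTOR="$sel" "$@" >/dev/null 2>&1; then
      echo "$sel: ok"
    else
      echo "$sel: failed"
    fi
  done
}

# Example (assumes the oneAPI environment has been set up with setvars.sh):
#   try_backends sycl-ls
```

If only the `level_zero:*` run fails, that would be consistent with the L0-level explanation above, and restricting applications to the OpenCL backend could serve as a stopgap.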

wistal commented 3 months ago

Thanks for the update.