Open fcharras opened 2 years ago
Hi @fcharras,
> when a device is not found, the app exits with a segfault rather than a clean exit with an error code.
Yeah, that is a result of missing error handling in our simple-app example: in SYCL, errors are reported through exceptions, and since there is no "global" try..catch block in the sample, any error causes a crash due to an unhandled exception.
> I'm not sure what is wrong with the level_zero backend here. Is it at build time (is it mandatory to pass the l0 headers flag to get level_zero support?), an issue with the level_zero runtime, or could it be that my device is not compatible? How can I figure it out?
There are several layers where the problem could occur: it could be that the L0 PI plugin (an interface between the SYCL runtime and the L0 runtime) is not built, not found, or not properly loaded; it could be that L0 itself can't be loaded, etc.
For starters, I would suggest launching your app with the SYCL_PI_TRACE=1 env variable set; it will print info about the loaded plugins in the following form:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 10.12.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 10.12.1 ]
Depending on the output, we should be able to better understand where to proceed with further investigation.
> is it mandatory to pass the l0 headers flag to get level_zero support?
The L0 plugin should be built by default. If you don't specify both L0_INCLUDE_DIR and L0_LIBRARY, then our cmake script should automatically download them from the corresponding GitHub repos.
Hello @AlexeySachkov, thank you for the support.
To this day I'm still unable to run the level_zero backend, and I've moved forward using the opencl backend. I'm still interested in using the level_zero backend to compare its performance to opencl, and I'll gladly share more information if you need it.
I've been working from inside a docker container built on ghcr.io/intel/llvm/ubuntu2004_intel_drivers. The full Dockerfile can be found here: https://github.com/soda-inria/sklearn-numba-dpex/blob/main/docker/Dockerfile and the image is available to pull at jjerphan/numba_dpex_dev:latest. It must be run with --device=/dev/dri to enable GPU passthrough.
Other users of this container report the same issue of a missing level_zero backend within the container, across several different computers.
SYCL_PI_TRACE only shows that opencl is loaded, either gpu or cpu depending on the filter:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so
SYCL_PI_TRACE[all]: Selected device ->
SYCL_PI_TRACE[all]: platform: Intel(R) OpenCL
SYCL_PI_TRACE[all]: device: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
SYCL_PI_TRACE[all]: Selected device ->
SYCL_PI_TRACE[all]: platform: Intel(R) OpenCL HD Graphics
SYCL_PI_TRACE[all]: device: Intel(R) UHD Graphics [0x9a60]
Querying for the level_zero backend (using dpctl) just returns no device:
File "dpctl/_sycl_device_factory.pyx", line 359, in dpctl._sycl_device_factory.select_default_device
dpctl._sycl_device.SyclDeviceCreationError: Default device is unavailable.
I do have the level_zero libraries and headers installed in the container:
root@6d3f8d1418a6:/# locate level_zero
/opt/intel/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/backend/level_zero.hpp
/opt/intel/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/detail/backend_traits_level_zero.hpp
/opt/intel/oneapi/compiler/2022.1.0/linux/include/sycl/ext/oneapi/backend/level_zero.hpp
/opt/intel/oneapi/compiler/2022.1.0/linux/include/sycl/ext/oneapi/backend/level_zero_ownership.hpp
/opt/intel/oneapi/compiler/2022.1.0/linux/lib/libpi_level_zero.so
/opt/venv/lib/libpi_level_zero.so
in the same folder as the opencl library:
/opt/intel/oneapi/compiler/2022.1.0/linux/lib/libpi_opencl.so
/opt/venv/lib/libpi_opencl.so
with /opt/venv/lib being in the LD_LIBRARY_PATH.
General information about my current host system:
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01) (prog-if 00 [VGA controller])
@fcharras, so, about SYCL_PI_TRACE: I misguided you a bit. Could you please launch your app under SYCL_PI_TRACE=-1 without setting the SYCL_DEVICE_FILTER env variable?
Trace level -1 should also print the libraries it attempted to load but failed to. Could you please also run ldd on that libpi_level_zero.so to see whether it has any unresolved dependencies?
Here's the stderr with SYCL_PI_TRACE=-1:
...
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so
SYCL_PI_TRACE[-1]: dlopen(libpi_level_zero.so) failed with <libze_loader.so.1: cannot open shared object file: No such file or directory>
SYCL_PI_TRACE[all]: Check if plugin is present. Failed to load plugin: libpi_level_zero.so
SYCL_PI_TRACE[-1]: dlopen(libpi_cuda.so) failed with <libpi_cuda.so: cannot open shared object file: No such file or directory>
SYCL_PI_TRACE[all]: Check if plugin is present. Failed to load plugin: libpi_cuda.so
SYCL_PI_TRACE[-1]: dlopen(libpi_hip.so) failed with <libpi_hip.so: cannot open shared object file: No such file or directory>
SYCL_PI_TRACE[all]: Check if plugin is present. Failed to load plugin: libpi_hip.so
SYCL_PI_TRACE[-1]: dlopen(libpi_esimd_emulator.so) failed with <libpi_esimd_emulator.so: cannot open shared object file: No such file or directory>
SYCL_PI_TRACE[all]: Check if plugin is present. Failed to load plugin: libpi_esimd_emulator.so
...
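As a rough aid for reading a trace like the one above, here is a minimal Python sketch that pulls out, for each plugin whose dlopen failed, the file named in the error message. The function name `dlopen_culprits` is hypothetical, and the sketch assumes the `SYCL_PI_TRACE[-1]: dlopen(...) failed with <...>` line format shown in this log; other runtime versions may print something different.

```python
import re

# Assumed line format, copied from the log above:
#   SYCL_PI_TRACE[-1]: dlopen(<plugin>) failed with <<file>: <message>>
_DLOPEN_FAILURE = re.compile(
    r"SYCL_PI_TRACE\[-1\]: dlopen\((?P<plugin>[^)]+)\) failed with <(?P<reason>.*)>"
)


def dlopen_culprits(trace_text):
    """Map each plugin that failed to load to the file named in its dlopen error."""
    culprits = {}
    for line in trace_text.splitlines():
        match = _DLOPEN_FAILURE.search(line)
        if match is None:
            continue  # not a dlopen failure line
        # The part of the message before the first ':' is the file dlopen
        # complained about: the plugin itself, or one of its dependencies.
        culprits[match["plugin"]] = match["reason"].split(":", 1)[0]
    return culprits
```

When the file named differs from the plugin, as with `libpi_level_zero.so` here, the plugin was found but something it links against was not.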
And indeed, it turns out libze_loader.so.1 is not found:
root@6d3f8d1418a6:/opt/venv/lib# ldd libpi_level_zero.so
linux-vdso.so.1 (0x00007fff73bdf000)
libze_loader.so.1 => not found
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6ef69f8000)
libsvml.so => /opt/venv/lib/libsvml.so (0x00007f6ef4996000)
libirng.so => /opt/venv/lib/libirng.so (0x00007f6ef462c000)
libimf.so => /opt/venv/lib/libimf.so (0x00007f6ef3f9e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6ef3e4f000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6ef3e34000)
libintlc.so.5 => /opt/venv/lib/libintlc.so.5 (0x00007f6ef3bbc000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6ef3bb6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6ef39c4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6ef6b24000)
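The same check can be scripted. Below is a minimal Python sketch that lists the entries ldd marks as `not found`; the function name `unresolved_dependencies` is hypothetical, and the sketch assumes the `name => not found` layout seen in the output above.

```python
def unresolved_dependencies(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Dependency lines look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        # or "libfoo.so.1 => not found"; other lines have no " => " separator.
        name, separator, target = line.strip().partition(" => ")
        if separator and target.strip() == "not found":
            missing.append(name)
    return missing
```

The text to parse could come from something like `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`.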
Here's what does exist in the container referring to libze:
root@6d3f8d1418a6:/opt/venv/lib# locate libze
/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1
/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.23599
/var/lib/libze_intel_gpu
/var/lib/libze_intel_gpu/pci_bind_status_file
/var/lib/libze_intel_gpu/wedged_file
This is probably the culprit! How should I install this library, and is there a reason it's not pre-installed in the base ghcr.io/intel/llvm/ubuntu2004_intel_drivers image?
In the meantime, installing https://github.com/oneapi-src/level-zero/releases/tag/v1.8.5 does indeed enable the level_zero backend!
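To confirm from Python that the loader is now resolvable, a minimal sketch using the standard `ctypes` module could look like this. The helper name `can_load` is hypothetical; it only checks that the dynamic loader can open the library, not that the level_zero backend actually exposes a device.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library, or one of its dependencies, cannot be opened.
        return False
    return True
```

In the container discussed here, `can_load("libze_loader.so.1")` would be the call of interest, before and after installing the level-zero release.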
> is there a reason it's not pre-installed in the base ghcr.io/intel/llvm/ubuntu2004_intel_drivers image?
It most likely happened due to human error. My understanding is that those images were provided on a voluntary basis by one of our colleagues, and I'm not sure that we test them thoroughly, so here we are.
> how should I install this library
Glad you figured it out! I was just going to paste the same link.
Ok, I was wrong here: we actually have a workflow that automatically updates those docker images: https://github.com/intel/llvm/actions/workflows/sycl_containers.yaml
From what I see, we simply install an intel/compute-runtime release, such as 22.31.23852. According to the release description, the level zero libraries should be there, but apparently something went wrong.
Hi! There have been no updates for at least the last 60 days, though the ticket has assignee(s).
@AlexeySachkov, could I ask you to take one of the following actions? :)
Thanks!
Describe the bug
After following the getting started guide and other connected documentation I can build simple-sycl-app.exe; it runs fine, but not with the level_zero backend.
Several issues to report:
To Reproduce
I've built the sycl branch following the instructions in https://intel.github.io/llvm-docs/GetStartedGuide.html#install-low-level-runtime from within the ubuntu2004_base container, then compiled simple-sycl-app.exe with
clang++ -fsycl simple-sycl-app.cpp -o simple-sycl-app.exe
I've run the app in various environments:
/install_drivers.sh script, with all tags set to latest
with the same outcome. I also tried the dpctl python package:
dpctl.get_devices()
would indeed show the opencl gpu backend but not the level_zero backend.
Not sure if I should look towards a hardware compatibility issue with level_zero here or with the Intel runtime, or how to gather more information?
Environment (please complete the following information):