intel / intel-extension-for-openxla

ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE error while running the test program #37

Open shibdas opened 1 week ago

Hi, I'm running the test program from the README after compiling and installing the extension, but the process crashes right after a Level Zero error. I turned on several debug environment variables, and the Level Zero GPU driver appears to load and initialize successfully, yet the failure still happens while calling zeModuleCreate in:

https://github.com/intel/intel-extension-for-openxla/blob/main/xla/stream_executor/sycl/sycl_driver.cc#L392

python test.py
DEBUG:jax._src.xla_bridge:Discovered path based JAX plugin: jax_plugins.intel_extension_for_openxla
DEBUG:jax._src.xla_bridge:Loading plugin module jax_plugins.intel_extension_for_openxla
WARNING:jax_plugins.intel_extension_for_openxla:INFO: Intel Extension for OpenXLA version: 0.4.0, commit: eb3d812a
DEBUG:jax._src.xla_bridge:registering PJRT plugin xpu from /home/test/.local/lib/python3.10/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so
DEBUG:jax._src.xla_bridge:Initializing backend 'cpu'
DEBUG:jax._src.xla_bridge:Backend 'cpu' initialized
DEBUG:jax._src.xla_bridge:Initializing backend 'cuda'
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda':
DEBUG:jax._src.xla_bridge:Initializing backend 'rocm'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
DEBUG:jax._src.xla_bridge:Initializing backend 'tpu'
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:jax._src.xla_bridge:Platform 'xpu' is experimental and not all JAX functionality may be correctly supported!
DEBUG:jax._src.xla_bridge:Initializing backend 'xpu'
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:362: Found OCL_ICD_FILENAMES environment variable.
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:197: successfully added vendor /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so with suffix INTEL
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:197: successfully added vendor /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so with suffix INTEL
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /opt/intel//oneapi/compiler/latest/lib/libintelocl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:86: already loaded vendor /opt/intel//oneapi/compiler/latest/lib/libintelocl.so, nothing to do here
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/linux/icd_linux.c:150: Failed to open path /etc/OpenCL/layers, continuing
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu_legacy1.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_gpu_legacy1.so.1 failed with libze_intel_gpu_legacy1.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_npu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_npu.so.1 failed with libze_intel_npu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=ZE_INIT_FLAG_GPU_ONLY)
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(ZE_INIT_FLAG_GPU_ONLY) returning ZE_RESULT_SUCCESS
DEBUG:jax._src.xla_bridge:Backend 'xpu' initialized
jax.local_devices(): [xpu(id=0)]
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00041031837463378906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00033354759216308594 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00041556358337402344 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00045800209045410156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.004739522933959961 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003650188446044922 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003840923309326172 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00032138824462890625 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.0030584335327148438 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0004353523254394531 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003151893615722656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000331878662109375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.0031616687774658203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00046443939208984375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming relu for pjit in 0.0011076927185058594 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000347137451171875 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming lax_conv for pjit in 0.016260147094726562 sec
DEBUG:jax._src.interpreters.pxla:Compiling lax_conv for with global shapes and types []. Argument mapping: [].
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0004506111145019531 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_seed for pjit in 0.0017113685607910156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.00023245811462402344 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.001512765884399414 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.002304553985595703 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003342628479003906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00028896331787109375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00030732154846191406 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002868175506591797 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00042128562927246094 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.00017571449279785156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.0012390613555908203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.001980304718017578 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0004525184631347656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0002968311309814453 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002846717834472656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003006458282470703 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.0002028942108154297 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.0018062591552734375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.002747058868408203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000431060791015625 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003037452697753906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003323554992675781 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002956390380859375 sec
DEBUG:jax._src.dispatch:Finished jaxpr to MLIR module conversion jit(lax_conv) in 0.1102445125579834 sec
DEBUG:jax._src.compiler:get_compile_options: num_replicas=1 num_partitions=1 device_assignment=[[xpu(id=0)]]
DEBUG:jax._src.compiler:get_compile_options XLA-AutoFDO profile: using XLA-AutoFDO profile version -1
DEBUG:jax._src.dispatch:Finished XLA compilation of jit(lax_conv) in 0.5001187324523926 sec
2024-11-08 10:21:43.152872: F xla/stream_executor/sycl/sycl_driver.cc:402] L0 error 1879179264:
Aborted (core dumped)
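
The fatal line at the end of the log prints the Level Zero result only as a decimal number (1879179264). The sketch below shows how that number relates to the error name in the issue title. It is a minimal illustration, not part of the extension: the helper name describe_l0_result and the one-entry lookup table are hypothetical, and the single mapping is assumed from the issue title and the ze_result_t enum in the Level Zero headers.

```python
# Decode the decimal Level Zero result printed by sycl_driver.cc into hex.
L0_ERROR_DECIMAL = 1879179264  # value copied from the fatal log line

# Assumption: one mapping only, taken from the issue title and the
# ze_result_t enum; this is not a complete table of result codes.
KNOWN_RESULTS = {
    0x70020000: "ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE",
}


def describe_l0_result(code: int) -> str:
    """Format a ze_result_t value as hex plus its name, if known."""
    name = KNOWN_RESULTS.get(code, "unknown ze_result_t")
    return f"{code:#010x} ({name})"


print(describe_l0_result(L0_ERROR_DECIMAL))
# prints "0x70020000 (ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE)"
```

Reading the code in hex makes it easier to look up in ze_api.h, where the result values are listed in hexadecimal.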

clinfo -l

Platform #0: Intel(R) OpenCL
 -- Device #0: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
Platform #1: Intel(R) OpenCL Graphics
 -- Device #0: Intel(R) Data Center GPU Flex 140

As far as I can tell, I have installed all the required packages for the Intel Flex GPUs, and I have oneAPI 2024.2 installed. Can anyone please help with this? Thanks!