Closed: teogi closed this issue 1 year ago.
@teogi Do you use TensorFlow 2.12? If you use the ITEX 1.1 release, please use TensorFlow 2.11; we will release ITEX with TensorFlow 2.12 support soon, or you can use the latest master for TensorFlow 2.12 support.
BTW, did you install the driver inside Ubuntu? https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#native-linux-running-directly-on-hardware
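For reference, matching versions can be pinned explicitly when installing (the exact pins below are illustrative; package names as published on PyPI):

```
(tf)$ pip install tensorflow==2.11.0
(tf)$ pip install --upgrade intel-extension-for-tensorflow[gpu]==1.1.0
```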
I did try on native Ubuntu, but there was some problem with the Linux version. I didn't try it on WSL2, since the README didn't suggest doing that.
BTW, downgrading to TensorFlow 2.11 unfortunately did not solve the problem. It still requires something like libcudart.so.
$ python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2023-03-31 08:46:09.063860: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 08:46:09.256775: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-03-31 08:46:09.295236: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-03-31 08:46:09.295279: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-31 08:46:10.127459: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-31 08:46:10.127542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-31 08:46:10.127566: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/teogi/miniconda3/envs/intel-gpu-ml/lib/python3.9/site-packages/intel_extension_for_tensorflow/__init__.py", line 17, in <module>
import tensorflow # pylint: disable=unused-import
File "/home/teogi/miniconda3/envs/intel-gpu-ml/lib/python3.9/site-packages/tensorflow/__init__.py", line 440, in <module>
_ll.load_library(_plugin_dir)
File "/home/teogi/miniconda3/envs/intel-gpu-ml/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.3: cannot open shared object file: No such file or directory
@teogi TF always tries to load libcudart.so. That is default behavior and does not impact ITEX. Please ignore it.
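If the CUDA warnings are noisy, they can optionally be hidden. This is standard TensorFlow logging control, not an ITEX setting, and it only suppresses the messages (`your_script.py` below is a placeholder):

```
(tf)$ TF_CPP_MIN_LOG_LEVEL=2 python your_script.py
```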
The actual error in your case is libmkl_sycl.so.3: cannot open shared object file.
a. Please install oneMKL as described in install_for_arc_gpu.md, e.g.: sudo apt-get install intel-oneapi-runtime-dpcpp-cpp intel-oneapi-runtime-mkl
b. Please enable oneMKL before running ITEX: source /opt/intel/oneapi/setvars.sh
Refer to https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#setup-environment-variables
If you still see any error, please run the following command to check, and share the log with us:
(tf)$ bash /path to site-packages/intel_extension_for_tensorflow/tools/env_check.sh
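As a quick sanity check (a minimal stdlib sketch, not part of env_check.sh), you can also ask Python whether the dynamic loader can resolve the library named in your traceback:

```python
import ctypes

def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can resolve and open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

# The library from the traceback; this should become resolvable only
# after oneMKL is installed and `setvars.sh` has set LD_LIBRARY_PATH.
print("libmkl_sycl.so.3 loadable:", can_load("libmkl_sycl.so.3"))
```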
Now things get more complicated when running quick_example.py.
python quick_example.py
2023-03-31 10:24:10.021211: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 10:24:10.105860: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-03-31 10:24:10.108078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:24:10.108115: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-31 10:24:10.459519: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:24:10.459613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:24:10.459635: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-31 10:24:11.112980: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:24:11.113026: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-03-31 10:24:11.113041: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-TEOGi): /proc/driver/nvidia/version does not exist
2023-03-31 10:24:11.235196: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 10:24:11.236647: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-03-31 10:24:11.236698: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
Segmentation fault
I get a segmentation fault without any verbose output. It seems an XPU is available, but it is not correctly recognized.
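A backtrace could be captured by running the script under gdb (assuming gdb is installed), which may reveal where the crash happens:

```
(tf)$ gdb -ex run -ex bt --args python quick_example.py
```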
Another try in TensorFlow with tf.config.list_physical_devices(); the output might help:
python -c "import tensorflow as tf; tf.config.list_physical_devices()"
2023-03-31 10:38:39.811398: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 10:38:39.886990: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-03-31 10:38:39.889202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:38:39.889239: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-31 10:38:40.297306: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:38:40.297388: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:38:40.297410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-31 10:38:40.996077: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/oneapi/vpl/2023.0.0/lib:/opt/intel/oneapi/tbb/2021.8.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.8.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.8.0//lib/release:/opt/intel/oneapi/mpi/2021.8.0//lib:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.6.3/lib/intel64:/opt/intel/oneapi/ipp/2021.7.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.0.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/debugger/2023.0.0/gdb/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/libipt/intel64/lib:/opt/intel/oneapi/debugger/2023.0.0/dep/lib:/opt/intel/oneapi/dal/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp
2023-03-31 10:38:40.996121: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-03-31 10:38:40.996135: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-TEOGi): /proc/driver/nvidia/version does not exist
No GPU devices were recognized.
@teogi Could you run the following command to check, and share the log with us:
(tf)$ bash /path to site-packages/intel_extension_for_tensorflow/tools/env_check.sh
bash intel-extension-for-tensorflow/tools/env_check.sh
Check Environment for Intel(R) Extension for TensorFlow*...
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.24595.35+i538~22.04 is installed.
Intel(R) graphics runtime intel-opencl-icd-22.43.24595.35+i538~22.04 is installed.
Intel(R) graphics runtime level-zero-1.8.8+i524~u22.04 is installed.
Intel(R) graphics runtime level-zero-dev-1.8.8+i524~u22.04 is installed.
Intel(R) graphics runtime libdrm-common-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm2-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-amdgpu1-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-intel1-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-nouveau2-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-dev-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libigc1-1.0.12812.24+i554~22.04 is installed.
Intel(R) graphics runtime libigdfcl1-1.0.12812.24+i554~22.04 is installed.
Intel(R) graphics runtime libigdgmm12-22.3.3+i550~22.04 is installed.
=============== Check Intel GPU Driver Finshed ================
======================== Check Python ========================
python3.9 is installed.
==================== Check Python Passed =====================
====================== Check Tensorflow ======================
tensorflow2.11 is installed.
================== Check Tensorflow Passed ===================
===================== Check Intel OneApi =====================
Intel(R) OneAPI DPC++/C++ Compiler is installed.
Intel(R) OneAPI Math Kernel Library is installed.
================= Check Intel OneApi Passed ==================
Seems like nothing is wrong in the log.
Some other information I have seen in the tutorial that might help is given below.
For vainfo:
vainfo
Trying display: wayland
libva info: VA-API version 1.17.0
libva error: vaGetDriverNameByIndex() failed with invalid VADisplay, driver_name = (null)
vaInitialize failed with error code 3 (invalid VADisplay),exit
glxinfo -B gives:
glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 13.0.1, 256 bits) (0xffffffff)
Version: 23.0.0
Accelerated: no
Video memory: 7765MB
Unified memory: yes
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 13.0.1, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 23.0.0-devel (git-4b077ffb98)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.5 (Compatibility Profile) Mesa 23.0.0-devel (git-4b077ffb98)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.0.0-devel (git-4b077ffb98)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
@teogi The env_check.sh output looks OK; the drivers are installed. vainfo exercises the video (VA-API) interfaces, and that is NOT OK, but TensorFlow does not depend on VA-API, so it is a separate issue.
ITEX depends on the SYCL/OpenCL interfaces.
Could you check the output of clinfo?
What is your environment: OS, TensorFlow version, and ITEX version?
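A compact device listing is enough (`clinfo -l` prints just the platform/device tree); here is a guarded stdlib sketch in case clinfo is not installed:

```python
import shutil
import subprocess

# `clinfo -l` prints a compact platform/device tree. Guard against the
# tool being absent so the check degrades gracefully.
if shutil.which("clinfo"):
    subprocess.run(["clinfo", "-l"], check=False)
else:
    print("clinfo not found; on Ubuntu: sudo apt-get install clinfo")
```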
Thank you for the documentation. I've been facing the same issue as the OP, both under WSL2 and on native Ubuntu 22.04.
I hope the following info helps.
Graphics card: Intel Arc A770 LE.
Output of env_check.sh:
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.25018.23+i554~22.04 is installed.
Intel(R) graphics runtime intel-opencl-icd-22.49.25018.23+i554~22.04 is installed.
Intel(R) graphics runtime level-zero-1.8.8+i524~u22.04 is installed.
Intel(R) graphics runtime level-zero-dev-1.8.8+i524~u22.04 is installed.
Intel(R) graphics runtime libdrm-common-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm2-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-amdgpu1-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-intel1-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-nouveau2-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libdrm-dev-2.4.113-2~ubuntu0.22.04.1 is installed.
Intel(R) graphics runtime libigc1-1.0.12812.24+i554~22.04 is installed.
Intel(R) graphics runtime libigdfcl1-1.0.12812.24+i554~22.04 is installed.
Intel(R) graphics runtime libigdgmm12-22.3.3+i550~22.04 is installed.
=============== Check Intel GPU Driver Finshed ================
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
====================== Check Tensorflow ======================
tensorflow2.11 is installed.
================== Check Tensorflow Passed ===================
===================== Check Intel OneApi =====================
Intel(R) OneAPI DPC++/C++ Compiler is installed.
Intel(R) OneAPI Math Kernel Library is installed.
================= Check Intel OneApi Passed ==================
Output of python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)" shows the device:
2023-04-05 23:29:02.317034: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-05 23:29:02.386063: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-05 23:29:02.386080: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-05 23:29:02.764750: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-05 23:29:02.764787: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-05 23:29:02.764794: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-04-05 23:29:03.283712: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-04-05 23:29:03.283733: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2023-04-05 23:29:03.283839: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-04-05 23:29:03.283851: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-04-05 23:29:03.283862: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ubuntu-desktop): /proc/driver/nvidia/version does not exist
1.1.0
Nevertheless, loading TensorFlow from a notebook shows "Can not found any devices":
2023-04-05 23:36:12.095530: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-05 23:36:12.169291: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-05 23:36:12.169301: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-05 23:36:12.558124: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-05 23:36:12.558163: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-05 23:36:12.558167: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-04-05 23:36:13.167521: E itex/core/devices/gpu/itex_gpu_runtime.cc:173] Can not found any devices. To check runtime environment on your host, please run itex/itex/tools/env_check.sh.
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
2023-04-05 23:36:13.168042: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-04-05 23:36:13.168049: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-04-05 23:36:13.168060: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ubuntu-desktop): /proc/driver/nvidia/version does not exist
clinfo output:
Number of platforms 4
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_device_attribute_query cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Platform Extensions with Version cl_khr_spirv_linkonce_odr 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_exec_by_local_thread 0x400000 (1.0.0)
cl_intel_vec_len_hint 0x400000 (1.0.0)
cl_intel_device_partition_by_names 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_fp64 cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
Platform Extensions function suffix IntelFPGA
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_create_buffer_with_properties cl_intel_dot_accumulate cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate
Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_device_uuid 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_intel_split_work_group_barrier 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_bfloat16_conversions 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
cl_intel_create_buffer_with_properties 0x400000 (1.0.0)
cl_intel_dot_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_local_block_io 0x400000 (1.0.0)
cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
Platform Extensions function suffix IntelFPGA
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name AMD Ryzen 5 5600 6-Core Processor
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 (Build 0)
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 2022.14.7.0.30_160000
Device OpenCL C Version OpenCL C 3.0
Device OpenCL C all versions OpenCL C 0xc00000 (3.0.0)
OpenCL C 0x800000 (2.0.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x400000 (1.0.0)
Device OpenCL C features __opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_device_enqueue 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
Latest conformance test passed v2021-08-16-00
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 0MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple (device) 128
Preferred work group size multiple (kernel) 128
Max sub-groups per work group 2048
Sub-group sizes (Intel) 4, 8, 16, 32, 64
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 16694919168 (15.55GiB)
Error Correction support No
Max memory allocation 8347459584 (7.774GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing Yes
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 0 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 65536 (64KiB)
Global Memory cache type Read/Write
Global Memory cache size 524288 (512KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 521716224 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Max number of read/write image args 480
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 21845
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 480
Max constant buffer size 131072 (128KiB)
Generic address space support Yes
Max size of kernel argument 3840 (3.75KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Device enqueue capabilities supported, replaceable default queue
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 4294967295 (4GiB)
Max size 4294967295 (4GiB)
Max queues on device 4294967295
Max events on device 4294967295
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress No
IL version SPIR-V_1.0
ILs with version SPIR-V 0x400000 (1.0.0)
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Built-in kernels with version (n/a)
Device Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_device_attribute_query cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Device Extensions with Version cl_khr_spirv_linkonce_odr 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_exec_by_local_thread 0x400000 (1.0.0)
cl_intel_vec_len_hint 0x400000 (1.0.0)
cl_intel_device_partition_by_names 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices 1
Device Name Intel(R) FPGA Emulation Device
Device Vendor Intel(R) Corporation
Device Vendor ID 0x1172
Device Version OpenCL 1.2
Driver Version 2023.15.3.0.20_160000
Device OpenCL C Version OpenCL C 1.2
Device Type Accelerator
Device Profile EMBEDDED_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 0MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 67108864x67108864x67108864
Max work group size 67108864
Segmentation fault (core dumped)
@teogi From your log, simply importing ITEX can find the GPU:
2023-04-05 23:29:03.283712: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
But your real workload cannot.
I suspect your environment has some library conflict. Can you run your workload with OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 and share the log? Thanks.
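These are ordinary environment variables, so you can set them inline on the command that launches the workload. A minimal sketch (the workload script name in the comment is a placeholder; the `printenv` line is just a runnable stand-in showing that the inline prefix takes effect):

```shell
# The variables apply only to this one invocation. The ICD loader writes
# its trace to stderr, so redirect stderr to a file you can share, e.g.:
#   OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 python your_workload.py 2> ocl_trace.log
# Stand-in demonstrating the inline prefix is visible to the child process:
OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 printenv OCL_ICD_DEBUG
```

Alternatively, `export` both variables once and they apply to every subsequent command in that shell session.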
Sorry for the ignorance, but how can I set OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2?
I noticed that the problem happens when I use Jupyter Notebook or JupyterLab. If I run a Python script from the command line using the same environment, it shows the GPU:
2023-04-06 08:21:51.841138: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-04-06 08:21:51.841160: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
And if I use source /opt/intel/oneapi/setvars.sh
and run the Quick example as a Python script, it works fine, but if I run the same code in a Jupyter notebook it throws:
NotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from tensorflow import keras
2 from tensorflow.keras import layers
3 from tensorflow.keras.models import load_model
File ~/tensorflow/lib/python3.10/site-packages/tensorflow/__init__.py:440
438 _plugin_dir = _os.path.join(_s, 'tensorflow-plugins')
439 if _os.path.exists(_plugin_dir):
--> 440 _ll.load_library(_plugin_dir)
441 # Load Pluggable Device Library
442 _ll.load_pluggable_device_library(_plugin_dir)
File ~/tensorflow/lib/python3.10/site-packages/tensorflow/python/framework/load_library.py:151, in load_library(library_location)
148 kernel_libraries = [library_location]
150 for lib in kernel_libraries:
--> 151 py_tf.TF_LoadLibrary(lib)
153 else:
154 raise OSError(
155 errno.ENOENT,
156 'The file or folder to load kernel libraries from does not exist.',
157 library_location)
NotFoundError: /opt/intel/oneapi/lib/intel64/libmkl_sycl.so.3: undefined symbol: viRngGeometric_64
If I don't enable the oneAPI components through source /opt/intel/oneapi/setvars.sh
and run the script, it again runs fine and shows the device, but under a notebook it shows: E itex/core/devices/gpu/itex_gpu_runtime.cc:173] Can not found any devices. To check runtime environment on your host, please run itex/itex/tools/env_check.sh.
Best regards
@mikemayuare
I think the Jupyter notebook should be set to the correct kernel,
the one installed with ITEX.
In the Python CLI, you run the script in the correct Python environment,
but in a Jupyter notebook you run it in the default kernel, which is a different environment from the CLI.
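A quick way to check this is to run the same two lines in a terminal Python session and in a notebook cell and compare; if the paths differ, the notebook kernel is running a different environment than the CLI (paths in the comments are illustrative):

```python
# Run in both the CLI and a notebook cell; compare the output.
import sys

print(sys.executable)  # e.g. ~/anaconda3/envs/tf/bin/python for the ITEX env
print(sys.prefix)      # root of the currently active environment
```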
@NeoZhangJianyu Yes, you were right, I was using the default kernel.
I set the kernel from the environment with itex (tf) with python -m ipykernel install --user --name tf --display-name "TensorFlow"
The problem still persists. I checked the JSON file to see if the kernel is pointing to the right path, anaconda3/envs/bin/tf/python
in my case. This happens both under WSL2 and Ubuntu. I tried VSCode too, because it lets me choose the kernel directly, but it did not work.
Best regards
@mikemayuare
before running ITEX, we need to enable oneAPI with source /opt/intel/oneapi/setvars.sh.
In the Jupyter notebook case, I suggest enabling oneAPI before starting the Jupyter notebook service.
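Concretely, that means sourcing setvars.sh in the same shell that then launches the notebook server, since kernels spawned by that server inherit its environment. A sketch (the real commands are in the comments because the oneAPI install path is machine-specific; the runnable lines below just demonstrate the inheritance mechanism):

```shell
# Enable oneAPI for this shell, then start jupyter from the SAME shell:
#   source /opt/intel/oneapi/setvars.sh
#   jupyter notebook        # or: jupyter lab
# This works because child processes inherit exported variables:
export DEMO_VAR=from_parent_shell   # stand-in for what setvars.sh exports
sh -c 'printenv DEMO_VAR'           # child process sees the variable
```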
Your error: NotFoundError: /opt/intel/oneapi/lib/intel64/libmkl_sycl.so.3: undefined symbol: viRngGeometric_64
It looks like the version of oneMKL (included in the oneAPI Base Toolkit) is wrong.
Could you run itex/itex/tools/env_check.sh
and share the log?
Thank you!
@teogi The env_check.sh output looks OK; the drivers are installed. 'vainfo' shows the video "vaapi" interfaces are NOT OK, but TensorFlow does not depend on vaapi, so that's a separate issue.
ITEX depends on the SYCL/OpenCL interfaces. Could you check the output of
clinfo?
Sorry for the late reply, here is my clinfo output:
Number of platforms 4
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_device_attribute_query cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Platform Extensions with Version cl_khr_spirv_linkonce_odr 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_exec_by_local_thread 0x400000 (1.0.0)
cl_intel_vec_len_hint 0x400000 (1.0.0)
cl_intel_device_partition_by_names 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
Platform Extensions function suffix IntelFPGA
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_create_buffer_with_properties cl_intel_dot_accumulate cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate
Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_device_uuid 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_intel_split_work_group_barrier 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_bfloat16_conversions 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
cl_intel_create_buffer_with_properties 0x400000 (1.0.0)
cl_intel_dot_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_local_block_io 0x400000 (1.0.0)
cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
Platform Extensions function suffix IntelFPGA
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name 12th Gen Intel(R) Core(TM) i5-12400
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 (Build 0)
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 2022.15.12.0.01_081451
Device OpenCL C Version OpenCL C 3.0
Device OpenCL C all versions OpenCL C 0xc00000 (3.0.0)
OpenCL C 0x800000 (2.0.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x400000 (1.0.0)
Device OpenCL C features __opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_device_enqueue 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
Latest conformance test passed v2021-08-16-00
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 0MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple (device) 128
Preferred work group size multiple (kernel) 128
Max sub-groups per work group 2048
Sub-group sizes (Intel) 4, 8, 16, 32, 64
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8142303232 (7.583GiB)
Error Correction support No
Max memory allocation 4071151616 (3.792GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing Yes
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 0 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 65536 (64KiB)
Global Memory cache type Read/Write
Global Memory cache size 1310720 (1.25MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 254446976 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Max number of read/write image args 480
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 21845
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 480
Max constant buffer size 131072 (128KiB)
Generic address space support Yes
Max size of kernel argument 3840 (3.75KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Device enqueue capabilities supported, replaceable default queue
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 4294967295 (4GiB)
Max size 4294967295 (4GiB)
Max queues on device 4294967295
Max events on device 4294967295
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress No
IL version SPIR-V_1.0
ILs with version SPIR-V 0x400000 (1.0.0)
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Built-in kernels with version (n/a)
Device Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory_preview cl_intel_device_attribute_query cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_short cl_intel_subgroups_long cl_intel_spirv_subgroups cl_intel_required_subgroup_size cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Device Extensions with Version cl_khr_spirv_linkonce_odr 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_exec_by_local_thread 0x400000 (1.0.0)
cl_intel_vec_len_hint 0x400000 (1.0.0)
cl_intel_device_partition_by_names 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices 1
Device Name Intel(R) FPGA Emulation Device
Device Vendor Intel(R) Corporation
Device Vendor ID 0x1172
Device Version OpenCL 1.2
Driver Version 2022.15.12.0.01_081451
Device OpenCL C Version OpenCL C 1.2
Device Type Accelerator
Device Profile EMBEDDED_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 0MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 67108864x67108864x67108864
Max work group size 67108864
Preferred work group size multiple (kernel) 128
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 8142303232 (7.583GiB)
Error Correction support No
Max memory allocation 4071151616 (3.792GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 1310720 (1.25MiB)
Global Memory cache line size 64 bytes
Image support No
Local memory type Global
Local memory size 262144 (256KiB)
Max number of constant args 480
Max constant buffer size 131072 (128KiB)
Max size of kernel argument 3840 (3.75KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
IL version SPIR-V_1.0
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Graphics [0x56a1]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Device UUID 86800000-a156-0000-0000-000000000000
Driver UUID 32322e34-332e-3234-3539-352e33350000
Valid Device LUID No
Device LUID 604c-f190fe7f0000
Device Node Mask 0
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 22.43.24595.35
Device OpenCL C Version OpenCL C 1.2
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
Latest conformance test passed v2022-04-22-00
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 448
Max clock frequency 2400MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (device) 64
Preferred work group size multiple (kernel) 64
Max sub-groups per work group 128
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 6791413760 (6.325GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1073741824 (1024MiB)
Global Memory cache type Read/Write
Global Memory cache size 16777216 (16MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 67108864 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16128 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Pipe support No
Max number of pipe args 0
Max active pipe reservations 0
Max pipe packet size 0
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 1073741824 (1024MiB)
Generic address space support Yes
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities (n/a)
Queue properties (on device)
Out-of-order execution No
Profiling No
Preferred size 0
Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 52ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress No
IL version SPIR-V_1.2
ILs with version SPIR-V 0x402000 (1.2.0)
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Built-in kernels with version (n/a)
Device Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_create_buffer_with_properties cl_intel_dot_accumulate cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_device_uuid 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_intel_split_work_group_barrier 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_bfloat16_conversions 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
cl_intel_create_buffer_with_properties 0x400000 (1.0.0)
cl_intel_dot_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_local_block_io 0x400000 (1.0.0)
cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0)
cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0)
Platform Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices 1
Device Name Intel(R) FPGA Emulation Device
Device Vendor Intel(R) Corporation
Device Vendor ID 0x1172
Device Version OpenCL 1.2
Driver Version 2022.15.12.0.01_081451
Device OpenCL C Version OpenCL C 1.2
Device Type Accelerator
Device Profile EMBEDDED_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 0MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 67108864x67108864x67108864
Max work group size 67108864
Preferred work group size multiple (kernel) 128
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 8142303232 (7.583GiB)
Error Correction support No
Max memory allocation 4071151616 (3.792GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 1310720 (1.25MiB)
Global Memory cache line size 64 bytes
Image support No
Local memory type Global
Local memory size 262144 (256KiB)
Max number of constant args 480
Max constant buffer size 131072 (128KiB)
Max size of kernel argument 3840 (3.75KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
IL version SPIR-V_1.0
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_spirv_linkonce_odr cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL
Device Name 12th Gen Intel(R) Core(TM) i5-12400
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) Success (1)
Platform Name Intel(R) OpenCL
Device Name 12th Gen Intel(R) Core(TM) i5-12400
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL
Device Name 12th Gen Intel(R) Core(TM) i5-12400
What's your environment: the OS, TensorFlow, and ITEX versions?
Some of this information was provided at the beginning; to make it clearer and more thorough, I will restate it below along with the other information asked for.
Here is some spec of my system:
- Hardware: Intel Arc A750
- software environment: WSL2 with Ubuntu 22.04 LTS installed
- host OS: Windows 11 Pro Education 22H2
- Using a Miniconda virtual environment (with python 3.9)
I have downgraded TensorFlow from 2.12 to 2.11 as advised; my ITEX version is 1.1.0, installed using pip.
BTW, is there any way to turn off the Nvidia driver warnings when using ITEX?
2023-04-07 14:27:46.301928: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-07 14:27:46.501970: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-07 14:27:46.541368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-07 14:27:46.541404: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-07 14:27:47.471928: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-07 14:27:47.472025: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-07 14:27:47.472048: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-04-07 14:27:49.624694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-04-07 14:27:49.624987: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-04-07 14:27:49.625094: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-TEOGi): /proc/driver/nvidia/version does not exist
I find them quite troublesome and misleading when it comes to identifying my true problem.
Best Regards,
@NeoZhangJianyu Yes, you were right, I was using the default kernel.
I set the kernel from the environment with itex (tf) with
python -m ipykernel install --user --name tf --display-name "TensorFlow"
The problem still persists. I checked the JSON file to see if the kernel is pointing to the right path, anaconda3/envs/bin/tf/python in my case, both under WSL2 and Ubuntu. I tried VSCode too, because it lets me directly choose the kernel, but it did not work. Best regards
@mikemayuare Please help to check the version of libstdc++.so both in your conda env (e.g. path: ~/anaconda3/envs/${your conda env name}/lib/libstdc++.so.6) and in your system env (e.g. path: /usr/lib/x86_64-linux-gnu/libstdc++.so.6). If they are different, this is likely a library conflict between your conda running env and your system env. In that case, you can upgrade the libstdcxx version in your conda env with conda install -c conda-forge libstdcxx-ng and then check whether Intel GPU devices can be detected via Jupyter notebook. Thanks.
If this does not help, could you please set the environment variables OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 that @guizili0 mentioned, and share the output of running the workload? (If you are in a Jupyter notebook, you can execute %env OCL_ICD_ENABLE_TRACE=1 and %env OCL_ICD_DEBUG=2 in a cell before your workload cell.)
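One quick way to compare the two copies is to dump the GLIBCXX version tags each libstdc++.so.6 exports (e.g. `strings <path>/libstdc++.so.6 | grep GLIBCXX`) and compare the highest tag on each side; here is a rough sketch of the comparison step, where the `glibcxx_max` helper name and the example tags are illustrative, not from this thread:

```python
import re

def glibcxx_max(symbols):
    """Return the highest GLIBCXX_x.y.z version tag from a list of
    symbol strings, as a tuple of ints, or None if no tag is found."""
    versions = []
    for s in symbols:
        m = re.match(r"GLIBCXX_(\d+(?:\.\d+)*)$", s)
        if m:
            versions.append(tuple(int(p) for p in m.group(1).split(".")))
    return max(versions) if versions else None

# Tags as they would appear in `strings libstdc++.so.6 | grep GLIBCXX`
print(glibcxx_max(["GLIBCXX_3.4", "GLIBCXX_3.4.29", "GLIBCXX_3.4.30"]))  # -> (3, 4, 30)
```

If the conda copy's highest tag is lower than the system copy's, code compiled against the newer system libstdc++ can fail to load inside the conda env, which matches the symptom described here.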
Sorry for the delay.
@wangkl2 as you suspected, it was a library conflict issue.
Thanks to all for the great support.
@teogi Sorry for the delayed response.
From your clinfo output, the device is OK, so the GPU driver and OCL driver are OK.
Number of devices 1
Device Name Intel(R) Graphics [0x56a1]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
While your example fails to get the device.
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 10:24:11.236647: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-03-31 10:24:11.236698: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
Segmentation fault
So it is likely some library conflict at the app/example level. The problem looks similar to the one mikemayuare ran into. Please try @wangkl2's suggestions; I copied them here:
Please check the version of libstdc++.so both in your conda env (e.g. path: ~/anaconda3/envs/${your conda env name}/lib/libstdc++.so.6) and in your system env (e.g. path: /usr/lib/x86_64-linux-gnu/libstdc++.so.6). If they are different, this is likely a library conflict between your conda running env and your system env. In that case, you can upgrade the libstdcxx version in your conda env with conda install -c conda-forge libstdcxx-ng and then check whether Intel GPU devices can be detected via Jupyter notebook. Thanks.
If this does not help, could you please set the environment variables OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 that @guizili0 mentioned, and share the output of running the workload? (If you are in a Jupyter notebook, you can execute %env OCL_ICD_ENABLE_TRACE=1 and %env OCL_ICD_DEBUG=2 in a cell before your workload cell.)
Thanks!
btw, is there any way to turn off the Nvidia driver warnings when using ITEX? @teogi
We are sorry for this. Google TensorFlow outputs these warnings because its default GPU backend is Nvidia. There is no setting in ITEX to turn them off. I'll share the latest info with you next week.
Thanks!
The conda libraries have the lib conflict as you mentioned.
By locating all the libstdc++.so.6 files, I found that the version in the sys env was libstdc++.so.6.30, but in my conda env it was libstdc++.so.6.29.
The problem is that the default version for Anaconda is libstdcxx-ng 11.2.0, but the version installed through the conda-forge channel is 12.2.0.
I solved the lib conflict by rebuilding the environment with conda create ... -c conda-forge, but the problem still exists.
The output after applying the suggested configuration:
$ OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2
$ python quick_example.py
2023-04-11 18:12:06.419794: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-11 18:12:06.499820: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-11 18:12:06.502231: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-11 18:12:06.502269: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-11 18:12:06.848352: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-11 18:12:06.848445: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-11 18:12:06.848469: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-04-11 18:12:07.582835: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-04-11 18:12:07.583166: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2023-04-11 18:12:07.583377: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-04-11 18:12:07.583405: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-04-11 18:12:07.583436: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-TEOGi): /proc/driver/nvidia/version does not exist
2023-04-11 18:12:07.668640: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-11 18:12:07.670495: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-04-11 18:12:07.670546: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
Segmentation fault
Also, there are some differences in the version of libstdc++.so.6 between the sys env, conda env, and oneAPI environment.
locate libstdc++.so.6 -e
/home/teogi/miniconda3/envs/intel-gpu-ml/lib/libstdc++.so.6
...
/home/teogi/miniconda3/lib/libstdc++.so.6
/home/teogi/miniconda3/lib/libstdc++.so.6.0.30
/home/teogi/miniconda3/pkgs/libstdcxx-ng-11.2.0-h1234567_1/lib/libstdc++.so.6
/home/teogi/miniconda3/pkgs/libstdcxx-ng-11.2.0-h1234567_1/lib/libstdc++.so.6.0.29
/home/teogi/miniconda3/pkgs/libstdcxx-ng-12.2.0-h46fd767_19/lib/libstdc++.so.6
/home/teogi/miniconda3/pkgs/libstdcxx-ng-12.2.0-h46fd767_19/lib/libstdc++.so.6.0.30
/opt/intel/oneapi/advisor/2023.0.0/lib32/libstdc++.so.6
/opt/intel/oneapi/advisor/2023.0.0/lib32/libstdc++.so.6.0.22
/opt/intel/oneapi/advisor/2023.0.0/lib64/libstdc++.so.6
/opt/intel/oneapi/advisor/2023.0.0/lib64/libstdc++.so.6.0.22
/opt/intel/oneapi/advisor/2023.0.0/lib64/gtpin/libstdc++.so.6
/opt/intel/oneapi/vtune/2023.0.0/lib32/libstdc++.so.6
/opt/intel/oneapi/vtune/2023.0.0/lib32/libstdc++.so.6.0.22
/opt/intel/oneapi/vtune/2023.0.0/lib64/libstdc++.so.6
/opt/intel/oneapi/vtune/2023.0.0/lib64/libstdc++.so.6.0.22
/opt/intel/oneapi/vtune/2023.0.0/lib64/gtpin/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30-gdb.py
The version in oneAPI's env (6.0.22) is lower than the sys env's (6.0.30). Should I follow the version in oneAPI's env and downgrade libstdc++.so?
Hi @teogi ,
Sorry for the inconvenience. There is a workaround for the lib conflict: preload the correct library manually.
1) LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 python quick_example.py
2) or export LD_PRELOAD: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30:$LD_PRELOAD
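The export form prepends the system libstdc++ to whatever is already in LD_PRELOAD. If you launch the workload from a Python wrapper, the same prepend logic can be sketched like this (the library path comes from this thread; the `prepend_ld_preload` helper name is hypothetical):

```python
import os

def prepend_ld_preload(lib_path, env=None):
    """Return a copy of env with lib_path prepended to LD_PRELOAD,
    keeping any entries that were already there."""
    env = dict(os.environ if env is None else env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = lib_path if not existing else lib_path + ":" + existing
    return env

env = prepend_ld_preload("/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", {})
print(env["LD_PRELOAD"])
# A wrapper would then launch the workload with this env, e.g.:
# subprocess.run(["python", "quick_example.py"], env=env)
```

Prepending (rather than overwriting) matters because the dynamic loader honors the entries in LD_PRELOAD order, so the system libstdc++ wins over the conda copy without dropping any other preloads.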
If it helps, I installed python==3.10 and tensorflow==2.11.0. It's able to detect both the dedicated (Arc 770) and integrated (UHD Graphics 770) GPUs, but I am not able to load the model on the device.
@Ankur-singh, could you please try ZE_AFFINITY_MASK=1 python your_test.py? This will mask out the iGPU.
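For reference, ZE_AFFINITY_MASK takes a comma-separated list of device indices (optionally `device.sub-device`), and the Level Zero loader then exposes only those devices to the application. A toy parser to illustrate the format (illustration only; the real filtering happens inside the Level Zero loader, not in your script):

```python
def parse_affinity_mask(mask):
    """Parse a ZE_AFFINITY_MASK-style string such as "1" or "0,1.0"
    into (device, sub_device) pairs; sub_device is None when omitted."""
    entries = []
    for item in mask.split(","):
        parts = item.strip().split(".")
        entries.append((int(parts[0]), int(parts[1]) if len(parts) > 1 else None))
    return entries

print(parse_affinity_mask("1"))      # only device 1 stays visible
print(parse_affinity_mask("0,1.0"))  # device 0, plus sub-device 0 of device 1
```

So `ZE_AFFINITY_MASK=1` hides device 0 (here the iGPU) and leaves only device 1 (the Arc card) visible to ITEX.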
I am closing the issue, but please feel free to reopen if there are any updates.
FYI, there is new version for your future test: https://intel.github.io/intel-extension-for-tensorflow/latest/docs/install/experimental/install_for_arc_gpu.html
Hi, I am facing an issue with the system identifying the Intel Arc GPU following the manual here: Experimental: Intel® Arc™ A-Series GPU Software Installation
Here is some spec of my system:
The issue I have been facing is with the last step,
python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
From the output, it seems that although I installed the Intel extension for TensorFlow, it still requires the CUDA driver.
By using
intel_extension_for_tensorflow/tools/env_check.sh
from this GitHub repo, I've passed all the tests provided, which means no problems were detected with my system and dependencies. Here is some other useful information provided by my system:
hwinfo
08: PCI 957e0000.0: 0302 3D controller
[Created at pci.386]
Unique ID: GYvN.TMx8hlOLi40
SysFS ID: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/898507b1-957e-45b1-a1bc-d9a7617e1f30/pci957e:00/957e:00:00.0
SysFS BusID: 957e:00:00.0
Hardware Class: graphics card
Model: "Microsoft 3D controller"
Vendor: pci 0x1414 "Microsoft Corporation"
Device: pci 0x008e
Driver: "dxgkrnl"
Driver Modules: "dxgkrnl", "dxgkrnl"
Module Alias: "pci:v00001414d0000008Esv00000000sd00000000bc03sc02i00"
Config Status: cfg=new, avail=yes, need=no, active=unknown
Primary display adapter: #6