Closed djsv23 closed 6 months ago
Can you please try the driver version suggested here: https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#install-gpu-drivers
Hi @srinarayan-srikanthan, this error is produced when I have used the driver suggested on that page by following the installation instructions linked to https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md. Is there another procedure I should use to install the recommended driver version?
Hi @djsv23, yes, you are looking at the right page, but the version you have installed is stable_775_20_20231219 instead of stable_736_25_2023103. The instructions on the page specify the versions as below.
But the output of your env_check script is not showing the right versions. Can you please check on that?
Would you mind adding a link to the instructions page where this version list is found? I'm having a hard time navigating and understanding which instructions are current, as https://dgpu-docs.intel.com/driver/client/overview.html does not have them and is the only driver installation page I can find that does not carry a deprecation notice.
I also see this pinned version list in the Ubuntu 22.04 for WSL instructions in this repository, but the same instructions are not given for Ubuntu 22.04 on bare metal, which is how my environment is configured.
That said, I have installed the mentioned versions and still see an output from env_check.sh which suggests that required drivers are missing.
```
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.26918.50-736 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.30.26918.50-736 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.13.1-719 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.14828.26-736 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.14828.26-736 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.10-712 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-07 12:30:08.568077: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-07 12:30:08.569821: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-07 12:30:08.590078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-07 12:30:08.590106: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-07 12:30:08.590132: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-07 12:30:08.595032: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-07 12:30:08.595173: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-07 12:30:08.996531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-07 12:30:09.309113: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-07 12:30:09.681137: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-07 12:30:09.745557: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-07 12:30:09.745840: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-07 12:30:10.068782: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed =======================
```
From your env_check.sh output I see "itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.", so the backend does load. What error do you see when you run the following command: `python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"`?
With regard to the other question, you can find instructions here : https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#native-linux-running-directly-on-hardware-1
The error I am getting is that, despite the GPU backend being loaded, it says the GPU will not be used and that no CUDA-capable device is detected:
2024-02-08 09:57:04.630925: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:14.392825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Okay, can you please post the complete output of the import command:
python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-08 09:57:04.527315: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-08 09:57:04.630925: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:05.096549: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-08 09:57:05.096579: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-08 09:57:05.098091: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-08 09:57:05.324901: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:05.328626: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-08 09:57:06.660042: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-08 09:57:07.789687: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-08 09:57:12.504640: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-08 09:57:12.947637: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-08 09:57:12.947946: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-08 09:57:14.392825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Traceback (most recent call last):
File "
Okay, I see the GPU backend being loaded and then failing. Can you paste the output of `conda list | grep tensorflow`?
```
(itex) user@host:~$ conda list | grep tensorflow
intel-extension-for-tensorflow 2.14.0.2 pypi_0 pypi
tensorflow-datasets 4.9.4 pypi_0 pypi
tensorflow-metadata 1.14.0 pypi_0 pypi
```
Are these all the packages? Did you install tensorflow before installing intel-extension-for-tensorflow?
This is the entire output. At one point I think I removed and reinstalled intel-extension-for-tensorflow in this conda environment. Should I try setting up a new one from scratch?
Yes please try creating the environment from scratch following the instructions starting from here : https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#2-install-tensorflow-via-pypi-wheel-in-linux
Ok. I've torn everything down, removed miniconda, and spun up a new Python 3.10 virtual environment, and now get the output below. It no longer fails to import the module, but it is still not using the GPU.
```
pip list | grep tensorflow
intel-extension-for-tensorflow 2.14.0.2
intel-extension-for-tensorflow-lib 2.14.0.2.2
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-io-gcs-filesystem 0.36.0
```
python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-08 13:03:04.514441: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-08 13:03:04.516194: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 13:03:04.546696: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-08 13:03:04.546724: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-08 13:03:04.546756: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-08 13:03:04.552904: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 13:03:04.553073: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-08 13:03:05.181948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-08 13:03:05.422317: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-08 13:03:06.946692: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-08 13:03:07.024321: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-08 13:03:07.024636: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-08 13:03:07.656850: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2.14.0.2
I see the GPU backend being loaded. Can you try `itex.get_backend()` after importing it, or list the physical devices and check? It should now be using the GPU.
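For reference, a minimal sketch of that check (the `xpu_devices` helper name is mine, not an ITEX API; it assumes tensorflow and intel-extension-for-tensorflow are installed in the active env, and degrades to `None` when they are not):

```python
# Sketch: report the ITEX backend and TF's visible XPU devices.
def xpu_devices():
    """Return TF's visible XPU devices, or None when TF/ITEX are not installed."""
    try:
        import tensorflow as tf
        import intel_extension_for_tensorflow as itex
        print("ITEX backend:", itex.get_backend())  # expect "GPU" on a working setup
        return tf.config.list_physical_devices("XPU")
    except ImportError:
        return None

print("XPU devices:", xpu_devices())
```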
I checked the physical device and it is present (I am using the GPU for desktop video output, so no surprise there):
```
xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A750 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0003-0000-000856a18086                                       |
|           | PCI BDF Address: 0000:03:00.0                                                        |
|           | DRM Device: /dev/dri/card0                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
```
So then I tried generating an image with keras_cv per the Intel tutorial at https://medium.com/intel-analytics-software/running-tensorflow-stable-diffusion-on-intel-arc-gpus-e6ff0d2b7549 and got this output:
I also get this in the logs on the Jupyter server:
2024-02-09 11:23:11.419551: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-09 11:23:11.420929: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 11:23:11.440019: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-09 11:23:11.440038: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-09 11:23:11.440066: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-09 11:23:11.444734: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 11:23:11.445029: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-09 11:23:11.838733: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-09 11:23:12.060294: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-09 11:23:12.752293: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-09 11:23:12.828676: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-09 11:23:12.828995: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 11:23:13.180820: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-02-09 11:24:39.753366: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-02-09 11:24:39.753389: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id:
The end result is that it seems not to find the device but tries to proceed anyway. It then decides that 0 MB of VRAM is insufficient and fails to generate an image.
It is able to detect the device and load it, because I see the XPU being enabled: "2024-02-09 11:24:42.064980: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type XPU is enabled." Can you monitor the GPU, check its utilization/memory usage, and share the output of xpu-smi?
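For the monitoring step, something like the sketch below can be used (the guards are mine, so each command is simply skipped with a note when the tool is not installed; `intel_gpu_top` needs root, so only the invocation hint is printed):

```shell
# Watch GPU utilization and memory while the workload runs.
command -v xpu-smi >/dev/null && xpu-smi stats -d 0 || echo "xpu-smi not installed"
command -v intel_gpu_top >/dev/null && echo "run: sudo intel_gpu_top" || echo "intel_gpu_top not installed"
```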
@srinarayan-srikanthan I have done this; running the computer headless and accessing it remotely, intel_gpu_top shows the GPU on standby at 0% usage and 0 MHz clock speed. I'm further convinced the GPU is not used because no image is generated: `images` is empty, so when we run `plt.imshow(images[0])` there is no output.
What is the output of xpu-smi ?
It is the same as before. Is there another subcommand that would be helpful to see?
```
xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A750 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0003-0000-000856a18086                                       |
|           | PCI BDF Address: 0000:03:00.0                                                        |
|           | DRM Device: /dev/dri/card0                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
```
Here are two suggestions: can you try reducing the size of the image and running it? The tutorial you shared was for the Arc A770, which comes with 16 GB, whereas the A750 is equipped with 8 GB. Also, can you try the below version of the model: https://github.com/intel/intel-extension-for-tensorflow/tree/main/examples/stable_diffussion_inference
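The smaller-resolution run can be sketched as below (assumes keras_cv is installed as in the Medium tutorial; the `build_small_sd` helper and the prompt are mine, and the code degrades to a message when keras_cv is absent):

```python
# Sketch: build the keras_cv Stable Diffusion pipeline at 256x256
# so the weights and activations fit in the A750's 8 GB of VRAM.
def build_small_sd():
    try:
        import keras_cv
    except ImportError:
        return None
    return keras_cv.models.StableDiffusion(img_width=256, img_height=256)

model = build_small_sd()
if model is None:
    print("keras_cv not installed")
else:
    images = model.text_to_image("a photo of an astronaut riding a horse", batch_size=1)
```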
@djsv23, we ran into another similar issue recently: https://github.com/intel/intel-extension-for-transformers/issues/1276. Not sure if it helps; just for your reference.
Could you please try the below in the terminal environment first? (After it works, then try jupyter-notebook.)
1) conda activate your env
2) source /opt/intel/oneapi/setvars.sh
3) groups
4) sycl-ls (do the devices show?)
5) env_check.sh (maybe the same error)
6) check the libstdc++ libraries, and delete (rename) the libstd* files under your conda environment.
For example, my itex214 env:
```
(itex214) yhu5@arc770-tce:~$ ls /home/yhu5/miniconda3/envs/itex214/lib/libstd*
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so.6.0.29
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so.6
(itex214) yhu5@arc770-tce:~$ ls /usr/lib/x86_64-linux-gnu/libstd*
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
```
7) env_check.sh again.
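Steps 2-5 above can be sketched as a single guarded script (the guards are mine so a missing tool just prints a note; step 1, `conda activate <env>`, is done first):

```shell
# Steps 2-5: source oneAPI, check group membership, list SYCL devices, run env_check.
[ -f /opt/intel/oneapi/setvars.sh ] && . /opt/intel/oneapi/setvars.sh || echo "setvars.sh not found"
groups                                                  # step 3: look for 'render'/'video'
command -v sycl-ls >/dev/null && sycl-ls || echo "sycl-ls not found"
command -v env_check.sh >/dev/null && env_check.sh || echo "env_check.sh not on PATH"
```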
and let us know the screen output.
thanks
> Also can you try the below version of the model: https://github.com/intel/intel-extension-for-tensorflow/tree/main/examples/stable_diffussion_inference
There are a number of issues I'm seeing with this set of instructions:
@yinghu5 Here is the output from the terminal before attempting the Jupyter notebook:
:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
(itex2) user@host:~/intel-extension-for-tensorflow/examples/stable_diffussion_inference$ groups
zebra adm tty cdrom sudo dip video plugdev kvm ssl-cert lpadmin sambashare render libvirt boinc
(itex2) user@host:~/intel-extension-for-tensorflow/examples/stable_diffussion_inference$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 7700X 8-Core Processor OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 1.3 [1.3.27191]
(itex2) user@host:~/itex2/lib/python3.10/site-packages/intel_extension_for_tensorflow/tools$ ./env_check.sh
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-20 09:11:28.723803: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-20 09:11:28.725558: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 09:11:28.747033: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-20 09:11:28.747062: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-20 09:11:28.747089: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-20 09:11:28.751889: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 09:11:28.752044: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-20 09:11:29.283384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-20 09:11:29.592753: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-20 09:11:30.524508: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-02-20 09:11:30.598371: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-20 09:11:30.598671: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-20 09:11:30.924933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed =======================
(itex2) user@host:/usr/lib/x86_64-linux-gnu$ ls old.libstd*
old.libstdc++.so.6 old.libstdc++.so.6.0.30 old.libstd-f73d1def252dd6f1.so
(itex2) user@host:/usr/lib/x86_64-linux-gnu$ ls libstd*
ls: cannot access 'libstd*': No such file or directory
(itex2) user@host:~/itex2/lib/python3.10/site-packages/intel_extension_for_tensorflow/tools$ ./env_check.sh
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/__init__.py", line 38, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 36, in <module>
from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/pywrap_tensorflow.py", line 26, in <module>
self_check.preload_check()
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/platform/self_check.py", line 63, in preload_check
from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
You have multiple libstdc++.so.6, make sure you are using the correct one.
/usr/lib/i386-linux-gnu/libstdc++.so.6.0.30.
Enable OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 to obtain detail information when using Intel® Extension for TensorFlow*.
====================== Check Devices Availability Failed =======================
@srinarayan-srikanthan Running the instructions in the sample file gives again the same error of no CUDA-capable device detected and fails to generate any images:
The error you are referring to, "2024-02-09 11:23:13.180820: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected", is normal behavior. The stock TensorFlow package has CUDA support by default, so it tries to initialize CUDA even when no NVIDIA device is present.
The reason the image is not being created could be a memory issue. Can you try running another workload and see if memory is the problem?
Thank you for the suggestion; we will update the README with instructions for the patch file.
@djsv23 thank you a lot for checking. Then the libstdc++ library in your environment is correct; please change it back, the problem is not related to it.
As Sri mentioned, how about another workload like hello-world.py: just download the .py from https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
and run `python TensorFlow_HelloWorld.py` (no code changes are needed for TensorFlow to run on CPU and Intel GPU).
About the SD example, could you please also show the `conda list` and `pip list` output? And check the Keras version: if it is 3.x, please change to an older version like 2.14 and try again (tensorflow 2.14.1 requires keras<2.15,>=2.14.0).
thank you!
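The Keras pin can be checked quickly with something like this (the `keras_pin_ok` helper is mine; it uses importlib.metadata so Keras itself is never imported, and returns None when Keras is not installed):

```python
# Check that the installed keras satisfies TF 2.14's pin (keras>=2.14,<2.15).
from importlib.metadata import version, PackageNotFoundError

def keras_pin_ok():
    """True/False for the pin; None when keras is not installed."""
    try:
        v = version("keras")
    except PackageNotFoundError:
        return None
    major, minor = (int(x) for x in v.split(".")[:2])
    return (major, minor) == (2, 14)

print("keras pin satisfied:", keras_pin_ok())
```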
I tried with a smaller image size, 256x256, and it still hits the same issue. When I run the text-to-image command it gives an error saying 0 MB of memory is available and that it can't identify the PCI bus where the Arc GPU is located. After the error is raised, the Python kernel dies and automatically restarts. It seems to me that this presents as insufficient memory because the program cannot access the card, its VRAM, or both.
> @djsv23 thank you a lot for checking. Then the libstdc++ library in your environment is correct; please change it back, the problem is not related to it.
>
> As Sri mentioned, how about another workload like hello-world.py: just download the .py from https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
> and run `python TensorFlow_HelloWorld.py` (no code changes are needed for TensorFlow to run on CPU and Intel GPU).
>
> About the SD example, could you please also show the `conda list` and `pip list` output? And check the Keras version: if it is 3.x, please change to an older version like 2.14 and try again (tensorflow 2.14.1 requires keras<2.15,>=2.14.0).
>
> thank you!
I ran the HelloWorld script, which blew up with an error. The repository suggested running diagnostics.py from the oneAPI toolkit, so I installed that and ran it, producing this report:
Default checks will be run. For information on how to run other checks, see 'python3 diagnostics.py --help'
===============
Checks results:
================================================================================================================================
Check name: user_group_check
Description: This check verifies that the current user is in the same group as the GPU(s).
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: driver_compatibility_check
Description: This check verifies compatibility of oneAPI products versions and GPU drivers versions.
Result status: FAIL
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Deep Neural Network Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® DPC++ Compatibility Tool. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI DPC++ Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Math Kernel Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® MPI Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Threading Building Blocks. Recommended version of OpenCL™ is 23.26.26690
================================================================================================================================
================================================================================================================================
Check name: oneapi_toolkit_check
Description: This check shows information about installed oneAPI toolkits.
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: gpu_backend_check
Description: This check shows information from OpenCL™ and Intel® oneAPI Level Zero drivers.
Result status: ERROR
Intel® oneAPI Level Zero driver is not initialized.
================================================================================================================================
================================================================================================================================
Check name: intel_gpu_detector_check
Description: This check shows which Intel GPU(s) is on the system, based on lspci information and internal table.
Result status: ERROR
Unable to get information about initialized devices because the user does not have read access to /sys/kernel/debug/dri/.
================================================================================================================================
================================================================================================================================
Check name: oneapi_env_check
Description: This check shows if the oneAPI environment is configured and provides a list of oneAPI components with their versions if they are present in the environment
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: compiler_check
Description: This check shows information about the GCC compiler.
Result status: PASS
================================================================================================================================
7 CHECKS: 4 PASS, 1 FAIL, 0 WARNINGS, 2 ERRORS
Seeing the read-access error, I ran again as root:
/opt/intel/oneapi/diagnostics/2024.0/opt/diagnostics$ sudo python3 diagnostics.py
Default checks will be run. For information on how to run other checks, see 'python3 diagnostics.py --help'
===============
Checks results:
================================================================================================================================
Check name: user_group_check
Description: This check verifies that the current user is in the same group as the GPU(s).
Result status: PASS
Root user does not need to be in groups to have access to devices. The root user always has access to devices.
================================================================================================================================
================================================================================================================================
Check name: driver_compatibility_check
Description: This check verifies compatibility of oneAPI products versions and GPU drivers versions.
Result status: FAIL
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Deep Neural Network Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® DPC++ Compatibility Tool. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI DPC++ Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Math Kernel Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® MPI Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Threading Building Blocks. Recommended version of OpenCL™ is 23.26.26690
================================================================================================================================
================================================================================================================================
Check name: oneapi_toolkit_check
Description: This check shows information about installed oneAPI toolkits.
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: gpu_backend_check
Description: This check shows information from OpenCL™ and Intel® oneAPI Level Zero drivers.
Result status: WARNING
Unknown internal error: 0x.78000003
================================================================================================================================
================================================================================================================================
Check name: intel_gpu_detector_check
Description: This check shows which Intel GPU(s) is on the system, based on lspci information and internal table.
Result status: ERROR
[Errno 1] Operation not permitted: '/sys/kernel/debug/dri/0/i915_gpu_info'
================================================================================================================================
================================================================================================================================
Check name: oneapi_env_check
Description: This check shows if the oneAPI environment is configured and provides a list of oneAPI components with their versions if they are present in the environment
Result status: FAIL
oneAPI environment not configured.
================================================================================================================================
================================================================================================================================
Check name: compiler_check
Description: This check shows information about the GCC compiler.
Result status: PASS
================================================================================================================================
7 CHECKS: 3 PASS, 2 FAIL, 1 WARNING, 1 ERROR
@srinarayan-srikanthan @yinghu5 This may be of interest to both of you: I did notice the mention of i915_gpu_info. When I go back through the driver installation instructions, there is an option to install the out-of-tree driver modules, including intel-i915-dkms among others. I was able to install the others, but the i915 DKMS package appears to be incompatible with the 6.5 kernel I am running. The rest of the documentation suggests that 6.x kernels do not require the out-of-tree drivers; is it possible that something the Intel TensorFlow extension needs from the out-of-tree package has not been upstreamed?
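For what it's worth, one unofficial way to tell the in-tree i915 apart from a DKMS build is to look at the module metadata; these are generic driver checks, not a documented ITEX procedure:

```shell
# Unofficial sanity checks: which kernel is running, is i915 loaded,
# and does it report a DKMS/backport version string?
uname -r                                      # 6.x kernels carry i915 in-tree
lsmod | grep -w i915 || echo "i915 not loaded"
modinfo -F version i915 2>/dev/null || true   # DKMS builds usually set this; in-tree leaves it empty
```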
@djsv23 thank you! Then we are back to the driver again :)
Before you go ahead and reinstall the driver, system, etc., let's check whether your A750 works:
$ lspci
find your A750's line: it will contain the device ID "56a1"
for example, mine is
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08)
then
$ lspci -s <slot> -vvv
for example,
(itex214) yhu5@arc770-tce:~$ lspci -s 03:00.0 -vvv
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: Device 1ef7:1307
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
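The two-step lookup above can be condensed into a short pipeline; the 56a1 device ID is the A750's, the slot value is machine-specific, and sudo may be needed for the full capability dump:

```shell
# Find the Arc card's PCI slot by its device ID, then dump full details.
slot=$(lspci -nn | grep -i '56a1' | cut -d' ' -f1)   # e.g. 03:00.0
echo "Arc A750 slot: $slot"
sudo lspci -s "$slot" -vvv
```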
On the other hand, system BIOS configuration can have a significant impact on your GPU:
https://www.intel.com/content/www/us/en/support/articles/000091128/graphics.html
· Above 4G Decoding -> Enabled
· Re-Size BAR Support -> Enabled
I'm not sure whether they are enabled by default on your system, but here is a reference: https://www.digitaltrends.com/computing/how-to-use-rebar-on-arc-gpus/#dt-heading-how-to-enable-rebar-on-intel-arc-gpus
Thanks. I am sure the GPU is working, as I have been gaming with it since day 1. Resizable BAR and Above 4G Decoding are enabled. I also had to disable the iGPU on my 7700X CPU in BIOS to stop Proton for Steam from using it instead of the Arc card, so all display output is definitely going through the Intel GPU.
Also, I have been doing video encoding with the card and can see that FFmpeg is able to access the video engine and VRAM.
@djsv23 nice to know!
What is the output of the following on this machine?
dpkg -l | grep intel
and lspci -s 03:00.0 -vvv?
ii intel-basekit 2024.0.1-43 amd64 Intel® oneAPI Base Toolkit
ii intel-basekit-env-2024.0 2024.0.1-43 all Intel® oneAPI Base Toolkit
ii intel-basekit-getting-started-2024.0 2024.0.1-43 all Intel® oneAPI Base Toolkit
ii intel-fw-gpu 2023.39.2-255~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-gpu-tools 1.26-2 amd64 tools for debugging the Intel graphics driver
ii intel-gsc 0.8.9+65~u22.04 amd64 Intel(R) Graphics System Controller Firmware
ii intel-igc-cm 1.0.206-775~22.04 amd64 Intel(R) C for Metal Compiler -- CM Frontend lib
ii intel-igc-core 1.0.14828.8 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-igc-opencl 1.0.14828.8 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-level-zero-gpu 1.3.27191.42-775~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii intel-media-va-driver-non-free:amd64 23.4.0-775~22.04 amd64 VAAPI driver for the Intel GEN8+ Graphics family
ii intel-metrics-discovery 1.12.169-775~22.04 amd64 Intel(R) Metrics Discovery Application Programming Interface --
ii intel-metrics-library 1.0.152-760~22.04 amd64 Intel(R) Metrics Library for MDAPI (Intel(R) Metrics Discovery
ii intel-microcode 3.20231114.0ubuntu0.22.04.1 amd64 Processor microcode firmware for Intel CPUs
ii intel-oneapi-advisor 2024.0.1-14 amd64 Intel® Advisor
ii intel-oneapi-ccl-2021.11 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library Runtime Environment
ii intel-oneapi-ccl-devel 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library
ii intel-oneapi-ccl-devel-2021.11 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library
ii intel-oneapi-common-licensing 2024.0.0-49406 all oneAPI Common License
ii intel-oneapi-common-licensing-2024.0 2024.0.0-49406 all oneAPI Common License
ii intel-oneapi-common-oneapi-vars 2024.0.0-49406 all oneAPI Common Toolkit Environment Script
ii intel-oneapi-common-oneapi-vars-2024.0 2024.0.0-49406 all oneAPI Common Toolkit Environment Script
ii intel-oneapi-common-vars 2024.0.0-49406 all oneAPI Common Environment Scripts
ii intel-oneapi-compiler-cpp-eclipse-cfg-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* eclipse integration configuration file (C++)
ii intel-oneapi-compiler-dpcpp-cpp 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler
ii intel-oneapi-compiler-dpcpp-cpp-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler
ii intel-oneapi-compiler-dpcpp-cpp-common-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux*
ii intel-oneapi-compiler-dpcpp-cpp-runtime-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* runtime package for Intel(R) 64
ii intel-oneapi-compiler-dpcpp-eclipse-cfg-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* eclipse integration configuration file (DPC++)
ii intel-oneapi-compiler-shared-2024.0 2024.0.2-49895 amd64 Intel(R) Compiler Shared Files
ii intel-oneapi-compiler-shared-common-2024.0 2024.0.2-49895 all Intel(R) Compiler Shared Files
ii intel-oneapi-compiler-shared-runtime-2024.0 2024.0.2-49895 amd64 Intel(R) Compiler Shared Files runtime contents
ii intel-oneapi-dal-2024.0 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library
ii intel-oneapi-dal-common-2024.0 2024.0.1-25 all Intel® oneAPI Data Analytics Library common
ii intel-oneapi-dal-common-devel-2024.0 2024.0.1-25 all Intel® oneAPI Data Analytics Library common
ii intel-oneapi-dal-devel 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library Development Package
ii intel-oneapi-dal-devel-2024.0 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library Development Package
ii intel-oneapi-dev-utilities 2024.0.0-49320 amd64 Dev Utilities
ii intel-oneapi-dev-utilities-2024.0 2024.0.0-49320 amd64 Dev Utilities
ii intel-oneapi-dev-utilities-eclipse-cfg-2024.0 2024.0.0-49320 all intel-oneapi-dev-utilities-eclipse-cfg
ii intel-oneapi-diagnostics-utility 2024.0.0-49093 amd64 Diagnostics Utility for Intel® oneAPI Toolkits
ii intel-oneapi-diagnostics-utility-2024.0 2024.0.0-49093 amd64 Diagnostics Utility for Intel® oneAPI Toolkits
ii intel-oneapi-dnnl 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library
ii intel-oneapi-dnnl-2024.0 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library
ii intel-oneapi-dnnl-devel 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library Development Package
ii intel-oneapi-dnnl-devel-2024.0 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library Development Package
ii intel-oneapi-dpcpp-cpp-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* for Intel(R) 64
ii intel-oneapi-dpcpp-ct 2024.0.0-49381 amd64 Intel® DPC++ Compatibility Tool
ii intel-oneapi-dpcpp-ct-2024.0 2024.0.0-49381 amd64 Intel® DPC++ Compatibility Tool
ii intel-oneapi-dpcpp-ct-eclipse-cfg-2024.0 2024.0.0-49381 all Intel® DPC++ Compatibility Tool 2024.0.0 for Linux* eclipse integration configuration file
ii intel-oneapi-dpcpp-debugger-2024.0 2024.0.1-6 amd64 Intel® Distribution for GDB*
ii intel-oneapi-icc-eclipse-plugin-cpp-2024.0 2024.0.2-49895 all Standards driven high performance cross architecture DPC++/C++ compiler
ii intel-oneapi-ipp-2021.10 2021.10.1-13 amd64 Intel® Integrated Performance Primitives
ii intel-oneapi-ipp-common-2021.10 2021.10.1-13 all Intel® Integrated Performance Primitives common
ii intel-oneapi-ipp-common-devel-2021.10 2021.10.1-13 all Intel® Integrated Performance Primitives common
ii intel-oneapi-ipp-devel 2021.10.1-13 amd64 Intel® Integrated Performance Primitives Development Package
ii intel-oneapi-ipp-devel-2021.10 2021.10.1-13 amd64 Intel® Integrated Performance Primitives Development Package
ii intel-oneapi-ippcp-2021.9 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography
ii intel-oneapi-ippcp-common-2021.9 2021.9.1-5 all Intel® Integrated Performance Primitives Cryptography common
ii intel-oneapi-ippcp-common-devel-2021.9 2021.9.1-5 all Intel® Integrated Performance Primitives Cryptography common
ii intel-oneapi-ippcp-devel 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography Development Package
ii intel-oneapi-ippcp-devel-2021.9 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography Development Package
ii intel-oneapi-libdpstd-devel-2022.3 2022.3.0-49369 amd64 Intel® oneAPI DPC++ Library 2022.3.0 for Linux*
ii intel-oneapi-mkl-2024.0 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime package for Intel(R) 64
ii intel-oneapi-mkl-common-2024.0 2024.0.0-49656 all Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* common
ii intel-oneapi-mkl-common-devel-2024.0 2024.0.0-49656 all Intel® oneAPI Math Kernel Library common
ii intel-oneapi-mkl-devel 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* development package for Intel(R) 64
ii intel-oneapi-mkl-devel-2024.0 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* development package for Intel(R) 64
ii intel-oneapi-mpi-2021.11 2021.11.0-49493 amd64 Intel® MPI Library Runtime Environment
ii intel-oneapi-mpi-devel-2021.11 2021.11.0-49493 amd64 Intel® MPI Library
ii intel-oneapi-openmp-2024.0 2024.0.2-49895 amd64 Intel® OpenMP* Runtime Library 2024.0.2 for Linux* for Intel(R) 64
ii intel-oneapi-openmp-common-2024.0 2024.0.2-49895 all Intel® OpenMP* Runtime Library 2024.0.2 for Linux*
ii intel-oneapi-runtime-compilers-2024 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime common files
ii intel-oneapi-runtime-compilers-common-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler runtime common files
ii intel-oneapi-runtime-dpcpp-cpp 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-cpp-2024 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-cpp-common-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-sycl-core-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* Runtime Core
ii intel-oneapi-runtime-dpcpp-sycl-cpu-rt-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* CPU
ii intel-oneapi-runtime-dpcpp-sycl-fpga-emul-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* FPGA Emulator Runtime
ii intel-oneapi-runtime-dpcpp-sycl-opencl-cpu-2024 2024.0.2-49895 amd64 Intel® CPU Runtime for OpenCL(TM) Applications runtime
ii intel-oneapi-runtime-dpcpp-sycl-rt-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* Runtime
ii intel-oneapi-runtime-mkl 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime
ii intel-oneapi-runtime-mkl-2024 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime
ii intel-oneapi-runtime-mkl-common-2024 2024.0.0-49656 all Intel® oneAPI Math Kernel Library runtime common
ii intel-oneapi-runtime-opencl-2024 2024.0.2-49895 amd64 Intel® CPU Runtime for OpenCL(TM) Applications runtime
ii intel-oneapi-runtime-openmp-2024 2024.0.2-49895 amd64 Intel® OpenMP* Runtime Library runtime
ii intel-oneapi-runtime-openmp-opencl-shared-2024 2024.0.2-49895 amd64 Intel(R) OpenMP and OpenCL shared files for runtime package
ii intel-oneapi-runtime-tbb-2021 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks runtime
ii intel-oneapi-runtime-tbb-common-2021 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks runtime common
ii intel-oneapi-runtime-tcm-1 1.0.0-435 amd64 Thread Composability Manager
ii intel-oneapi-tbb-2021.11 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks
ii intel-oneapi-tbb-common-2021.11 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks common
ii intel-oneapi-tbb-common-devel-2021.11 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks common
ii intel-oneapi-tbb-devel 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks Development Package
ii intel-oneapi-tbb-devel-2021.11 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks Development Package
ii intel-oneapi-tcm-1.0 1.0.0-435 amd64 Thread Composability Manager
ii intel-oneapi-tlt 2024.0.0-352 amd64 Toolkit Linking Tool
ii intel-oneapi-tlt-2024.0 2024.0.0-352 amd64 Toolkit Linking Tool
ii intel-oneapi-vtune 2024.0.1-11 amd64 Intel® VTune(TM) Profiler
ii intel-opencl-icd 23.35.27191.42-775~22.04 amd64 Intel graphics compute runtime for OpenCL
ii libdrm-intel1:amd64 2.4.113-2~ubuntu0.22.04.1 amd64 Userspace interface to intel-specific kernel DRM services -- runtime
ii libdrm-intel1:i386 2.4.113-2~ubuntu0.22.04.1 i386 Userspace interface to intel-specific kernel DRM services -- runtime
ii whois 5.5.13 amd64 intelligent WHOIS client
rc xserver-xorg-video-intel 2:2.99.917+git20200226-1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
and we also have
03:00.0 VGA compatible controller: Intel Corporation Device 56a1 (rev 08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1021
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 108
IOMMU group: 16
Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fa00000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at fb000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee00000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Kernel driver in use: i915
Kernel modules: i915
@djsv23 thank you for the test. Your driver stack is almost the same as mine (though my system is an A770 with 16GB of memory). The i915 DKMS package is not there, but the second command shows your i915 driver works OK.
From your SD run result, it seems the iterations started but failed later. Have you had a chance to try another machine, or other inference code, like the one at https://github.com/intel/intel-extension-for-tensorflow/blob/main/examples/quick_example.md?
import numpy as np
import sys
import tensorflow as tf

N = 1
num_channel = 3
input_width, input_height = (5, 5)
filter_width, filter_height = (2, 2)

x = np.random.rand(N, input_width, input_height, num_channel).astype(np.float32)
weight = np.random.rand(filter_width, filter_height, num_channel, num_channel).astype(np.float32)
bias = np.random.rand(num_channel).astype(np.float32)

conv = tf.nn.conv2d(x, weight, strides=[1, 1, 1, 1], padding='SAME')
activation = tf.nn.relu(conv)
result = tf.nn.bias_add(activation, bias)

print(result)
print('Finished')
conda activate <your environment>
source /opt/intel/oneapi/setvars.sh
python quick_example.py
If it works, could you please attach all of the output? (Or please attach the full output of the TensorFlow hello-world run from last time.)
Thanks
@yinghu5 Wild!
I ran quick_example.py and it produced the expected output, though it still reports 0MB of VRAM and an undefined PCI bus ID:
tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
I scaled the example up to 9 channels, 30x input dimensions, and 30x filter dimensions to create a heavier load, ran it in a loop to observe GPU utilization, and it seems this implementation is working.
+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | N/A |
| EU Array Active (%) | N/A |
| EU Array Stall (%) | N/A |
| EU Array Idle (%) | N/A |
| | |
| Compute Engine Util (%) | Engine 0: 81, Engine 1: 0, Engine 2: 0, Engine 3: 0 |
| Render Engine Util (%) | Engine 0: 0 |
| Media Engine Util (%) | N/A |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | Engine 0: 2 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | N/A |
+-----------------------------+--------------------------------------------------------------------+
| Reset | N/A |
| Programming Errors | N/A |
| Driver Errors | N/A |
| Cache Errors Correctable | N/A |
| Cache Errors Uncorrectable | N/A |
| Mem Errors Correctable | N/A |
| Mem Errors Uncorrectable | N/A |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 167 |
| GPU Frequency (MHz) | 2250 |
| Media Engine Freq (MHz) | N/A |
| GPU Core Temperature (C) | N/A |
| GPU Memory Temperature (C) | N/A |
| GPU Memory Read (kB/s) | N/A |
| GPU Memory Write (kB/s) | N/A |
| GPU Memory Bandwidth (%) | N/A |
| GPU Memory Used (MiB) | 4795 |
| GPU Memory Util (%) | 59 |
| Xe Link Throughput (kB/s) | N/A |
+-----------------------------+--------------------------------------------------------------------+
It seems there is still some issue with my i915 driver that is preventing some GPU information from being accessible, and some tools handle that more gracefully than others.
@djsv23 Good that you were able to get it working. The 0MB VRAM you are seeing is because when TensorFlow detects a device but cannot identify its memory, it defaults to 0; it is not an issue. And going by your observation from running a heavier load, the failure to run Stable Diffusion is just a memory bottleneck from the 8GB limit.
Thanks all for the help. It seems that 8GB of VRAM is quite limiting for AI image generation and might require some offloading strategies. I was able to generate a 128x128 image, which unfortunately isn't enough to produce a meaningful image from the prompt, and which, going by xpu-smi, appears to have required nearly 7GB to process.
Device: Intel Arc A750, operating system Ubuntu 22.04. Similar to #59, I've followed the installation procedure and the instructions to ensure oneMKL is activated. Running env_check.sh still gives the error about not finding CUDA drivers:
` Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.9 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended.
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended.
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended.
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended.
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended.
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended.
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-01 10:55:06.186663: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-01 10:55:06.188042: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-01 10:55:06.206690: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-01 10:55:06.206709: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-01 10:55:06.206733: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-01 10:55:06.211135: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-01 10:55:06.211256: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-01 10:55:06.688400: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-01 10:55:06.965341: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-01 10:55:08.016040: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-01 10:55:08.090198: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-01 10:55:08.090492: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-01 10:55:08.563296: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed ======================= `