Closed djsv23 closed 6 months ago
Can you please try the driver version suggested here: https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#install-gpu-drivers
Hi @srinarayan-srikanthan, this error is produced when I have used the driver suggested on that page by following the installation instructions linked to https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md. Is there another procedure I should use to install the recommended driver version?
Hi @djsv23, yes, you are looking at the right page, but the version you have installed is stable_775_20_20231219 instead of stable_736_25_2023103. The instructions on the page specify the versions as below.
But the output of your env_check script is not showing the right versions. Can you please check on that?
Would you mind adding a link to the instructions page where this version list is found? I'm having a hard time navigating and understanding which instructions are current, as https://dgpu-docs.intel.com/driver/client/overview.html does not have them and is the only driver installation page I can find that does not carry a deprecation notice.
I also see this pinned version list in the Ubuntu 22.04 for WSL instructions in this repository, but the same instructions are not given for Ubuntu 22.04 on bare metal, which is how my environment is configured.
That said, I have installed the mentioned versions and still see an output from env_check.sh which suggests that required drivers are missing.
```
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.26918.50-736 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.30.26918.50-736 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.13.1-719 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.14828.26-736 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.14828.26-736 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.10-712 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-07 12:30:08.568077: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-07 12:30:08.569821: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-07 12:30:08.590078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-07 12:30:08.590106: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-07 12:30:08.590132: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-07 12:30:08.595032: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-07 12:30:08.595173: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-07 12:30:08.996531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-07 12:30:09.309113: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-07 12:30:09.681137: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-07 12:30:09.745557: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-07 12:30:09.745840: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-07 12:30:10.068782: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed =======================
```
From your env_check.sh output I see "itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.", so the backend does load. What error do you see when you run the following command: `python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"`?
With regard to the other question, you can find instructions here : https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#native-linux-running-directly-on-hardware-1
The error I am getting is that, despite the GPU backend being loaded, it says the GPU will not be used and that no CUDA-capable device is detected:
2024-02-08 09:57:04.630925: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:14.392825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Okay, can you please post the complete output of the import command:
python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-08 09:57:04.527315: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-08 09:57:04.630925: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:05.096549: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-08 09:57:05.096579: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-08 09:57:05.098091: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-08 09:57:05.324901: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 09:57:05.328626: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-08 09:57:06.660042: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-08 09:57:07.789687: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-08 09:57:12.504640: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-08 09:57:12.947637: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-08 09:57:12.947946: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-08 09:57:14.392825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Traceback (most recent call last):
File "
Okay, I see the GPU backend being loaded and then failing. Can you paste the output of `conda list | grep tensorflow`?
```
(itex) user@host:~$ conda list | grep tensorflow
intel-extension-for-tensorflow 2.14.0.2 pypi_0 pypi
tensorflow-datasets 4.9.4 pypi_0 pypi
tensorflow-metadata 1.14.0 pypi_0 pypi
```
Are these all the packages? Did you install tensorflow before installing intel-extension-for-tensorflow?
This is the entire output. At one point I think I removed and reinstalled intel-extension-for-tensorflow in this conda environment. Should I try setting up a new one from scratch?
Yes please try creating the environment from scratch following the instructions starting from here : https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/experimental/install_for_arc_gpu.md#2-install-tensorflow-via-pypi-wheel-in-linux
Ok. I've torn everything down, removed miniconda, and spun up a new Python 3.10 virtual environment, and now get the output below. It no longer fails to import the module, but it is still not using the GPU.
```
pip list | grep tensorflow
intel-extension-for-tensorflow 2.14.0.2
intel-extension-for-tensorflow-lib 2.14.0.2.2
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-io-gcs-filesystem 0.36.0
```
python -c "import intel_extension_for_tensorflow as itex; print(itex.__version__)"
2024-02-08 13:03:04.514441: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-08 13:03:04.516194: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 13:03:04.546696: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-08 13:03:04.546724: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-08 13:03:04.546756: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-08 13:03:04.552904: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-08 13:03:04.553073: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-08 13:03:05.181948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-08 13:03:05.422317: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-08 13:03:06.946692: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-08 13:03:07.024321: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-08 13:03:07.024636: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-08 13:03:07.656850: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2.14.0.2
I see the GPU backend being loaded. Can you try `itex.get_backend()` after importing it, or list the physical devices and check? It should now be using the GPU.
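For reference, a minimal sketch of that check (the `xpu_devices` helper name is mine, not an ITEX API; it assumes tensorflow and intel-extension-for-tensorflow are installed in the active env, and degrades to `None` when they are not):

```python
# Sketch: report the ITEX backend and TF's visible XPU devices.
def xpu_devices():
    """Return TF's visible XPU devices, or None when TF/ITEX are not installed."""
    try:
        import tensorflow as tf
        import intel_extension_for_tensorflow as itex
        print("ITEX backend:", itex.get_backend())  # expect "GPU" on a working setup
        return tf.config.list_physical_devices("XPU")
    except ImportError:
        return None

print("XPU devices:", xpu_devices())
```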
I checked the physical device and it is present (I am using the GPU for desktop video output, so no surprise there):
```
xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A750 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0003-0000-000856a18086                                       |
|           | PCI BDF Address: 0000:03:00.0                                                        |
|           | DRM Device: /dev/dri/card0                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
```
So then I tried generating an image with keras_cv per the Intel tutorial at https://medium.com/intel-analytics-software/running-tensorflow-stable-diffusion-on-intel-arc-gpus-e6ff0d2b7549 and got this output:
I also get this in the logs on the Jupyter server:
2024-02-09 11:23:11.419551: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-09 11:23:11.420929: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 11:23:11.440019: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-09 11:23:11.440038: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-09 11:23:11.440066: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-09 11:23:11.444734: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 11:23:11.445029: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-09 11:23:11.838733: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-09 11:23:12.060294: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-09 11:23:12.752293: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-09 11:23:12.828676: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-09 11:23:12.828995: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 11:23:13.180820: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-02-09 11:24:39.753366: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-02-09 11:24:39.753389: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id:
The end result is that it seems not to find the device but tries to proceed anyway. It then decides that 0 MB of VRAM is insufficient and fails to generate an image.
It is able to detect the device and load it, because I see the XPU being enabled: "2024-02-09 11:24:42.064980: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type XPU is enabled." Can you monitor the GPU, check its utilization/memory usage, and share the output of xpu-smi?
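For the monitoring step, something like the sketch below can be used (the guards are mine, so each command is simply skipped with a note when the tool is not installed; `intel_gpu_top` needs root, so only the invocation hint is printed):

```shell
# Watch GPU utilization and memory while the workload runs.
command -v xpu-smi >/dev/null && xpu-smi stats -d 0 || echo "xpu-smi not installed"
command -v intel_gpu_top >/dev/null && echo "run: sudo intel_gpu_top" || echo "intel_gpu_top not installed"
```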
@srinarayan-srikanthan I have done this; running the computer headless and accessing it remotely, intel_gpu_top shows the GPU on standby at 0% usage and 0 MHz clock speed. I'm further convinced the GPU is not used because no image is generated: `images` is empty, so when we run `plt.imshow(images[0])` there is no output.
What is the output of xpu-smi ?
It is the same as before. Is there another subcommand that would be helpful to see?
```
xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A750 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0003-0000-000856a18086                                       |
|           | PCI BDF Address: 0000:03:00.0                                                        |
|           | DRM Device: /dev/dri/card0                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
```
Here are two suggestions: can you try reducing the size of the image and running it? The tutorial you shared was for the Arc A770, which comes with 16 GB, whereas the A750 is equipped with 8 GB. Also, can you try the below version of the model: https://github.com/intel/intel-extension-for-tensorflow/tree/main/examples/stable_diffussion_inference
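The smaller-resolution run can be sketched as below (assumes keras_cv is installed as in the Medium tutorial; the `build_small_sd` helper and the prompt are mine, and the code degrades to a message when keras_cv is absent):

```python
# Sketch: build the keras_cv Stable Diffusion pipeline at 256x256
# so the weights and activations fit in the A750's 8 GB of VRAM.
def build_small_sd():
    try:
        import keras_cv
    except ImportError:
        return None
    return keras_cv.models.StableDiffusion(img_width=256, img_height=256)

model = build_small_sd()
if model is None:
    print("keras_cv not installed")
else:
    images = model.text_to_image("a photo of an astronaut riding a horse", batch_size=1)
```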
@djsv23, we ran into another similar issue recently: https://github.com/intel/intel-extension-for-transformers/issues/1276. Not sure if it helps; just for your reference.
Could you please try the below in the terminal environment first? (After it works, then try jupyter-notebook.)
1) conda activate your env
2) source /opt/intel/oneapi/setvars.sh
3) groups
4) sycl-ls (do the devices show?)
5) env_check.sh (maybe the same error)
6) check the libstdc++ libraries, and delete (rename) the libstd* files under your conda environment.
For example, my itex214 env:
```
(itex214) yhu5@arc770-tce:~$ ls /home/yhu5/miniconda3/envs/itex214/lib/libstd*
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so.6.0.29
/home/yhu5/miniconda3/envs/itex214/lib/libstdc++.so.6
(itex214) yhu5@arc770-tce:~$ ls /usr/lib/x86_64-linux-gnu/libstd*
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
```
7) env_check.sh again.
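Steps 2-5 above can be sketched as a single guarded script (the guards are mine so a missing tool just prints a note; step 1, `conda activate <env>`, is done first):

```shell
# Steps 2-5: source oneAPI, check group membership, list SYCL devices, run env_check.
[ -f /opt/intel/oneapi/setvars.sh ] && . /opt/intel/oneapi/setvars.sh || echo "setvars.sh not found"
groups                                                  # step 3: look for 'render'/'video'
command -v sycl-ls >/dev/null && sycl-ls || echo "sycl-ls not found"
command -v env_check.sh >/dev/null && env_check.sh || echo "env_check.sh not on PATH"
```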
and let us know the screen output.
thanks
> Also can you try the below version of the model: https://github.com/intel/intel-extension-for-tensorflow/tree/main/examples/stable_diffussion_inference
There are a number of issues I'm seeing with this set of instructions:
@yinghu5 Here is the output from the terminal before attempting the Jupyter notebook:
:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
(itex2) user@host:~/intel-extension-for-tensorflow/examples/stable_diffussion_inference$ groups
zebra adm tty cdrom sudo dip video plugdev kvm ssl-cert lpadmin sambashare render libvirt boinc
(itex2) user@host:~/intel-extension-for-tensorflow/examples/stable_diffussion_inference$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 7 7700X 8-Core Processor OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 1.3 [1.3.27191]
(itex2) user@host:~/itex2/lib/python3.10/site-packages/intel_extension_for_tensorflow/tools$ ./env_check.sh
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-20 09:11:28.723803: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-20 09:11:28.725558: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 09:11:28.747033: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-20 09:11:28.747062: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-20 09:11:28.747089: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-20 09:11:28.751889: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 09:11:28.752044: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-20 09:11:29.283384: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-20 09:11:29.592753: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-20 09:11:30.524508: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-02-20 09:11:30.598371: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-20 09:11:30.598671: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-20 09:11:30.924933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed =======================
(itex2) user@host:/usr/lib/x86_64-linux-gnu$ ls old.libstd*
old.libstdc++.so.6 old.libstdc++.so.6.0.30 old.libstd-f73d1def252dd6f1.so
(itex2) user@host:/usr/lib/x86_64-linux-gnu$ ls libstd*
ls: cannot access 'libstd*': No such file or directory
(itex2) user@host:~/itex2/lib/python3.10/site-packages/intel_extension_for_tensorflow/tools$ ./env_check.sh
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.10 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended .
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended .
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended .
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended .
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/__init__.py", line 38, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 36, in <module>
from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/pywrap_tensorflow.py", line 26, in <module>
self_check.preload_check()
File "/home/zebra/itex2/lib/python3.10/site-packages/tensorflow/python/platform/self_check.py", line 63, in preload_check
from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
You have multiple libstdc++.so.6, make sure you are using the correct one.
/usr/lib/i386-linux-gnu/libstdc++.so.6.0.30.
Enable OCL_ICD_ENABLE_TRACE=1 OCL_ICD_DEBUG=2 to obtain detail information when using Intel® Extension for TensorFlow*.
====================== Check Devices Availability Failed =======================
@srinarayan-srikanthan Running the instructions in the sample file gives again the same error of no CUDA-capable device detected and fails to generate any images:
The error you are referring to, "2024-02-09 11:23:13.180820: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected", is normal behavior. The stock TensorFlow package has CUDA support by default, so it tries to initialize CUDA even when no NVIDIA device is present.
The reason the image is not being created could be a memory issue. Can you try running another workload and see if memory is the problem?
Thank you for the suggestion; we will update the README with instructions for the patch file.
@djsv23 thank you a lot for checking. Then the libstdc++ library in your environment is correct; please change it back, the problem is not related to it.
As Sri mentioned, how about another workload like hello-world.py: just download the .py from https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
and run `python TensorFlow_HelloWorld.py` (no code changes are needed for TensorFlow to run on CPU and Intel GPU).
About the SD example, could you please also show the `conda list` and `pip list` output? And check the Keras version: if it is 3.x, please change to an older version like 2.14 and try again (tensorflow 2.14.1 requires keras<2.15,>=2.14.0).
thank you!
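The Keras pin can be checked quickly with something like this (the `keras_pin_ok` helper is mine; it uses importlib.metadata so Keras itself is never imported, and returns None when Keras is not installed):

```python
# Check that the installed keras satisfies TF 2.14's pin (keras>=2.14,<2.15).
from importlib.metadata import version, PackageNotFoundError

def keras_pin_ok():
    """True/False for the pin; None when keras is not installed."""
    try:
        v = version("keras")
    except PackageNotFoundError:
        return None
    major, minor = (int(x) for x in v.split(".")[:2])
    return (major, minor) == (2, 14)

print("keras pin satisfied:", keras_pin_ok())
```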
I tried with a smaller image size, 256x256, and it still hits the same issue. When I run the text-to-image command it gives an error saying 0 MB of memory is available and that it can't identify the PCI bus where the Arc GPU is located. After the error is raised, the Python kernel dies and automatically restarts. It seems to me that this presents as insufficient memory because the program cannot access the card, its VRAM, or both.
> @djsv23 thank you a lot for checking. Then the libstdc++ library in your environment is correct; please change it back, the problem is not related to it.
>
> As Sri mentioned, how about another workload like hello-world.py: just download the .py from https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
> and run `python TensorFlow_HelloWorld.py` (no code changes are needed for TensorFlow to run on CPU and Intel GPU).
>
> About the SD example, could you please also show the `conda list` and `pip list` output? And check the Keras version: if it is 3.x, please change to an older version like 2.14 and try again (tensorflow 2.14.1 requires keras<2.15,>=2.14.0).
>
> thank you!
I ran the HelloWorld script, which blew up with an error. The repository suggested running diagnostics.py from the oneAPI toolkit, so I installed that and ran it, producing this report:
Default checks will be run. For information on how to run other checks, see 'python3 diagnostics.py --help'
===============
Checks results:
================================================================================================================================
Check name: user_group_check
Description: This check verifies that the current user is in the same group as the GPU(s).
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: driver_compatibility_check
Description: This check verifies compatibility of oneAPI products versions and GPU drivers versions.
Result status: FAIL
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Deep Neural Network Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® DPC++ Compatibility Tool. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI DPC++ Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Math Kernel Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® MPI Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Threading Building Blocks. Recommended version of OpenCL™ is 23.26.26690
================================================================================================================================
================================================================================================================================
Check name: oneapi_toolkit_check
Description: This check shows information about installed oneAPI toolkits.
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: gpu_backend_check
Description: This check shows information from OpenCL™ and Intel® oneAPI Level Zero drivers.
Result status: ERROR
Intel® oneAPI Level Zero driver is not initialized.
================================================================================================================================
================================================================================================================================
Check name: intel_gpu_detector_check
Description: This check shows which Intel GPU(s) is on the system, based on lspci information and internal table.
Result status: ERROR
Unable to get information about initialized devices because the user does not have read access to /sys/kernel/debug/dri/.
================================================================================================================================
================================================================================================================================
Check name: oneapi_env_check
Description: This check shows if the oneAPI environment is configured and provides a list of oneAPI components with their versions if they are present in the environment
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: compiler_check
Description: This check shows information about the GCC compiler.
Result status: PASS
================================================================================================================================
7 CHECKS: 4 PASS, 1 FAIL, 0 WARNINGS, 2 ERRORS
Seeing the read-access error, I ran again as root:
/opt/intel/oneapi/diagnostics/2024.0/opt/diagnostics$ sudo python3 diagnostics.py
Default checks will be run. For information on how to run other checks, see 'python3 diagnostics.py --help'
===============
Checks results:
================================================================================================================================
Check name: user_group_check
Description: This check verifies that the current user is in the same group as the GPU(s).
Result status: PASS
Root user does not need to be in groups to have access to devices. The root user always has access to devices.
================================================================================================================================
================================================================================================================================
Check name: driver_compatibility_check
Description: This check verifies compatibility of oneAPI products versions and GPU drivers versions.
Result status: FAIL
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Deep Neural Network Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® DPC++ Compatibility Tool. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI DPC++ Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Math Kernel Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® MPI Library. Recommended version of OpenCL™ is 23.26.26690
Installed version of OpenCL™ may not be compatible with the version of the Intel® oneAPI Threading Building Blocks. Recommended version of OpenCL™ is 23.26.26690
================================================================================================================================
================================================================================================================================
Check name: oneapi_toolkit_check
Description: This check shows information about installed oneAPI toolkits.
Result status: PASS
================================================================================================================================
================================================================================================================================
Check name: gpu_backend_check
Description: This check shows information from OpenCL™ and Intel® oneAPI Level Zero drivers.
Result status: WARNING
Unknown internal error: 0x.78000003
================================================================================================================================
================================================================================================================================
Check name: intel_gpu_detector_check
Description: This check shows which Intel GPU(s) is on the system, based on lspci information and internal table.
Result status: ERROR
[Errno 1] Operation not permitted: '/sys/kernel/debug/dri/0/i915_gpu_info'
================================================================================================================================
================================================================================================================================
Check name: oneapi_env_check
Description: This check shows if the oneAPI environment is configured and provides a list of oneAPI components with their versions if they are present in the environment
Result status: FAIL
oneAPI environment not configured.
================================================================================================================================
================================================================================================================================
Check name: compiler_check
Description: This check shows information about the GCC compiler.
Result status: PASS
================================================================================================================================
7 CHECKS: 3 PASS, 2 FAIL, 1 WARNING, 1 ERROR
@srinarayan-srikanthan @yinghu5 This may be of interest to both of you: I did notice the mention of i915_gpu_info. When I go back through the driver installation instructions, there is an option to install the out-of-tree driver modules, including intel-i915-dkms among others. I was able to install the others, but the i915 DKMS package appears to be incompatible with the 6.5 kernel I am running. The rest of the documentation suggests that 6.x kernels do not require the out-of-tree drivers; is it possible that something the Intel TensorFlow extension needs from the out-of-tree package has not been upstreamed?
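For what it's worth, one unofficial way to tell the in-tree i915 apart from a DKMS build is to look at the module metadata; these are generic driver checks, not a documented ITEX procedure:

```shell
# Unofficial sanity checks: which kernel is running, is i915 loaded,
# and does it report a DKMS/backport version string?
uname -r                                      # 6.x kernels carry i915 in-tree
lsmod | grep -w i915 || echo "i915 not loaded"
modinfo -F version i915 2>/dev/null || true   # DKMS builds usually set this; in-tree leaves it empty
```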
@djsv23 thank you! Then we are back to the driver again :)
Before you go ahead and reinstall the driver, system, etc., let's check whether your A750 works:
$ lspci
find your A750's line: it will contain the device ID "56a1"
for example, mine is
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08)
then
$ lspci -s <slot> -vvv
for example,
(itex214) yhu5@arc770-tce:~$ lspci -s 03:00.0 -vvv
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: Device 1ef7:1307
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
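The two-step lookup above can be condensed into a short pipeline; the 56a1 device ID is the A750's, the slot value is machine-specific, and sudo may be needed for the full capability dump:

```shell
# Find the Arc card's PCI slot by its device ID, then dump full details.
slot=$(lspci -nn | grep -i '56a1' | cut -d' ' -f1)   # e.g. 03:00.0
echo "Arc A750 slot: $slot"
sudo lspci -s "$slot" -vvv
```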
On the other hand, system BIOS configuration can have a significant impact on your GPU:
https://www.intel.com/content/www/us/en/support/articles/000091128/graphics.html
· Above 4G Decoding -> Enabled
· Re-Size BAR Support -> Enabled
I'm not sure whether they are enabled by default on your system, but here is a reference: https://www.digitaltrends.com/computing/how-to-use-rebar-on-arc-gpus/#dt-heading-how-to-enable-rebar-on-intel-arc-gpus
Thanks. I am sure the GPU is working, as I have been gaming with it since day 1. Resizable BAR and Above 4G Decoding are enabled. I also had to disable the iGPU on my 7700X CPU in BIOS to stop Proton for Steam from using it instead of the Arc card, so all display output is definitely going through the Intel GPU.
Also, I have been doing video encoding with the card and can see that FFmpeg is able to access the video engine and VRAM.
@djsv23 nice to know!
What is the output of the following on this machine?
dpkg -l | grep intel
and lspci -s 03:00.0 -vvv?
ii intel-basekit 2024.0.1-43 amd64 Intel® oneAPI Base Toolkit
ii intel-basekit-env-2024.0 2024.0.1-43 all Intel® oneAPI Base Toolkit
ii intel-basekit-getting-started-2024.0 2024.0.1-43 all Intel® oneAPI Base Toolkit
ii intel-fw-gpu 2023.39.2-255~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-gpu-tools 1.26-2 amd64 tools for debugging the Intel graphics driver
ii intel-gsc 0.8.9+65~u22.04 amd64 Intel(R) Graphics System Controller Firmware
ii intel-igc-cm 1.0.206-775~22.04 amd64 Intel(R) C for Metal Compiler -- CM Frontend lib
ii intel-igc-core 1.0.14828.8 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-igc-opencl 1.0.14828.8 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-level-zero-gpu 1.3.27191.42-775~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii intel-media-va-driver-non-free:amd64 23.4.0-775~22.04 amd64 VAAPI driver for the Intel GEN8+ Graphics family
ii intel-metrics-discovery 1.12.169-775~22.04 amd64 Intel(R) Metrics Discovery Application Programming Interface --
ii intel-metrics-library 1.0.152-760~22.04 amd64 Intel(R) Metrics Library for MDAPI (Intel(R) Metrics Discovery
ii intel-microcode 3.20231114.0ubuntu0.22.04.1 amd64 Processor microcode firmware for Intel CPUs
ii intel-oneapi-advisor 2024.0.1-14 amd64 Intel® Advisor
ii intel-oneapi-ccl-2021.11 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library Runtime Environment
ii intel-oneapi-ccl-devel 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library
ii intel-oneapi-ccl-devel-2021.11 2021.11.2-5 amd64 Intel® oneAPI Collective Communications Library
ii intel-oneapi-common-licensing 2024.0.0-49406 all oneAPI Common License
ii intel-oneapi-common-licensing-2024.0 2024.0.0-49406 all oneAPI Common License
ii intel-oneapi-common-oneapi-vars 2024.0.0-49406 all oneAPI Common Toolkit Environment Script
ii intel-oneapi-common-oneapi-vars-2024.0 2024.0.0-49406 all oneAPI Common Toolkit Environment Script
ii intel-oneapi-common-vars 2024.0.0-49406 all oneAPI Common Environment Scripts
ii intel-oneapi-compiler-cpp-eclipse-cfg-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* eclipse integration configuration file (C++)
ii intel-oneapi-compiler-dpcpp-cpp 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler
ii intel-oneapi-compiler-dpcpp-cpp-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler
ii intel-oneapi-compiler-dpcpp-cpp-common-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux*
ii intel-oneapi-compiler-dpcpp-cpp-runtime-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* runtime package for Intel(R) 64
ii intel-oneapi-compiler-dpcpp-eclipse-cfg-2024.0 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* eclipse integration configuration file (DPC++)
ii intel-oneapi-compiler-shared-2024.0 2024.0.2-49895 amd64 Intel(R) Compiler Shared Files
ii intel-oneapi-compiler-shared-common-2024.0 2024.0.2-49895 all Intel(R) Compiler Shared Files
ii intel-oneapi-compiler-shared-runtime-2024.0 2024.0.2-49895 amd64 Intel(R) Compiler Shared Files runtime contents
ii intel-oneapi-dal-2024.0 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library
ii intel-oneapi-dal-common-2024.0 2024.0.1-25 all Intel® oneAPI Data Analytics Library common
ii intel-oneapi-dal-common-devel-2024.0 2024.0.1-25 all Intel® oneAPI Data Analytics Library common
ii intel-oneapi-dal-devel 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library Development Package
ii intel-oneapi-dal-devel-2024.0 2024.0.1-25 amd64 Intel® oneAPI Data Analytics Library Development Package
ii intel-oneapi-dev-utilities 2024.0.0-49320 amd64 Dev Utilities
ii intel-oneapi-dev-utilities-2024.0 2024.0.0-49320 amd64 Dev Utilities
ii intel-oneapi-dev-utilities-eclipse-cfg-2024.0 2024.0.0-49320 all intel-oneapi-dev-utilities-eclipse-cfg
ii intel-oneapi-diagnostics-utility 2024.0.0-49093 amd64 Diagnostics Utility for Intel® oneAPI Toolkits
ii intel-oneapi-diagnostics-utility-2024.0 2024.0.0-49093 amd64 Diagnostics Utility for Intel® oneAPI Toolkits
ii intel-oneapi-dnnl 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library
ii intel-oneapi-dnnl-2024.0 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library
ii intel-oneapi-dnnl-devel 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library Development Package
ii intel-oneapi-dnnl-devel-2024.0 2024.0.0-49521 amd64 Intel® oneAPI Deep Neural Network Library Development Package
ii intel-oneapi-dpcpp-cpp-2024.0 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler 2024.0.2 for Linux* for Intel(R) 64
ii intel-oneapi-dpcpp-ct 2024.0.0-49381 amd64 Intel® DPC++ Compatibility Tool
ii intel-oneapi-dpcpp-ct-2024.0 2024.0.0-49381 amd64 Intel® DPC++ Compatibility Tool
ii intel-oneapi-dpcpp-ct-eclipse-cfg-2024.0 2024.0.0-49381 all Intel® DPC++ Compatibility Tool 2024.0.0 for Linux* eclipse integration configuration file
ii intel-oneapi-dpcpp-debugger-2024.0 2024.0.1-6 amd64 Intel® Distribution for GDB*
ii intel-oneapi-icc-eclipse-plugin-cpp-2024.0 2024.0.2-49895 all Standards driven high performance cross architecture DPC++/C++ compiler
ii intel-oneapi-ipp-2021.10 2021.10.1-13 amd64 Intel® Integrated Performance Primitives
ii intel-oneapi-ipp-common-2021.10 2021.10.1-13 all Intel® Integrated Performance Primitives common
ii intel-oneapi-ipp-common-devel-2021.10 2021.10.1-13 all Intel® Integrated Performance Primitives common
ii intel-oneapi-ipp-devel 2021.10.1-13 amd64 Intel® Integrated Performance Primitives Development Package
ii intel-oneapi-ipp-devel-2021.10 2021.10.1-13 amd64 Intel® Integrated Performance Primitives Development Package
ii intel-oneapi-ippcp-2021.9 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography
ii intel-oneapi-ippcp-common-2021.9 2021.9.1-5 all Intel® Integrated Performance Primitives Cryptography common
ii intel-oneapi-ippcp-common-devel-2021.9 2021.9.1-5 all Intel® Integrated Performance Primitives Cryptography common
ii intel-oneapi-ippcp-devel 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography Development Package
ii intel-oneapi-ippcp-devel-2021.9 2021.9.1-5 amd64 Intel® Integrated Performance Primitives Cryptography Development Package
ii intel-oneapi-libdpstd-devel-2022.3 2022.3.0-49369 amd64 Intel® oneAPI DPC++ Library 2022.3.0 for Linux*
ii intel-oneapi-mkl-2024.0 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime package for Intel(R) 64
ii intel-oneapi-mkl-common-2024.0 2024.0.0-49656 all Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* common
ii intel-oneapi-mkl-common-devel-2024.0 2024.0.0-49656 all Intel® oneAPI Math Kernel Library common
ii intel-oneapi-mkl-devel 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* development package for Intel(R) 64
ii intel-oneapi-mkl-devel-2024.0 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library 2024.0.0 for Linux* development package for Intel(R) 64
ii intel-oneapi-mpi-2021.11 2021.11.0-49493 amd64 Intel® MPI Library Runtime Environment
ii intel-oneapi-mpi-devel-2021.11 2021.11.0-49493 amd64 Intel® MPI Library
ii intel-oneapi-openmp-2024.0 2024.0.2-49895 amd64 Intel® OpenMP* Runtime Library 2024.0.2 for Linux* for Intel(R) 64
ii intel-oneapi-openmp-common-2024.0 2024.0.2-49895 all Intel® OpenMP* Runtime Library 2024.0.2 for Linux*
ii intel-oneapi-runtime-compilers-2024 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime common files
ii intel-oneapi-runtime-compilers-common-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler runtime common files
ii intel-oneapi-runtime-dpcpp-cpp 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-cpp-2024 2024.0.2-49895 amd64 Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-cpp-common-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler runtime
ii intel-oneapi-runtime-dpcpp-sycl-core-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* Runtime Core
ii intel-oneapi-runtime-dpcpp-sycl-cpu-rt-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* CPU
ii intel-oneapi-runtime-dpcpp-sycl-fpga-emul-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* FPGA Emulator Runtime
ii intel-oneapi-runtime-dpcpp-sycl-opencl-cpu-2024 2024.0.2-49895 amd64 Intel® CPU Runtime for OpenCL(TM) Applications runtime
ii intel-oneapi-runtime-dpcpp-sycl-rt-2024 2024.0.2-49895 all Intel® oneAPI DPC++/C++ Compiler SYCL* Runtime
ii intel-oneapi-runtime-mkl 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime
ii intel-oneapi-runtime-mkl-2024 2024.0.0-49656 amd64 Intel® oneAPI Math Kernel Library runtime
ii intel-oneapi-runtime-mkl-common-2024 2024.0.0-49656 all Intel® oneAPI Math Kernel Library runtime common
ii intel-oneapi-runtime-opencl-2024 2024.0.2-49895 amd64 Intel® CPU Runtime for OpenCL(TM) Applications runtime
ii intel-oneapi-runtime-openmp-2024 2024.0.2-49895 amd64 Intel® OpenMP* Runtime Library runtime
ii intel-oneapi-runtime-openmp-opencl-shared-2024 2024.0.2-49895 amd64 Intel(R) OpenMP and OpenCL shared files for runtime package
ii intel-oneapi-runtime-tbb-2021 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks runtime
ii intel-oneapi-runtime-tbb-common-2021 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks runtime common
ii intel-oneapi-runtime-tcm-1 1.0.0-435 amd64 Thread Composability Manager
ii intel-oneapi-tbb-2021.11 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks
ii intel-oneapi-tbb-common-2021.11 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks common
ii intel-oneapi-tbb-common-devel-2021.11 2021.11.0-49513 all Intel® oneAPI Threading Building Blocks common
ii intel-oneapi-tbb-devel 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks Development Package
ii intel-oneapi-tbb-devel-2021.11 2021.11.0-49513 amd64 Intel® oneAPI Threading Building Blocks Development Package
ii intel-oneapi-tcm-1.0 1.0.0-435 amd64 Thread Composability Manager
ii intel-oneapi-tlt 2024.0.0-352 amd64 Toolkit Linking Tool
ii intel-oneapi-tlt-2024.0 2024.0.0-352 amd64 Toolkit Linking Tool
ii intel-oneapi-vtune 2024.0.1-11 amd64 Intel® VTune(TM) Profiler
ii intel-opencl-icd 23.35.27191.42-775~22.04 amd64 Intel graphics compute runtime for OpenCL
ii libdrm-intel1:amd64 2.4.113-2~ubuntu0.22.04.1 amd64 Userspace interface to intel-specific kernel DRM services -- runtime
ii libdrm-intel1:i386 2.4.113-2~ubuntu0.22.04.1 i386 Userspace interface to intel-specific kernel DRM services -- runtime
ii whois 5.5.13 amd64 intelligent WHOIS client
rc xserver-xorg-video-intel 2:2.99.917+git20200226-1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
and we also have
03:00.0 VGA compatible controller: Intel Corporation Device 56a1 (rev 08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1021
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 108
IOMMU group: 16
Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fa00000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at fb000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee00000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Kernel driver in use: i915
Kernel modules: i915
@djsv23 thank you for the test. Your driver stack is almost the same as mine (though my system is an A770 with 16GB of memory). The i915 DKMS package is not there, but the second command shows your i915 driver works OK.
From your SD run result, it seems the iterations started but failed later. Have you had a chance to try another machine, or other inference code, like the one at https://github.com/intel/intel-extension-for-tensorflow/blob/main/examples/quick_example.md?
import numpy as np
import sys
import tensorflow as tf

N = 1
num_channel = 3
input_width, input_height = (5, 5)
filter_width, filter_height = (2, 2)

x = np.random.rand(N, input_width, input_height, num_channel).astype(np.float32)
weight = np.random.rand(filter_width, filter_height, num_channel, num_channel).astype(np.float32)
bias = np.random.rand(num_channel).astype(np.float32)

conv = tf.nn.conv2d(x, weight, strides=[1, 1, 1, 1], padding='SAME')
activation = tf.nn.relu(conv)
result = tf.nn.bias_add(activation, bias)

print(result)
print('Finished')
conda activate <your environment>
source /opt/intel/oneapi/setvars.sh
python quick_example.py
If it works, could you please attach all of the output? (Or please attach the full output of the TensorFlow hello-world run from last time.)
Thanks
@yinghu5 Wild!
I ran quick_example.py and it produced the expected output, though it still reports 0MB of VRAM and an undefined PCI bus ID:
tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
I scaled the example up to 9 channels, 30x input dimensions, and 30x filter dimensions to create a heavier load, ran it in a loop to observe GPU utilization, and it seems this implementation is working.
+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | N/A |
| EU Array Active (%) | N/A |
| EU Array Stall (%) | N/A |
| EU Array Idle (%) | N/A |
| | |
| Compute Engine Util (%) | Engine 0: 81, Engine 1: 0, Engine 2: 0, Engine 3: 0 |
| Render Engine Util (%) | Engine 0: 0 |
| Media Engine Util (%) | N/A |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | Engine 0: 2 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | N/A |
+-----------------------------+--------------------------------------------------------------------+
| Reset | N/A |
| Programming Errors | N/A |
| Driver Errors | N/A |
| Cache Errors Correctable | N/A |
| Cache Errors Uncorrectable | N/A |
| Mem Errors Correctable | N/A |
| Mem Errors Uncorrectable | N/A |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 167 |
| GPU Frequency (MHz) | 2250 |
| Media Engine Freq (MHz) | N/A |
| GPU Core Temperature (C) | N/A |
| GPU Memory Temperature (C) | N/A |
| GPU Memory Read (kB/s) | N/A |
| GPU Memory Write (kB/s) | N/A |
| GPU Memory Bandwidth (%) | N/A |
| GPU Memory Used (MiB) | 4795 |
| GPU Memory Util (%) | 59 |
| Xe Link Throughput (kB/s) | N/A |
+-----------------------------+--------------------------------------------------------------------+
It seems there is still some issue with my i915 driver that is preventing some GPU information from being accessible, and some tools handle that more gracefully than others.
@djsv23 Good that you were able to get it working. The 0MB VRAM you are seeing is because when TensorFlow detects a device but cannot identify its memory, it defaults to 0; it is not an issue. And going by your observation from running a heavier load, the failure to run Stable Diffusion is just a memory bottleneck from the 8GB limit.
Thanks all for the help. It seems that 8GB of VRAM is quite limiting for AI image generation and might require some offloading strategies. I was able to generate a 128x128 image, which unfortunately isn't enough to produce a meaningful image from the prompt, and which, going by xpu-smi, appears to have required nearly 7GB to process.
Device: Intel Arc A750, operating system Ubuntu 22.04. Similar to #59, I've followed the installation procedure and the instructions to ensure oneMKL is activated. Running env_check.sh still gives the error about not finding CUDA drivers:
` Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.9 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
OS ubuntu:22.04 is Supported.
====================== Check OS Passed =======================
====================== Check Tensorflow ======================
Tensorflow2.14 is installed.
================== Check Tensorflow Passed ===================
=================== Check Intel GPU Driver ===================
Intel(R) graphics runtime intel-level-zero-gpu-1.3.27191.42-775 is installed, but is not recommended.
Intel(R) graphics runtime intel-opencl-icd-23.35.27191.42-775 is installed, but is not recommended.
Intel(R) graphics runtime level-zero-1.14.0-744 is installed, but is not recommended.
Intel(R) graphics runtime libigc1-1.0.15136.24-775 is installed, but is not recommended.
Intel(R) graphics runtime libigdfcl1-1.0.15136.24-775 is installed, but is not recommended.
Intel(R) graphics runtime libigdgmm12-22.3.12-742 is installed, but is not recommended.
=============== Check Intel GPU Driver Finshed ================
===================== Check Intel oneAPI =====================
Intel(R) oneAPI DPC++/C++ Compiler is installed.
Intel(R) oneAPI Math Kernel Library is installed.
================= Check Intel oneAPI Passed ==================
========================== Check Devices Availability ==========================
2024-02-01 10:55:06.186663: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-01 10:55:06.188042: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-01 10:55:06.206690: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-01 10:55:06.206709: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-01 10:55:06.206733: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-01 10:55:06.211135: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-01 10:55:06.211256: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-01 10:55:06.688400: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-01 10:55:06.965341: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow AVX512 CPU backend is loaded.
2024-02-01 10:55:08.016040: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow GPU backend is loaded.
2024-02-01 10:55:08.090198: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-01 10:55:08.090492: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-01 10:55:08.563296: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
====================== Check Devices Availability Passed ======================= `