nec4 closed this issue 9 months ago.
Hi @nec4, have you followed the GPU driver installation instructions for the Max 1550? Reference: https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#install-gpu-drivers
Also, can you check the environment and verify the installation by following these two sections, then post a screenshot of the output? https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#check-the-environment-for-xpu and https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#verify-the-installation
Thank you for your advice - I will check.
@nec4 Please check the links and post the output so that we can look into it.
From a clean environment, following the steps outlined in https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/install/install_for_xpu.md#check-the-environment-for-xpu :
Check Environment for Intel(R) Extension for TensorFlow*...
======================== Check Python ========================
python3.9 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
Unknow OS rocky.
====================== Check OS Failed =======================
Import check:
2024-01-12 11:30:45.826938: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-12 11:30:45.852516: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-12 11:30:45.852541: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-12 11:30:45.852561: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-12 11:30:45.857679: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-12 11:30:46.683997: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-12 11:30:50.789531: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-01-12 11:30:50.830048: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-01-12 11:30:51.124575: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2.14.0.1
It seems we had an older driver installed (I currently do not have root access to the system, so I cannot change this myself). The latest release appears to be 775: https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html . Is this version compatible with Intel Extension for TensorFlow, given that it differs from the version suggested in the docs (736)?
EDIT: The quick example now seems to be running. I will run some more tests.
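For longer check logs, a small parser can make the pass/fail picture easier to see at a glance. The sketch below is only an illustration: it assumes the banner format shown in the output above (`Check <name> Passed` / `Check <name> Failed` between runs of `=`), and `summarize_checks` is a hypothetical helper, not part of the extension or its check script.

```python
import re

# Matches result banners such as "==== Check OS Failed ====".
# Assumption: the check script keeps the banner format seen in the log above.
RESULT_LINE = re.compile(
    r"^=+\s*Check (?P<name>.+?) (?P<status>Passed|Failed)\s*=+$"
)


def summarize_checks(output: str) -> dict:
    """Map each check name to True (passed) or False (failed)."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group("name")] = match.group("status") == "Passed"
    return results


# Sample copied from the environment check output posted in this thread.
SAMPLE = """\
======================== Check Python ========================
python3.9 is installed.
==================== Check Python Passed =====================
========================== Check OS ==========================
Unknow OS rocky.
====================== Check OS Failed =======================
"""

if __name__ == "__main__":
    for name, ok in summarize_checks(SAMPLE).items():
        print(f"{name}: {'passed' if ok else 'FAILED'}")
```

On the sample above this reports the Python check as passed and the OS check as failed, which matches the log: the script does not recognize Rocky Linux.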
Yes, please install the latest version; it is compatible with Intel Extension for TensorFlow. If you run into any other issues, please open a new issue.
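After a driver update, one quick sanity check is to ask TensorFlow which XPU devices it can see. This is a minimal sketch, assuming the extension registers its devices under the `"XPU"` device type once `tensorflow` is imported; `list_xpu_devices` is a hypothetical helper name, and the function returns `None` when TensorFlow is not installed.

```python
def list_xpu_devices():
    """Return names of visible XPU devices, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices accepts an optional device type filter.
    # Assumption: the extension exposes Intel GPUs under the "XPU" type.
    return [device.name for device in tf.config.list_physical_devices("XPU")]


if __name__ == "__main__":
    devices = list_xpu_devices()
    if devices is None:
        print("TensorFlow is not installed in this environment.")
    elif not devices:
        print("TensorFlow imported, but no XPU device is visible.")
    else:
        print("Visible XPU devices:", devices)
```

An empty list after a successful import would point toward the driver or runtime rather than the training script.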
Hello. I am trying to run a slightly modified version (basically Jupyter notebook --> Python script) of the BERT example from here: https://github.com/intel/intel-extension-for-tensorflow/tree/main/examples/train_bert
However, I encounter a runtime error when trying to run the code on Intel GPU:
The training works on CPU, but fails for XPU. Furthermore, the unmodified "quick example" (https://github.com/intel/intel-extension-for-tensorflow/blob/main/examples/quick_example.md) also fails for XPU with a segmentation fault:
The environment was created using the Intel Python distribution as follows:
And here is information about the OS/accelerators:
How can these runtime errors be resolved? I am happy to provide more information/details as necessary. Thanks!
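Since a segmentation fault kills the interpreter and cannot be caught with `try`/`except`, one way to compare the CPU and XPU paths is to run a tiny probe in a child process and inspect its exit code. The following is a rough sketch under stated assumptions: `run_isolated`, `describe_exit`, and `PROBE` are hypothetical names, the `/XPU:0` device string is assumed to be how the extension names the first Intel GPU, and negative return codes indicate a signal only on POSIX systems.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> int:
    """Run a Python snippet in a child interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return completed.returncode


def describe_exit(returncode: int) -> str:
    """Turn an exit code into a short human-readable description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


# Hypothetical probe: a small matmul pinned to one device.
# Assumption: "/XPU:0" addresses the first Intel GPU when the extension is loaded.
PROBE = """
import tensorflow as tf
with tf.device("{device}"):
    x = tf.random.normal([256, 256])
    print(float(tf.reduce_sum(tf.matmul(x, x))))
"""

if __name__ == "__main__":
    for device in ("/CPU:0", "/XPU:0"):
        rc = run_isolated(PROBE.format(device=device))
        print(device, "->", describe_exit(rc))
```

If the CPU probe reports `ok` while the XPU probe is killed by `SIGSEGV`, the crash happens below the Python level, which fits a driver or runtime mismatch better than a bug in the training script.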