intel / intel-extension-for-tensorflow

Intel® Extension for TensorFlow*

fp64 platform not supported exception #52

Closed sun1lach closed 10 months ago

sun1lach commented 10 months ago

SW & HW configuration:

Processor: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz (8 CPUs), ~1.8GHz
OS: Windows 11 Enterprise 64-bit
DirectX Version: DirectX 12
Integrated Graphics: Intel(R) Iris(R) Xe Graphics

Steps to Reproduce:

# requirements.txt
# pip install intel-extension-for-tensorflow[gpu]

tensorflow==2.13.0
intel-extension-for-tensorflow==2.13.0.1
intel-extension-for-tensorflow-lib==2.13.0.1.1

Run the below tensorflow code

# script.py

import tensorflow as tf
print(tf.config.list_physical_devices())
tf.keras.applications.EfficientNetV2B0(include_top=False, weights="imagenet")

Exception

$ python script.py
2023-11-16 12:29:05.897182: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-16 12:29:05.898345: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-11-16 12:29:05.941625: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-11-16 12:29:05.941978: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-16 12:29:06.643341: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-11-16 12:29:07.555291: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2023-11-16 12:29:07.601033: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2023-11-16 12:29:07.611576: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-11-16 12:29:07.611853: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:XPU:0', device_type='XPU')]
2023-11-16 12:29:07.742338: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-16 12:29:07.742457: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
2023-11-16 12:29:11.352395: W itex/core/utils/op_kernel.cc:360] Aborted: Cast op uses fp64 data type, while fp64 instructions are not supported on the platform.
Traceback (most recent call last):
  File "script.py", line 5, in <module>
    tf.keras.applications.EfficientNetV2B0(include_top=False, weights="imagenet")
  File "/home/user/packages/lib/python3.11/site-packages/keras/src/applications/efficientnet_v2.py", line 1129, in EfficientNetV2B0
    return EfficientNetV2(
           ^^^^^^^^^^^^^^^
  File "/home/user/packages/lib/python3.11/site-packages/keras/src/applications/efficientnet_v2.py", line 967, in EfficientNetV2
    x = layers.Normalization(
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/packages/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/packages/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 6656, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.AbortedError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:XPU:0}} Cast op uses fp64 data type, while fp64 instructions are not supported on the platform. [Op:Cast] name: 

I was expecting the conversion to supported floating-point (fp16/fp32) precision implicitly but instead it raises this error. Works fine if ITEX is targeted to CPU but fails with XPU.

feng-intel commented 10 months ago

Thank you for trying this. We do not officially support iGPU, but I can help you personally.

Could you share more installation details? I can't install it this way on my notebook:
$ pip install intel-extension-for-tensorflow[gpu]

sun1lach commented 10 months ago

Sure. Do you get any specific error or exception when installing with pip --timeout=1000 install intel-extension-for-tensorflow[gpu]? I was able to install ITEX with that command. If the above doesn't work, please also try intel-extension-for-tensorflow[xpu].

feng-intel commented 10 months ago

Could you show your $ pip list output so we can check the intel-extension-for-tensorflow version?

sun1lach commented 10 months ago

Here's the version information from pip list

intel-extension-for-tensorflow     2.13.0.1
intel-extension-for-tensorflow-lib 2.13.0.1.1
tensorflow                         2.13.0

sun1lach commented 10 months ago

The oneAPI Base Toolkit packages need to be installed to target the iGPU as an XPU device. https://intel.github.io/intel-extension-for-tensorflow/latest/docs/install/install_for_xpu.html#install-oneapi-base-toolkit-packages
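After installing the toolkit, its runtime libraries must also be on the environment before running the script. A typical activation, assuming the default install prefix from the oneAPI documentation (on Windows the equivalent is "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"):

```shell
# Load oneAPI runtime libraries into the current shell, then run the script.
# Adjust the path if oneAPI was installed to a non-default prefix.
source /opt/intel/oneapi/setvars.sh
python script.py
```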

feng-intel commented 10 months ago

I can install "intel-extension-for-tensorflow 0.0.0.dev1", but not 2.13.0.1. In any case, the error is "Cast op uses fp64 data type, while fp64 instructions are not supported on the platform. [Op:Cast]". Could you set these and try again:

$ export OverrideDefaultFP64Setting=1
$ export IGC_EnableDPEmulation=1
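For a Python script, the same flags can be set programmatically instead of in the shell. A minimal sketch (the variable names come from this thread; the important detail is that they are set before TensorFlow, and therefore the ITEX plugin, is imported):

```python
# fp64_workaround.py -- sketch based on the two flags from this thread.
# The variables enable double-precision emulation in the Intel Graphics
# Compiler; they must be visible before TensorFlow loads the ITEX GPU
# backend, so set them at the very top of the script.
import os

os.environ["OverrideDefaultFP64Setting"] = "1"
os.environ["IGC_EnableDPEmulation"] = "1"

# Import TensorFlow only after the flags are set:
# import tensorflow as tf
# tf.keras.applications.EfficientNetV2B0(include_top=False, weights="imagenet")

print(os.environ["OverrideDefaultFP64Setting"])
print(os.environ["IGC_EnableDPEmulation"])
```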

sun1lach commented 10 months ago

Thanks @feng-intel. The above two flags did the trick. Although the per-platform optimisation is slow and startup takes a while on the iGPU/XPU platform, it works.

NeoZhangJianyu commented 10 months ago

The ITEX release is not compiled with AOT (ahead-of-time compilation) for iGPU, which makes startup take more time. The running speed is not affected.

If you want to speed up startup, you can build ITEX with AOT for iGPU from source. Please refer to the ITEX source build guide.

sun1lach commented 10 months ago

Thank you @NeoZhangJianyu for the suggestion. I will try this approach.