Issue description
Using OpenVINO 2024.2.0 as a backend to PyTorch (intel_extension_for_pytorch 2.1.30.post0)
Python 3.10 within a conda-forge environment on Ubuntu 24.04
oneAPI 2024.1
12th Gen NUC with an Intel(R) Core(TM) i7-12700H CPU and an Arc A770M GPU
Hello Query Device sample output from the environment:
(py310) user@NUC12SNKi72:~$ python3 /usr/share/openvino/samples/python/hello_query_device/hello_query_device.py
[ INFO ] Available devices:
[ INFO ] CPU :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES:
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 20
[ INFO ] EXECUTION_DEVICES: CPU
[ INFO ] FULL_DEVICE_NAME: 12th Gen Intel(R) Core(TM) i7-12700H
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, INT8, BIN, EXPORT_IMPORT
[ INFO ] DEVICE_TYPE: Type.INTEGRATED
[ INFO ] DEVICE_ARCHITECTURE: intel64
[ INFO ] NUM_STREAMS: 1
[ INFO ] INFERENCE_NUM_THREADS: 0
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] MODEL_DISTRIBUTION_POLICY: set()
[ INFO ] ENABLE_HYPER_THREADING: True
[ INFO ] DEVICE_ID:
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] LOG_LEVEL: Level.NO
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ] KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ] AFFINITY: Affinity.HYBRID_AWARE
[ INFO ]
[ INFO ] GPU.0 :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES: 0, 1
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 2
[ INFO ] OPTIMAL_BATCH_SIZE: 1
[ INFO ] MAX_BATCH_SIZE: 1
[ INFO ] DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.3.0
[ INFO ] FULL_DEVICE_NAME: Intel(R) Iris(R) Xe Graphics (iGPU)
[ INFO ] DEVICE_UUID: 8680a6460c0000000002000000000000
[ INFO ] DEVICE_LUID: 0200000000000000
[ INFO ] DEVICE_TYPE: Type.INTEGRATED
[ INFO ] DEVICE_GOPS: {<Type: 'float16'>: 4300.7998046875, <Type: 'float32'>: 2150.39990234375, <Type: 'int8_t'>: 8601.599609375, <Type: 'uint8_t'>: 8601.599609375}
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, EXPORT_IMPORT
[ INFO ] GPU_DEVICE_TOTAL_MEM_SIZE: 14863626240
[ INFO ] GPU_UARCH_VERSION: 12.3.0
[ INFO ] GPU_EXECUTION_UNITS_COUNT: 96
[ INFO ] GPU_MEMORY_STATISTICS: {}
[ INFO ] PERF_COUNT: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_SDPA_OPTIMIZATION: True
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 20
[ INFO ] NUM_STREAMS: 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] DEVICE_ID: 0
[ INFO ]
[ INFO ] GPU.1 :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES: 0, 1
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 2
[ INFO ] OPTIMAL_BATCH_SIZE: 1
[ INFO ] MAX_BATCH_SIZE: 1
[ INFO ] DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.55.8
[ INFO ] FULL_DEVICE_NAME: Intel(R) Arc(TM) A770M Graphics (dGPU)
[ INFO ] DEVICE_UUID: 86809056080000000300000000000000
[ INFO ] DEVICE_LUID: 0200000000000000
[ INFO ] DEVICE_TYPE: Type.DISCRETE
[ INFO ] DEVICE_GOPS: {<Type: 'float16'>: 0.0, <Type: 'float32'>: 16793.599609375, <Type: 'int8_t'>: 0.0, <Type: 'uint8_t'>: 0.0}
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, GPU_HW_MATMUL, EXPORT_IMPORT
[ INFO ] GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136
[ INFO ] GPU_UARCH_VERSION: 12.55.8
[ INFO ] GPU_EXECUTION_UNITS_COUNT: 512
[ INFO ] GPU_MEMORY_STATISTICS: {}
[ INFO ] PERF_COUNT: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_SDPA_OPTIMIZATION: True
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 20
[ INFO ] NUM_STREAMS: 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] DEVICE_ID: 1
[ INFO ]
(py310) user@NUC12SNKi72:~$
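A side note on mapping the two GPU entries above: besides FULL_DEVICE_NAME, the DEVICE_UUID values both begin with 8680, which looks like the PCI vendor ID 0x8086 (Intel) stored little-endian. This is an assumption from the values shown, not documented layout; the helper name below is introduced here for illustration:

```python
def pci_vendor_from_uuid(uuid_hex: str) -> int:
    # First two bytes of the UUID appear to be the little-endian PCI
    # vendor ID (an assumption based on the query output above).
    return int.from_bytes(bytes.fromhex(uuid_hex[:4]), "little")

# DEVICE_UUID of GPU.1 (the Arc A770M) from the query output:
print(hex(pci_vendor_from_uuid("86809056080000000300000000000000")))  # 0x8086
```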
Step-by-step reproduction
In the code below, if you change the torch.compile() options from GPU.1 to GPU.0, real numbers are printed in the prediction. If GPU.1 is used, NaNs are printed.
(py310) user@NUC12SNKi72:~$ cat test.py
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
import torchvision.models as models
import openvino.torch  # registers the "openvino" torch.compile backend

model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
model.eval()

# Changing "GPU.1" to "GPU.0" below makes the prediction print real numbers.
model = torch.compile(model, backend="openvino",
                      options={"device": "GPU.1", "model_caching": True, "cache_dir": "./model_cache"})
#model = torch.compile(model, backend="openvino", options={"device": "CPU"})

model = model.to("xpu")
data = torch.rand(1, 3, 224, 224).to("xpu")
print("Input data shape: ", data.shape)

with torch.no_grad():
    pred = model(data)
print("Prediction: ", pred)
(py310) user@NUC12SNKi72:~$
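To turn the printed-NaN symptom into an explicit pass/fail check, the prediction tensor can be scanned for NaNs; in the script itself torch.isnan(pred).any() does this directly. As a dependency-free sketch of the same check (has_nan is a helper name introduced here):

```python
import math

def has_nan(values) -> bool:
    """Return True if any element of a (possibly nested) sequence is NaN."""
    for v in values:
        if isinstance(v, (list, tuple)):
            if has_nan(v):
                return True
        elif isinstance(v, float) and math.isnan(v):
            return True
    return False

# Equivalent check on the reproducer's output tensor:
#     assert not torch.isnan(pred).any(), "GPU.1 produced NaNs"
print(has_nan([[1.0, float("nan")]]))  # True
```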
Relevant log output
No response
Issue submission checklist
[X] I'm reporting an issue. It's not a question.
[X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
[X] There is reproducer code and related data files such as images, videos, models, etc.
OpenVINO Version
2024.2.0
Operating System
Other (Please specify in description)
Device used for inference
GPU
Framework
PyTorch
Model used
No response