intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
Apache License 2.0

PI_ERROR_INVALID_BINARY when trying to run whisper #415

Closed nedo99 closed 1 year ago

nedo99 commented 1 year ago

Describe the bug

Hi,

I am trying to run Whisper inference on an Arc 770 GPU, but I am getting the following error:

While copying the parameter named "decoder.blocks.23.mlp.2.weight", whose dimensions in the model are torch.Size([1024, 4096]) and whose dimensions in the checkpoint are torch.Size([1024, 4096]), an exception occurred : ("The program was built for 1 devices\nBuild program log for 'Intel(R) Graphics [0x5690]':\n -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)",)

The code example that I am using:

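import whisper
import intel_extension_for_pytorch as ipex  # importing this enables the "xpu" device in PyTorch

# audio_file_name: path to the audio file to transcribe (not defined in the original post)
model = whisper.load_model('medium', device="xpu")  # the error above is raised while loading the weights
text = model.transcribe(audio_file_name)
print(text['text'])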

The issue can also be reproduced with an int4 transformers model (this example). In that case I get a slightly different error output:

is_pad_token_in_inputs = (pad_token_id is not None) and (pad_token_id in inputs)
  File "/home/user/miniconda3/envs/llm/lib/python3.9/site-packages/torch/_tensor.py", line 997, in __contains__
    return (element == self).any().item()  # type: ignore[union-attr]
RuntimeError: The program was built for 1 devices
Build program log for 'Intel(R) Graphics [0x5690]':
 -42 (PI_ERROR_INVALID_BINARY)

Versions

Name: torch Version: 2.0.1a0+cxx11.abi

Name: torchvision Version: 0.15.2a0+cxx11.abi

Name: intel-extension-for-pytorch Version: 2.0.110+xpu

Python 3.9.17

oneAPI version: 2023.2.0

sycl-ls output:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x5690] 3.0 [22.49.25018.23]
[opencl:gpu:3] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x46a6] 3.0 [22.49.25018.23]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x5690] 1.3 [1.3.25018]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x46a6] 1.3 [1.3.25018]

nedo99 commented 1 year ago

It was a driver version issue. After updating the drivers to the versions below, I no longer get the error.

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770M Graphics 3.0 [23.17.26241.33]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x46a6] 3.0 [23.17.26241.33]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770M Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x46a6] 1.3 [1.3.26241]
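
For anyone comparing their own setup against the two listings above, here is a minimal sketch that extracts the driver version (the last bracketed field) from each `sycl-ls` line and reports which devices changed. The function names `parse_sycl_ls` and `changed_drivers` are hypothetical helpers written for this illustration, not part of oneAPI or intel-extension-for-pytorch, and the line format is assumed from the output pasted in this thread only.

```python
import re

# Assumed line shape, taken from the sycl-ls output in this thread:
# [backend:type:index] platform, device name version [driver version]
LINE_RE = re.compile(
    r"^\[(?P<backend>[^:\]]+):(?P<kind>[^:\]]+):(?P<index>\d+)\]\s+"
    r".*\[(?P<driver>[^\]]+)\]\s*$"
)


def parse_sycl_ls(text):
    """Map (backend, device type, index) to the driver version string."""
    drivers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match["backend"], match["kind"], int(match["index"]))
            drivers[key] = match["driver"]
    return drivers


def changed_drivers(before, after):
    """Return {device key: (old driver, new driver)} for devices whose driver differs."""
    old, new = parse_sycl_ls(before), parse_sycl_ls(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


# Two lines from each listing in this thread, copied verbatim.
BEFORE = """\
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x5690] 3.0 [22.49.25018.23]
"""
AFTER = """\
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770M Graphics 3.0 [23.17.26241.33]
"""

for device, (old, new) in changed_drivers(BEFORE, AFTER).items():
    print(device, old, "->", new)
```

In the listings above, only the GPU entries change driver version between the failing and the working setup; the CPU and FPGA emulation entries stay the same.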