intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
Apache License 2.0

Is there any plan to support intel iGPU? #406

Open rnwang04 opened 1 year ago

rnwang04 commented 1 year ago

Describe the issue

I have been using ipex-xpu for a while with an Arc series dGPU, and it works great. Recently I tried to run the same code on my iGPU (Intel(R) UHD Graphics 770 1.3 [1.3.26241] / Intel(R) Graphics [0x46a6] 1.3 [1.3.26241]), and in both cases got the same error message when using F.linear:

onednn_verbose,info,oneDNN v3.2.0 (commit 67bc621a2da4aefc51f0a59b2af2398fa1d3e1c8)
onednn_verbose,info,cpu,runtime:threadpool,nthr:10
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) A730M Graphics,driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,gpu,engine,1,backend:Level Zero,name:Intel(R) Graphics [0x46a6],driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,experimental features are enabled
onednn_verbose,info,use batch_normalization stats one pass is enabled
onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,error,level_zero,errcode 1879048196
Traceback (most recent call last):
  File "/home/arda/ruonan/gpu-test/test_chatglm2.py", line 46, in <module>
    output = model.generate(**inputs, do_sample=False, temperature=0.9, max_new_tokens=32)
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 845, in forward
    transformer_outputs = self.transformer(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 741, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 588, in forward
    hidden_states, kv_cache = layer(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 510, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 342, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arda/miniconda3/envs/ruonan-ipex2/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: could not create a primitive
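
The `errcode 1879048196` that oneDNN prints just before the traceback is a decimal Level Zero result code. Below is a minimal sketch for decoding it, assuming the values in the table match the `ze_result_t` enum in the installed Level Zero headers. The table is a hand-copied subset and `decode_ze_result` is a hypothetical helper name, not part of ipex or oneDNN.

```python
# Hand-copied subset of Level Zero ze_result_t values (assumption: verify
# against ze_api.h from the installed Level Zero loader before relying on it).
ZE_RESULT_NAMES = {
    0x00000000: "ZE_RESULT_SUCCESS",
    0x70000001: "ZE_RESULT_ERROR_DEVICE_LOST",
    0x70000002: "ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY",
    0x70000003: "ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY",
    0x70000004: "ZE_RESULT_ERROR_MODULE_BUILD_FAILURE",
    0x70000005: "ZE_RESULT_ERROR_MODULE_LINK_FAILURE",
}


def decode_ze_result(errcode: int) -> str:
    """Return 'hex (name)' for a decimal errcode taken from the oneDNN verbose log."""
    name = ZE_RESULT_NAMES.get(errcode, "unknown, check ze_api.h")
    return f"{errcode:#010x} ({name})"


if __name__ == "__main__":
    print(decode_ze_result(1879048196))
```

If the table is accurate, the code points to a module (kernel) build failure, which would be consistent with a GPU kernel that could not be built for the selected device.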

Is there any plan to support Iris iGPUs?

jingxu10 commented 1 year ago

No, not in plan yet. This error should not be related to iGPU. Could you share a simple reproducer?

rnwang04 commented 1 year ago

No, not in plan yet. This error should not be related to iGPU. Could you share a simple reproducer?

Sure. Actually, a single linear layer is enough to reproduce this error. Suppose the test machine has a dGPU and an iGPU, and the iGPU is xpu:1; then the code below raises the same error.

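import torch
import intel_extension_for_pytorch as ipex  # importing ipex makes the 'xpu' device available
from torch import nn

lin = nn.Linear(1000, 1000)
lin.to('xpu:1')  # xpu:1 is the iGPU on this machine
data = torch.ones(1000).to('xpu:1')
lin(data)  # fails with RuntimeError: could not create a primitive
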
onednn_verbose,info,oneDNN v3.2.0 (commit 67bc621a2da4aefc51f0a59b2af2398fa1d3e1c8)
onednn_verbose,info,cpu,runtime:threadpool,nthr:12
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) A770 Graphics,driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,gpu,engine,1,backend:Level Zero,name:Intel(R) UHD Graphics 770,driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,experimental features are enabled
onednn_verbose,info,use batch_normalization stats one pass is enabled
onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,error,level_zero,errcode 1879048196
Traceback (most recent call last):
  File "~/test_gpu_free.py", line 8, in <module>
    lin(data)
  File "/opt/anaconda3/envs/ipex2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/ipex2/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: could not create a primitive
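
To double-check which physical device sits behind each index, the `gpu,engine` lines of the verbose log can be parsed. Below is a minimal sketch, assuming the line layout shown in the logs above and assuming that the oneDNN engine index corresponds to the `xpu:N` index; `list_gpu_engines` is a hypothetical helper, not an ipex or oneDNN API.

```python
import re

# Matches lines such as:
# onednn_verbose,info,gpu,engine,1,backend:Level Zero,name:Intel(R) UHD Graphics 770,driver_version:...
ENGINE_LINE = re.compile(
    r"^onednn_verbose,info,gpu,engine,(?P<index>\d+),"
    r"backend:(?P<backend>[^,]+),name:(?P<name>[^,]+),"
)


def list_gpu_engines(log_text: str) -> dict:
    """Map oneDNN GPU engine index -> device name found in a verbose log."""
    engines = {}
    for line in log_text.splitlines():
        match = ENGINE_LINE.match(line.strip())
        if match:
            engines[int(match.group("index"))] = match.group("name")
    return engines


if __name__ == "__main__":
    sample = """\
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) A770 Graphics,driver_version:1.3.26241,binary_kernels:enabled
onednn_verbose,info,gpu,engine,1,backend:Level Zero,name:Intel(R) UHD Graphics 770,driver_version:1.3.26241,binary_kernels:enabled
"""
    for index, name in sorted(list_gpu_engines(sample).items()):
        print(f"engine {index}: {name}")
```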

fcharras commented 1 year ago

Just to share personal experience: I've been using ipex on an iGPU, at least for running NumPy-like array operations, and it works reasonably reliably after compiling from source with the proper AOT flag for my device, so that it does not suffer from JIT overhead. Did you compile from source, or are you using the latest official binaries?

On my laptop, which only has an iGPU, I can confirm that this snippet does work:

import torch
import intel_extension_for_pytorch as ipex
from torch import nn

lin = nn.Linear(1000, 1000)
lin.to('xpu')
data = torch.ones(1000).to('xpu')
lin(data)

so the issue you're reporting here might have more to do with device string management in ipex rather than with the second xpu being an igpu.
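
One way to tell these two explanations apart would be to hide the dGPU from the process, so that the iGPU is the only visible device, and then rerun the snippet with plain 'xpu'. Below is a minimal sketch, assuming the Level Zero runtime honors the `ZE_AFFINITY_MASK` environment variable and that Level Zero device 1 is the iGPU, as in the logs above. The helper names and the script name `repro.py` are hypothetical.

```python
import os
import subprocess
import sys


def env_for_single_device(level_zero_index: int, base_env=None) -> dict:
    """Copy the environment and expose only one Level Zero device to the child."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: ZE_AFFINITY_MASK is read when the Level Zero driver starts,
    # so it must be set before the process that imports torch/ipex is launched.
    env["ZE_AFFINITY_MASK"] = str(level_zero_index)
    return env


def run_reproducer(script_path: str, level_zero_index: int) -> int:
    """Run a reproducer script in a child process and return its exit code."""
    result = subprocess.run(
        [sys.executable, script_path],
        env=env_for_single_device(level_zero_index),
    )
    return result.returncode


if __name__ == "__main__":
    # 'repro.py' is a placeholder for a file containing the snippet above,
    # using the device string 'xpu' rather than 'xpu:1'.
    if os.path.exists("repro.py"):
        print("exit code:", run_reproducer("repro.py", 1))
```

If the snippet still fails when the iGPU is the only visible device, the device string is probably not the cause; if it succeeds, the multi-device index handling becomes the more likely suspect.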

rnwang04 commented 1 year ago

@fcharras Hi, thanks for sharing! The results above were obtained with the latest official binaries. Would you mind sharing how to compile from source with the proper AOT flag? Thanks again! I may try again when I find a suitable device. By the way, is your laptop running Linux or Windows?

fcharras commented 1 year ago

I'm running Linux. To compile from source I follow the guide, which consists of running the compile bundle provided by ipex; it builds wheels for torch+torchvision+torchaudio+ipex. I don't think it can run on Windows.

It is expected to be run in a fresh conda environment (maybe make sure python<3.11 is installed; I'm not sure about compatibility with >=3.11), and it also needs some basic commands such as git and patch.

The compile bundle takes an optional third parameter, AOT, where you can pass all the architectures you want the build to be compatible with. For iGPUs the strings are given there. You can pass several targets, e.g. "pvc,tgllp" for Max Series + gen 11 laptop support.

The first two parameters are paths to folders in the file tree of the oneAPI Base Toolkit, which you can get from Intel's website.

I think there is an issue that requires you to run export LD_LIBRARY_PATH="" before starting the script.

It's a long process (1h+) and needs a lot of ram, I couldn't run it on my laptop, had to build the wheels on a workstation.
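
The invocation described above can be sketched as follows. This only assembles the command line and the environment; it does not start the build. The script name `compile_bundle.sh`, the two placeholder oneAPI paths, and the helper names are assumptions for illustration, so check the ipex build guide for the exact arguments.

```python
import os


def build_bundle_command(script, first_oneapi_path, second_oneapi_path, aot_targets=()):
    """Assemble the argument list: two oneAPI paths, then the optional AOT string."""
    command = ["bash", script, first_oneapi_path, second_oneapi_path]
    if aot_targets:
        # Several targets are passed as one comma-separated string, e.g. "pvc,tgllp".
        command.append(",".join(aot_targets))
    return command


def build_environment(base_env=None):
    """Copy the environment with LD_LIBRARY_PATH emptied, as suggested above."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ""
    return env


if __name__ == "__main__":
    # All paths below are placeholders, not real installation paths.
    cmd = build_bundle_command(
        "compile_bundle.sh",
        "/path/to/oneapi/first-folder",
        "/path/to/oneapi/second-folder",
        aot_targets=("pvc", "tgllp"),
    )
    print(" ".join(cmd))
    # To actually start the long, RAM-hungry build, one could then run:
    # subprocess.run(cmd, env=build_environment(), check=True)
```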

leonardozcm commented 11 months ago

It's a long process (1h+) and needs a lot of ram, I couldn't run it on my laptop, had to build the wheels on a workstation.

I repeated the build process that @fcharras described, with AOT=adls, and tested it on a dual-GPU machine (UHD Graphics 770 + Arc A750), and still encountered RuntimeError: could not create a primitive. I suspect that what @fcharras said is correct, namely that

so the issue you're reporting here might have more to do with device string management in ipex rather than with the second xpu being an igpu.

I may update the result if I can find an iGPU-only machine.