intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
Apache License 2.0

Fail to run LLM inference with ipex 2.1.30 and oneapi 2024.1 on Windows iGPU #645

Open plusbang opened 1 month ago

plusbang commented 1 month ago

Describe the bug

On a Windows iGPU, I tried to run LLM inference with ipex=2.1.30+xpu and oneapi=2024.1, but it failed. I waited for more than an hour and the process was still hanging at the point shown in the attached screenshot.

To reproduce:

set SYCL_CACHE_PERSISTENT=1
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Then run the following code:

import torch
import intel_extension_for_pytorch as ipex
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = r"D:\llm-models\Qwen-1_8B-Chat"

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    prompt = "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
    print('Start generation')
    st = time.time()
    output = model.generate(input_ids,
                            max_new_tokens=32)
    torch.xpu.synchronize()  # wait for queued XPU work to finish before stopping the timer
    end = time.time()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f'Inference time: {end-st} s')
    print('-'*20, 'Output', '-'*20)
    print(output_str)
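
To narrow down whether the hang is specific to `model.generate` or affects any XPU kernel, a much smaller check may help. The following is a minimal diagnostic sketch and not part of the original report: the helper name `xpu_smoke_test` and the matrix size are hypothetical, and it assumes `intel_extension_for_pytorch` registers the `xpu` device on import, as in the reproducer above.

```python
import time


def xpu_smoke_test(size=256):
    """Run one small fp16 matmul on the XPU.

    Returns the elapsed time in seconds, or None if torch/IPEX cannot be
    imported or no XPU device is available.
    """
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (registers the 'xpu' device)
    except Exception:
        return None
    if not torch.xpu.is_available():
        return None

    a = torch.randn(size, size, device="xpu", dtype=torch.float16)
    start = time.time()
    b = a @ a  # single kernel launch, far smaller than a generate() call
    torch.xpu.synchronize()  # block until the kernel has actually finished
    return time.time() - start


if __name__ == "__main__":
    elapsed = xpu_smoke_test()
    if elapsed is None:
        print("XPU not usable from this environment")
    else:
        print(f"Small matmul finished in {elapsed:.3f} s")
```

If this small matmul also stalls, the problem is probably not specific to the model or to `generate`; if it returns quickly, the model-level path is the more likely place to look.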

Versions

Collecting environment information...
PyTorch version: N/A
PyTorch CXX11 ABI: N/A
IPEX version: N/A
IPEX commit: N/A
Build type: N/A

OS: Microsoft Windows 11 家庭中文版
GCC version: (GCC) 13.2.0
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is XPU available: N/A
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: N/A
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture=9
CurrentClockSpeed=1200
DeviceID=CPU0
Family=1
L2CacheSize=14336
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3600
Name=Intel(R) Core(TM) Ultra 5 125H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.30+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0.post2+cxx11.abi
[pip3] torchaudio==2.1.0.post2+cxx11.abi
[pip3] torchvision==0.16.0.post2+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.30+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0.post2+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0.post2+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0.post2+cxx11.abi pypi_0 pypi
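
The environment report lists N/A for the PyTorch, IPEX, and XPU fields even though the packages are installed. A minimal sketch for querying those values directly from Python is given here, assuming the same environment as the reproducer. The helper name `probe_xpu_environment` is hypothetical, and any import or runtime failure is recorded in the result rather than raised.

```python
import importlib


def probe_xpu_environment():
    """Collect version and device facts, leaving "N/A" for anything unavailable."""
    info = {
        "torch": "N/A",
        "ipex": "N/A",
        "xpu_available": "N/A",
        "xpu_devices": [],
    }
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # Importing IPEX is assumed to register the torch.xpu namespace.
        ipex = importlib.import_module("intel_extension_for_pytorch")
        info["ipex"] = ipex.__version__
        info["xpu_available"] = torch.xpu.is_available()
        info["xpu_devices"] = [
            torch.xpu.get_device_name(i) for i in range(torch.xpu.device_count())
        ]
    except Exception as exc:  # missing package, DLL load failure, no torch.xpu, ...
        info["error"] = repr(exc)
    return info


if __name__ == "__main__":
    for key, value in probe_xpu_environment().items():
        print(f"{key}: {value}")
```

Running this after `setvars.bat` in the same shell as the reproducer shows whether the oneAPI runtime is visible to Python at all.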

Vasud-ha commented 1 month ago

Hi @plusbang, thanks for reporting this. We will try to reproduce it on our end and get back to you.

Vasud-ha commented 1 month ago

Hi @plusbang, I am trying to get a machine to reproduce the issue.

Vasud-ha commented 1 month ago

Hi @plusbang, we can reproduce this issue. It occurs because current IPEX support for iGPUs is limited.