Fail to run LLM inference with ipex 2.1.30 and oneapi 2024.1 on Windows iGPU

plusbang commented 1 month ago

Describe the bug

On a Windows iGPU, I tried to run LLM inference with ipex=2.1.30+xpu and oneapi=2024.1, but it failed. I waited for more than 1 hour, and the run was still pending and never finished.
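To narrow down where a run like this is stuck, one option is the standard-library `faulthandler` module, which can dump the Python traceback of every thread after a timeout without killing the process. The sketch below is only an illustration: the helper name `run_with_hang_report` and its parameters are hypothetical and not part of IPEX or the original script. Note that it shows the Python frame that is waiting (for example a call into `generate`), not what the native code underneath is doing.

```python
import faulthandler
import sys


def run_with_hang_report(fn, report_after_s=60.0, out=sys.stderr):
    """Run fn() and return its result.

    While fn() is running, dump all thread tracebacks to `out` every
    `report_after_s` seconds. `out` must be a real file with a file
    descriptor (sys.stderr or an opened file). The hypothetical helper
    only reports; it never interrupts fn().
    """
    faulthandler.dump_traceback_later(report_after_s, repeat=True, file=out)
    try:
        return fn()
    finally:
        # Always cancel the watchdog so later code is not spammed with dumps.
        faulthandler.cancel_dump_traceback_later()


# Example use with the reproduction script (assumes `model` and `input_ids` exist):
# output = run_with_hang_report(
#     lambda: model.generate(input_ids, max_new_tokens=32),
#     report_after_s=120.0,
# )
```

If the dump keeps showing the same frame, that frame is where the script is pending, which is more precise than a description of the console output.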

To reproduce:

set SYCL_CACHE_PERSISTENT=1
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
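Since both commands only affect the current shell session, it can help to confirm from inside Python that the environment actually reached the interpreter. This is a minimal sketch under stated assumptions: `missing_env` is a hypothetical helper, and treating `ONEAPI_ROOT` as a sign that `setvars.bat` has run is an assumption about the oneAPI installer, not something confirmed in this thread.

```python
import os


def missing_env(env=None, required=("SYCL_CACHE_PERSISTENT", "ONEAPI_ROOT")):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment. The default
    `required` tuple is an assumption: SYCL_CACHE_PERSISTENT comes from
    the reproduction steps, ONEAPI_ROOT is assumed to be set by setvars.bat.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Not set in this session:", ", ".join(missing))
    else:
        print("Environment variables look set.")
```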

Then run the following code:

import torch
import intel_extension_for_pytorch as ipex
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = r"D:\llm-models\Qwen-1_8B-Chat"

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    prompt = "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
    print('Start generation')
    st = time.time()
    output = model.generate(input_ids,
                            max_new_tokens=32)
    torch.xpu.synchronize()  # wait for queued XPU work before stopping the timer
    end = time.time()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f'Inference time: {end-st} s')
    print('-'*20, 'Output', '-'*20)
    print(output_str)
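A side note on the timing in this script: device work can be queued asynchronously, so a wall-clock measurement is only meaningful if the device is synchronized before the end timestamp is taken. The sketch below shows that ordering with a hypothetical helper, `timed`, which is not part of IPEX; the `sync` callable is whatever the backend provides, for example `torch.xpu.synchronize` as used in the script above.

```python
import time


def timed(fn, sync=None):
    """Call fn(), then sync() if given, and return (result, elapsed_seconds).

    The end timestamp is taken only after sync() returns, so the elapsed
    time includes any device work that fn() queued but did not wait for.
    """
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example use with the reproduction script (assumes `model` and `input_ids` exist):
# output, seconds = timed(
#     lambda: model.generate(input_ids, max_new_tokens=32),
#     sync=torch.xpu.synchronize,
# )
# print(f"Inference time: {seconds} s")
```

This does not affect the hang itself; it only makes the reported inference time trustworthy once generation completes.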

Versions

Collecting environment information...
PyTorch version: N/A
PyTorch CXX11 ABI: N/A
IPEX version: N/A
IPEX commit: N/A
Build type: N/A

OS: Microsoft Windows 11 Home Chinese Edition
GCC version: (GCC) 13.2.0
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: N/A

Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is XPU available: N/A
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: N/A
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture=9
CurrentClockSpeed=1200
DeviceID=CPU0
Family=1
L2CacheSize=14336
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3600
Name=Intel(R) Core(TM) Ultra 5 125H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.30+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0.post2+cxx11.abi
[pip3] torchaudio==2.1.0.post2+cxx11.abi
[pip3] torchvision==0.16.0.post2+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.30+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0.post2+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0.post2+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0.post2+cxx11.abi pypi_0 pypi

Vasud-ha commented 1 month ago

Hi @plusbang, thanks for reporting this. We will try to reproduce it on our end and get back to you.

Vasud-ha commented 1 month ago

Hi @plusbang, I am trying to get a machine to reproduce the issue.

Vasud-ha commented 1 month ago

Hi @plusbang, we were able to reproduce this issue. Current IPEX support for iGPUs is limited.

intel / intel-extension-for-pytorch

Fail to run LLM inference with ipex 2.1.30 and oneapi 2024.1 on Windows iGPU #645
