intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
Apache License 2.0

Illegal instruction (core dumped) in intel N305 #450

Open joebnb opened 11 months ago

joebnb commented 11 months ago

Describe the bug

Hello, I'm trying to use IPEX on an N305, but it throws an error at runtime. My environment is fairly complex, so I'll share my setup here for analysis, both to help make IPEX more robust and in the hope of getting some advice or a solution for my case.

Environment

I run PyTorch inside a Docker container based on the python:3.9.18 image, on Ubuntu 20.04, and that Ubuntu is itself installed in a PVE (Proxmox VE) VM.

PVE VM > Ubuntu 20.04 > Docker container from the python image [Python 3.9, PyTorch 2.1.0, IPEX 2.1.0]

and the GPU is passed through directly to the VM:

# command on ubuntu
lspci | grep VGA
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:10.0 VGA compatible controller: Intel Corporation Device 46d0

Problem

I installed IPEX as described in the official Getting Started guide:

# on container
pip install intel_extension_for_pytorch
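
As an aside, here is a minimal sketch for checking which build actually got installed; the +cpu/+xpu version suffix and the torch.xpu namespace check are assumptions about the packaging, not something verified in this thread.

# quick sanity check of the installed build (sketch)
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)   # the version suffix usually hints at the cpu vs xpu build
print("ipex :", ipex.__version__)
# the XPU build registers a torch.xpu namespace; the CPU-only wheel does not
print("xpu available:", hasattr(torch, "xpu") and torch.xpu.is_available())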

and modified the function that loads the model:

#!/usr/bin/env python
# coding=utf-8
## From: https://github.com/THUDM/ChatGLM-6B
import torch
import os
##### import ipex
import intel_extension_for_pytorch as ipex
##### import ipex
from typing import Dict, Union, Optional

from torch.nn import Module
from transformers import AutoModel, AutoTokenizer

from .chat import do_chat, do_chat_stream

def init_chatglm(model_path: str, running_device: str, gpus: int):
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    if running_device.upper() == "GPU":
        model = load_model_on_gpus(model_path, gpus)
    else:
        model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
        model = model.float()

    model.eval()
##### follow as manual
    model = ipex.optimize(model)
##### follow as manual
    model.do_chat = do_chat
    model.do_chat_stream = do_chat_stream
    return tokenizer, model

def auto_configure_device_map(num_gpus: int) -> Dict[str, int]:
    num_trans_layers = 28
    per_gpu_layers = 30 / num_gpus

    device_map = {'transformer.word_embeddings': 0,
                  'transformer.final_layernorm': 0, 'lm_head': 0}

    used = 2
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.layers.{i}'] = gpu_target
        used += 1

    return device_map

def load_model_on_gpus(checkpoint_path: Union[str, os.PathLike], num_gpus: int = 2,
                       device_map: Optional[Dict[str, int]] = None, **kwargs) -> Module:
    if num_gpus < 2 and device_map is None:
        model = AutoModel.from_pretrained(
            checkpoint_path, trust_remote_code=True, **kwargs).half().cuda()
    else:
        if num_gpus > torch.cuda.device_count():
            raise Exception(f"need {num_gpus} GPU, but only has {torch.cuda.device_count()}")

        from accelerate import dispatch_model

        model = AutoModel.from_pretrained(
            checkpoint_path, trust_remote_code=True, **kwargs).half()

        if device_map is None:
            device_map = auto_configure_device_map(num_gpus)

        model = dispatch_model(model, device_map=device_map)
        print(f"Device Map: {model.hf_device_map}\n")

    return model

The full code is adapted from https://github.com/ninehills/chatglm-openai-api; the code above lives in chatglm/chatglm.py. When I run the command

python main.py --device=cpu or xpu

it throws:

root@faed2ef52605:/app# python main.py --device=xpu
> Load config and arguments...
Config file: config.toml
Language Model: chatglm-6b-int4
Embeddings Model:
Device: xpu
GPUs: 1
Port: 8080
Tunneling:
Config:
{'models': {'llm': {'chatglm-6b': {'type': 'chatglm', 'path': 'THUDM/chatglm-6b'}, 'chatglm-6b-int8': {'type': 'chatglm', 'path': 'THUDM/chatglm-6b-int8'}, 'chatglm-6b-int4': {'type': 'chatglm', 'path': '/app/model/chatglm2-6b-int4'}, 'chatglm2-6b': {'type': 'chatglm', 'path': 'THUDM/chatglm2-6b'}, 'chatglm2-6b-int8': {'type': 'chatglm', 'path': 'THUDM/chatglm2-6b-int8'}, 'chatglm2-6b-int4': {'type': 'chatglm', 'path': 'THUDM/chatglm2-6b-int4'}, 'phoenix-inst-chat-7b': {'type': 'phoenix', 'path': 'FreedomIntelligence/phoenix-inst-chat-7b'}, 'phoenix-inst-chat-7b-int4': {'type': 'phoenix', 'path': 'FreedomIntelligence/phoenix-inst-chat-7b-int4'}}, 'embeddings': {'text2vec-large-chinese': {'type': 'default', 'path': 'GanymedeNil/text2vec-large-chinese'}}}, 'auth': {'tokens': ['token1']}}
> Start LLM model chatglm-6b-int4
>> Use chatglm llm model /app/model/chatglm2-6b-int4
Illegal instruction (core dumped)

Versions

When I run collect_env.py, it also crashes:

root@faed2ef52605:/app# python env.py
Illegal instruction (core dumped)

Maybe my environment just isn't suitable for IPEX, since even the collect-env script crashed.

joebnb commented 11 months ago

I pulled the intel/intel-extension-for-pytorch Docker image and ran env.py:

# in ipex container
root@4ab1e72276c4:/app# python env.py
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
ERROR! Intel® Extension for PyTorch* only works on machines with instruction sets equal or newer than AVX2, which are not detected on the current machine.
# both the python container and the ipex container print the same results
root@faed2ef52605:/app# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:10.0 VGA compatible controller: Intel Corporation Alder Lake-N [UHD Graphics]
00:12.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

The "instruction sets equal or newer than AVX2" problem is solved: you need to set the correct CPU model in the PVE VM hardware settings. I set the CPU type to 'host' so the VM passes through the original host CPU model.
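
As a quick confirmation that the guest now exposes AVX2, a minimal sketch that just reads the Linux CPU flags:

# confirm the VM exposes AVX2 to the guest; IPEX refuses to start without it
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print("avx2 exposed:", "avx2" in cpuinfo)  # should print True with CPU type set to 'host'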

joebnb commented 11 months ago

The server finally came up, but it still fails with an error:

INFO:     192.168.31.32:53875 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     192.168.31.32:53895 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     192.168.31.32:53900 - "GET /v1/models HTTP/1.1" 200 OK
question = 你好, history = [("You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Respond using markdown.", 'OK')]
INFO:     192.168.31.32:53918 - "POST /v1/chat/completions HTTP/1.1" 200 OK
/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py:408: UserWarning: IPEX XPU dedicated fusion passes are enabled in ScriptGraph non profiling execution mode. Please enable profiling execution mode to retrieve device guard.
 (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/jit/fusion_pass.cpp:826.)
  query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1115, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 233, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 221, in stream_response
    async for data in self.body_iterator:
  File "/app/app.py", line 285, in eval_llm
    for response in context.model.do_chat_stream(
  File "/app/chatglm/chat.py", line 21, in do_chat_stream
    for response, _ in model.stream_chat(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 1063, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 1149, in stream_generate
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 830, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 640, in forward
    layer_ret = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 408, in forward
    query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torch_ipex::mul_add' is only available for these backends: [XPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

XPU: registered at /build/intel-pytorch-extension/csrc/gpu/aten/operators/TripleOps.cpp:510 [kernel]
BackendSelect: fallthrough registered at /build/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /build/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at /build/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /build/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /build/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /build/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:30 [backend fallback]
AutogradCPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:34 [backend fallback]
AutogradCUDA: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:42 [backend fallback]
AutogradXLA: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:46 [backend fallback]
AutogradMPS: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:54 [backend fallback]
AutogradXPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:38 [backend fallback]
AutogradHPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:67 [backend fallback]
AutogradLazy: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:50 [backend fallback]
AutogradMeta: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:58 [backend fallback]
Tracer: registered at /build/pytorch/torch/csrc/autograd/TraceTypeManual.cpp:294 [backend fallback]
AutocastCPU: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastXPU: registered at /build/intel-pytorch-extension/csrc/gpu/aten/operators/TripleOps.cpp:510 [kernel]
AutocastCUDA: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at /build/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /build/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /build/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]

INFO:     192.168.31.32:53940 - "GET /v1/models HTTP/1.1" 200 OK

I tried both python main.py --device=xpu and python main.py --device=cpu.
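
For what it's worth, the 'torch_ipex::mul_add' failure above says the fused op only has an XPU kernel, so it was presumably dispatched while the weights were still on the CPU backend. A small diagnostic sketch (assuming `model` is the object returned by init_chatglm in the code above):

# where do the weights actually live? (diagnostic sketch; `model` comes from init_chatglm above)
devices = {p.device.type for p in model.parameters()}
print("parameter devices:", devices)  # {'cpu'} would explain why the XPU-only torch_ipex::mul_add kernel cannot be dispatched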

kta-intel commented 11 months ago

pip install intel_extension_for_pytorch installs the CPU package. If you want to use XPU, please follow the XPU installation instructions. It's worth noting that currently only the data center GPUs and Arc (plus the associated drivers) have been validated; I'm not sure how it will behave with an integrated GPU. Also, you will need to make sure the oneAPI Base Toolkit is installed and activated.
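
For reference, a rough sketch of what the XPU path could look like once the XPU build is installed and the oneAPI environment is activated; the setvars.sh path and the float16 dtype below are illustrative assumptions, and this is not validated on an integrated GPU:

# sketch of the XPU path (assumes the xpu build of intel_extension_for_pytorch is installed
# and the oneAPI environment has been activated, e.g. `source /opt/intel/oneapi/setvars.sh`)
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModel

model = AutoModel.from_pretrained("/app/model/chatglm2-6b-int4",  # path taken from the log above
                                  trust_remote_code=True).half()
model = model.to("xpu")   # move the weights off the CPU backend
model.eval()
model = ipex.optimize(model, dtype=torch.float16)
# tokenizer outputs must be moved to "xpu" as well before calling generate/stream_chat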