intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

minicpm-v-2-6 can't run on A770 Ubuntu #11756

Closed biyuehuang closed 1 month ago

biyuehuang commented 1 month ago

Arc A770, Ubuntu 22.04

cpm.py

import torch
from PIL import Image
from ipex_llm.transformers import AutoModel
#from transformers import AutoModel
from transformers import AutoTokenizer
import time

model_path = "./models/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, load_in_low_bit="asym_int4")  # also tried load_in_low_bit="fp8"
                               #   optimize_model=True, modules_to_not_convert=["vpm", "resampler"])

model = model.eval()
#model = model.float()

model = model.half()  # with half(): transformers/generation/utils.py, line 2415, in _sample
                      #   next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                      #   RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
# model = model.bfloat16()  # RuntimeError: unsupported dtype, only fp32 and fp16 are supported
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

def run_minicpm(image_path, question):
    image = Image.open(image_path).convert('RGB')
    msgs = [{'role': 'user', 'content': question}]
    torch.xpu.synchronize()
    timeStart = time.time()

    res = model.chat(
        image=image,
        msgs=msgs,
        tokenizer=tokenizer,
        sampling=True,
        stream=True,
        temperature=0.7
    )

    timeFirstRecord = False
    timeFirst = 0.0  # guard in case the stream yields nothing before the first token

    generated_text = ""
    for new_text in res:
        if not timeFirstRecord:
            torch.xpu.synchronize()
            timeFirst = time.time() - timeStart
            timeFirstRecord = True
        generated_text += new_text
        print(new_text, flush=True, end='')

    torch.xpu.synchronize()
    timeCost = time.time() - timeStart
    token_count_input = len(tokenizer.tokenize(question))
    token_count_output = len(tokenizer.tokenize(generated_text))

    ms_first_token = timeFirst  # first-token latency, in seconds
    ms_rest_token = (timeCost - timeFirst) / (token_count_output - 1 + 1e-8) * 1000
    print("\ninput: ", question)
    print("output: ", generated_text)
    print("token count input: ", token_count_input)
    print("token count output: ", token_count_output)
    print("time cost(s): ", timeCost)
    print("First token latency(s): ", ms_first_token)
    print("After token latency(ms/token)", ms_rest_token)
    print("output token/s: ", token_count_output/timeCost)
    print("output char/s",len(generated_text)/timeCost)
    print("******** image path = ",image_path)
    print("_______________")
   # print(res)

#run_minicpm('./test_image/guo.png','What are in the image?')
run_minicpm('./mini_0730/cat.JPG', '这是什么品种的猫')  # "What breed of cat is this?"
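
For comparison, the commented-out options in the load call above suggest an alternative loading path. Below is a minimal, unverified sketch of that variant: it keeps the vision tower ("vpm") and the resampler out of the low-bit conversion, a common way to avoid inf/NaN logits when quantizing multimodal models. The model path and arguments simply mirror cpm.py; whether this avoids the error on the A770 is an assumption, not something confirmed at this point in the thread.

# Hedged sketch: same load as above, but leaving the vision modules ("vpm")
# and the resampler un-quantized, as in the commented-out line of cpm.py.
from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "./models/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    load_in_low_bit="asym_int4",
    optimize_model=True,
    modules_to_not_convert=["vpm", "resampler"],  # keep vision parts in original precision
)
model = model.eval()
# Deliberately skipping model.half() here so the un-converted vision modules stay
# in fp32; whether that is required on A770 is an assumption, not a verified fix.
model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)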
(ipex3.10) test@root1-Z690-AORUS-ELITE-DDR4:~/cpm$ pip list
Package                       Version
----------------------------- ------------------
accelerate                    0.31.0
aiofiles                      23.2.1
aiohappyeyeballs              2.3.5
aiohttp                       3.10.1
aiolimiter                    1.1.0
aiosignal                     1.3.1
alabaster                     0.7.13
albucore                      0.0.9
albumentations                1.4.8
annotated-types               0.7.0
antlr4-python3-runtime        4.9.3
anyio                         4.4.0
anytree                       2.12.1
archspec                      0.2.3
asttokens                     2.4.1
async-timeout                 4.0.3
attrs                         23.2.0
autograd                      1.6.2
azure-common                  1.1.28
azure-core                    1.30.2
azure-identity                1.17.1
azure-search-documents        11.5.0
azure-storage-blob            12.21.0
Babel                         2.12.1
beartype                      0.18.5
bigdl-core-xe-21              2.5.0b20240807
bigdl-core-xe-addons-21       2.5.0b20240807
bigdl-core-xe-batch-21        2.5.0b20240807
boltons                       23.1.1
Brotli                        1.1.0
cachetools                    5.4.0
certifi                       2024.2.2
cffi                          1.16.0
charset-normalizer            3.3.2
click                         8.1.7
cloudpickle                   3.0.0
colorama                      0.4.6
coloredlogs                   15.0.1
contourpy                     1.2.1
cramjam                       2.8.3
cryptography                  43.0.0
cycler                        0.12.1
Cython                        3.0.10
dacite                        1.8.1
dask                          2024.7.1
dask-expr                     1.1.9
datasets                      2.20.0
datashaper                    0.0.49
decorator                     5.1.1
deprecation                   2.1.0
devtools                      0.12.2
diffusers                     0.29.0
dill                          0.3.8
diskcache                     5.6.3
distlib                       0.3.7
distro                        1.9.0
docstring_parser              0.16
docutils                      0.18.1
easydict                      1.13
einops                        0.8.0
environs                      11.0.0
exceptiongroup                1.2.2
executing                     2.0.1
fairscale                     0.4.13
fastapi                       0.112.0
fastparquet                   2024.5.0
ffmpy                         0.4.0
filelock                      3.14.0
fire                          0.6.0
flatbuffers                   24.3.25
fonttools                     4.53.0
frozenlist                    1.4.1
fsspec                        2024.5.0
ftfy                          6.2.0
future                        1.0.0
gensim                        4.3.3
gradio                        4.40.0
gradio_client                 1.2.0
graspologic                   3.4.1
graspologic-native            1.2.1
h11                           0.14.0
h5py                          3.11.0
httpcore                      1.0.5
httpx                         0.27.0
huggingface-hub               0.23.3
humanfriendly                 10.0
hyppo                         0.4.0
idna                          3.6
imageio                       2.34.1
imagesize                     1.4.1
importlib_metadata            7.1.0
importlib_resources           6.4.0
insightface                   0.7.3
intel-cmplr-lib-ur            2024.2.1
intel-extension-for-pytorch   2.1.10+xpu
intel-openmp                  2024.2.1
ipex-llm                      2.1.0b20240807
isodate                       0.6.1
Jinja2                        3.1.4
joblib                        1.4.2
jsonpatch                     1.33
jsonpointer                   2.4
jsonschema                    4.23.0
jsonschema-specifications     2023.12.1
kiwisolver                    1.4.5
lancedb                       0.9.0
lazy_loader                   0.4
linkify-it-py                 2.0.3
llvmlite                      0.43.0
locket                        1.0.0
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
marshmallow                   3.21.3
matplotlib                    3.9.0
mdit-py-plugins               0.4.1
mdurl                         0.1.2
meson                         1.2.0
mpmath                        1.3.0
msal                          1.30.0
msal-extensions               1.2.0
multidict                     6.0.5
multiprocess                  0.70.16
nest-asyncio                  1.6.0
networkx                      3.3
nltk                          3.8.1
numba                         0.60.0
numpy                         1.26.4
ollama                        0.3.0
omegaconf                     2.3.0
onnx                          1.16.1
onnxruntime                   1.18.0
open_clip_torch               2.26.1
openai                        1.37.1
opencv-python                 4.10.0.82
opencv-python-headless        4.10.0.82
orjson                        3.10.6
overrides                     7.7.0
packaging                     24.0
pandas                        2.2.2
partd                         1.4.2
patsy                         0.5.6
pillow                        10.3.0
pip                           24.0
platformdirs                  4.2.0
plotly                        5.23.0
pluggy                        1.4.0
portalocker                   2.10.1
POT                           0.9.4
prettytable                   3.10.0
protobuf                      5.27.1
psutil                        5.9.8
py                            1.11.0
py-cpuinfo                    9.0.0
pyaml-env                     1.2.1
pyarrow                       15.0.0
pyarrow-hotfix                0.6
pycosat                       0.6.6
pycparser                     2.21
pydantic                      2.7.3
pydantic_core                 2.18.4
pydub                         0.25.1
Pygments                      2.18.0
PyJWT                         2.8.0
pylance                       0.13.0
pynndescent                   0.5.13
pyparsing                     3.1.2
pyreadline3                   3.4.1
PySocks                       1.7.1
python-dateutil               2.9.0.post0
python-dotenv                 1.0.1
python-multipart              0.0.9
pytz                          2024.1
PyYAML                        6.0.1
ratelimiter                   1.2.0.post0
referencing                   0.35.1
regex                         2024.5.15
requests                      2.32.3
retry                         0.9.2
rich                          13.7.1
rpds-py                       0.19.1
ruamel.yaml                   0.18.6
ruamel.yaml.clib              0.2.8
ruff                          0.5.6
safetensors                   0.4.3
scikit-image                  0.23.2
scikit-learn                  1.5.0
scipy                         1.12.0
seaborn                       0.13.2
semantic-version              2.10.0
sentencepiece                 0.2.0
setuptools                    69.2.0
shapely                       2.0.5
shellingham                   1.5.4
shtab                         1.7.1
six                           1.16.0
smart-open                    7.0.4
sniffio                       1.3.1
snowballstemmer               2.2.0
Sphinx                        6.2.1
sphinx-rtd-theme              1.2.2
sphinxcontrib-applehelp       1.0.4
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.1
sphinxcontrib-jquery          4.1
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
starlette                     0.37.2
statsmodels                   0.14.2
swifter                       1.4.0
sympy                         1.12.1
tabulate                      0.9.0
tenacity                      8.5.0
termcolor                     2.4.0
textual                       0.70.0
threadpoolctl                 3.5.0
tifffile                      2024.5.22
tiktoken                      0.7.0
timm                          1.0.7
tokenizers                    0.19.1
tomli                         2.0.1
tomlkit                       0.12.0
toolz                         0.12.1
torch                         2.1.0a0+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
tqdm                          4.66.5
transformers                  4.40.0
trl                           0.9.6
truststore                    0.8.0
typer                         0.12.3
typing_extensions             4.12.2
tyro                          0.8.5
tzdata                        2024.1
uc-micro-py                   1.0.3
umap-learn                    0.5.6
urllib3                       2.2.1
uvicorn                       0.30.5
virtualenv                    20.24.2
wcwidth                       0.2.13
websockets                    12.0
wheel                         0.43.0
win-inet-pton                 1.1.0
wrapt                         1.16.0
xxhash                        3.4.1
yarl                          1.9.4
zipp                          3.19.2
zstandard                     0.22.0

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-08-10 22:37:59,403 - INFO - intel_extension_for_pytorch auto imported
2024-08-10 22:37:59,479 - INFO - vision_config is None, using default vision config
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 4/4 [00:00<00:00, 13.40it/s]
2024-08-10 22:38:00,451 - INFO - Converting the current model to asym_int4 format......
/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
哈哈哈哈The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
    return original_generate(self,
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
    return original_generate(self,
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 281, in generate
    return original_generate(self,
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1622, in generate
    result = self._sample(
  File "/home/test/miniforge3/envs/ipex3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2829, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
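
As background, the RuntimeError above is raised by torch.multinomial when the probability tensor it samples from already contains inf or NaN values, which is typically the downstream symptom of an fp16 overflow earlier in the forward pass. A small, CPU-only sketch (independent of the model) that reproduces just that error message:

# Standalone illustration of the error seen above: an fp16 overflow produces an
# `inf` logit, softmax then yields NaNs, and torch.multinomial rejects the
# resulting probability tensor with the same RuntimeError.
import torch

logits = (torch.tensor([1.0, 2.0, 65504.0]).half() * 2).float()  # 65504 is fp16 max -> overflow to inf
probs = torch.softmax(logits, dim=-1)                             # NaNs appear here
print(probs)
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # "probability tensor contains either `inf`, `nan` or element < 0"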
qiuxin2012 commented 1 month ago

similar issue: https://github.com/intel-analytics/ipex-llm/issues/11731

jason-dai commented 1 month ago

similar issue: #11731

I think it is already supported; we need to add an example

MeouSker77 commented 1 month ago

solved in offline discussion

biyuehuang commented 1 month ago

Verified OK on another A770 Ubuntu device:

source /opt/intel/oneapi/2024.0/oneapi-vars.sh
conda create -n ipex-llm python=3.11
conda activate ipex-llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/

pip install timm transformers==4.41.0 trl gradio==4.21.0
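
As a follow-up, a quick check that the XPU device is actually visible in the new environment can be worth running before retrying cpm.py. This is a hypothetical snippet, assuming the torch.xpu API that intel-extension-for-pytorch exposes once imported:

# Hypothetical post-install sanity check, assuming the torch.xpu API provided by
# intel-extension-for-pytorch (installed as a dependency of ipex-llm[xpu]).
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

print(torch.__version__, ipex.__version__)
print(torch.xpu.is_available())       # expected True on a working A770 setup
print(torch.xpu.get_device_name(0))   # should report the Arc A770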