intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Inference error "LLVM ERROR: Diag: aborted" when enabling codegeex4-all-9b on MTL, transformers==4.39.0 #11658

Closed lei-sun-intel closed 3 months ago

lei-sun-intel commented 3 months ago

- OS: Ubuntu 22.04
- Python: 3.9.19
- transformers: 4.39.0
- intel-extension-for-pytorch: 2.1.20+git0e2bee2
- torch: 2.1.0.post0+cxx11.abi
- torchvision: 0.16.0+fbb4cc5

  1. Download the CodeGeeX4 model from https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b
  2. Use the following script to run inference: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/HuggingFace/LLM/codegeex2/generate.py

```
intel@intel-Meteor-Lake-Client-Platform:$ python generate_codegeex4.py
2024-07-25 16:19:31,476 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 12.86it/s]
2024-07-25 16:19:32,353 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/intel/.cache_dev_zone/notebooks/aigc_apps/generate_codegeex4.py", line 66, in <module>
    output = model.generate(input_ids, n_predict)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 1324, in generate
    generation_config, model_kwargs = self._prepare_generation_config(generation_config, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 1220, in _prepare_generation_config
    model_kwargs = generation_config.update(**kwargs)
AttributeError: 'int' object has no attribute 'update'
```

hzjane commented 3 months ago

@lei-sun-intel The line `output = model.generate(input_ids, n_predict)` should be `output = model.generate(input_ids, max_new_tokens=n_predict)`.
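The reason the original call fails is that in `GenerationMixin.generate` the second positional parameter is `generation_config`, so `model.generate(input_ids, n_predict)` binds the integer `n_predict` to it, and the later `generation_config.update(...)` step then runs on an `int`. A minimal stand-alone mimic of that flow (class and function names here are simplified stand-ins, not the real transformers internals):

```python
# Simplified mimic of transformers' generate() / _prepare_generation_config
# flow; FakeGenerationConfig and fake_generate are illustrative stand-ins.

class FakeGenerationConfig:
    def __init__(self):
        self.max_new_tokens = None

    def update(self, **kwargs):
        # the real GenerationConfig.update merges kwargs into the config
        for key, value in kwargs.items():
            setattr(self, key, value)
        return {}

def fake_generate(input_ids, generation_config=None, **kwargs):
    if generation_config is None:
        generation_config = FakeGenerationConfig()
    # this is the step that raises when an int was passed positionally
    model_kwargs = generation_config.update(**kwargs)
    return generation_config, model_kwargs

# Buggy call: the int lands in the generation_config slot.
try:
    fake_generate([1, 2, 3], 32)
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'update'

# Fixed call: the token budget travels as a keyword argument instead.
cfg, _ = fake_generate([1, 2, 3], max_new_tokens=32)
print(cfg.max_new_tokens)  # 32
```

This is why the keyword form `max_new_tokens=n_predict` fixes the `AttributeError` without touching anything else in the script.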

lei-sun-intel commented 3 months ago

```shell
pip install tiktoken==0.7.0
pip install transformers==4.39.0
pip install trl==0.9.6
```

```
(notebook-zone) testv023@intel-NUC14RVH-B:~/.cache_dev_zone/notebooks/aigc_apps$ python _generate_codegeex4.py
2024-07-29 23:44:50,067 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 10.54it/s]
2024-07-29 23:44:51,076 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
LLVM ERROR: Diag: aborted

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 5 125H]
Registry and code: 13 MB
Command: python _generate_codegeex4.py
Uptime: 21.857613 s
Aborted (core dumped)
```

lei-sun-intel commented 3 months ago

The root cause is the ipex-llm version. FYI, after `pip install ipex-llm==2.1.0b2` it works, but with my previous version, ipex-llm 2.1.0b20240610, it does NOT work.
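Since the crash is version-dependent, it is worth confirming exactly which ipex-llm build is installed before debugging further. A small stdlib-only helper for that (the function is a hypothetical convenience, not part of ipex-llm):

```python
# Check which build of a distribution is installed, e.g. to distinguish
# the working ipex-llm 2.1.0b2 from the crashing 2.1.0b20240610.
from importlib import metadata

def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return "not installed"

print("ipex-llm:", installed_version("ipex-llm"))
```

`pip show ipex-llm` gives the same answer from the shell; the Python form is handy inside a notebook or a bug-report script.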