intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Inference error when enabling the Llama-3.1-8B model on MTL, transformers==4.43.2 #11656

Closed lei-sun-intel closed 1 month ago

lei-sun-intel commented 1 month ago

HW: Intel MTL
OS: Ubuntu 22.04
Python: 3.9.19
transformers: 4.43.0
intel-extension-for-pytorch: 2.1.20+git0e2bee2
torch: 2.1.0.post0+cxx11.abi
torchvision: 0.16.0+fbb4cc5

Trying to reuse the following script for inference: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/HuggingFace/LLM/llama3/generate.py
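For context, the core of that example looks roughly like the sketch below (the model path is a placeholder, and prompt handling is simplified: the linked generate.py builds the Llama-3 prompt string manually, while this sketch uses the tokenizer's chat template):

# Minimal sketch of an ipex-llm Llama-3.1 inference run on an Intel GPU (xpu).
# model_path is a placeholder; point it at your local checkpoint.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/Meta-Llama-3.1-8B-Instruct"

# ipex-llm's patched from_pretrained quantizes the model (sym_int4) on load.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             trust_remote_code=True,
                                             use_cache=True)
model = model.half().to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
messages = [{"role": "user", "content": "What is AI?"}]
input_ids = tokenizer.apply_chat_template(messages,
                                          add_generation_prompt=True,
                                          return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=False))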

intel@intel-Meteor-Lake-Client-Platform:$ python generate_llama31.py
2024-07-25 14:03:42,349 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "/home/intel/.cache_dev_zone/notebooks/aigc_apps/generate_llama31.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_path,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/unittest/mock.py", line 1336, in patched
    return func(*newargs, **newkeywargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/model.py", line 373, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/model.py", line 508, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3775, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1066, in __init__
    self.model = LlamaModel(config)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 845, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 845, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 306, in __init__
    self.rotary_emb = LlamaRotaryEmbedding(config=self.config)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 110, in __init__
    self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling["type"])
KeyError: 'type'

ElliottDyson commented 1 month ago

I had a more detailed error relating to the same version-compatibility issue:

ValueError: `rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
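(For illustration only: the mismatch is visible in the checkpoint's config.json. Llama-3.1 ships a rope_scaling dict keyed by rope_type, while older transformers releases validate for exactly the keys type and factor, hence this ValueError and the KeyError: 'type' above. A hypothetical local check, assuming the checkpoint directory path:)

import json, os

model_path = "/path/to/Meta-Llama-3.1-8B-Instruct"  # placeholder
with open(os.path.join(model_path, "config.json")) as f:
    rope_scaling = json.load(f)["rope_scaling"]

print(rope_scaling)
# e.g. {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0,
#       'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
print("old-style keys present:", {"type", "factor"} <= rope_scaling.keys())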
lzivan commented 1 month ago

Hi, we are trying to reproduce your issue.

lzivan commented 1 month ago

Hi @ElliottDyson @lei-sun-intel ,

Regarding this issue, please upgrade your transformers package. You may also need to install trl.

pip install --upgrade transformers
pip install trl           #if needed

Here is my pip list:

Package                     Version
--------------------------- ------------------
accelerate                  0.23.0
aiohttp                     3.9.5
aiosignal                   1.3.1
annotated-types             0.7.0
async-timeout               4.0.3
attrs                       23.2.0
bigdl-core-xe-21            2.5.0b20240725
bigdl-core-xe-addons-21     2.5.0b20240725
bigdl-core-xe-batch-21      2.5.0b20240725
certifi                     2024.7.4
charset-normalizer          3.3.2
datasets                    2.20.0
dill                        0.3.8
docstring_parser            0.16
eval_type_backport          0.2.0
filelock                    3.15.4
frozenlist                  1.4.1
fsspec                      2024.5.0
huggingface-hub             0.24.2
idna                        3.7
intel-cmplr-lib-ur          2024.2.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp                2024.2.0
ipex-llm                    2.1.0b20240725
Jinja2                      3.1.4
markdown-it-py              3.0.0
MarkupSafe                  2.1.5
mdurl                       0.1.2
mpmath                      1.3.0
multidict                   6.0.5
multiprocess                0.70.16
networkx                    3.2.1
numpy                       1.26.4
packaging                   24.1
pandas                      2.2.2
pillow                      10.4.0
pip                         24.0
protobuf                    5.28.0rc1
psutil                      6.0.0
py-cpuinfo                  9.0.0
pyarrow                     17.0.0
pyarrow-hotfix              0.6
pydantic                    2.8.2
pydantic_core               2.20.1
Pygments                    2.18.0
python-dateutil             2.9.0.post0
pytz                        2024.1
PyYAML                      6.0.2rc1
regex                       2024.7.24
requests                    2.32.3
rich                        13.7.1
safetensors                 0.4.3
sentencepiece               0.2.0
setuptools                  69.5.1
shtab                       1.7.1
six                         1.16.0
sympy                       1.13.1
tabulate                    0.9.0
tokenizers                  0.19.1
torch                       2.1.0a0+cxx11.abi
torchvision                 0.16.0a0+cxx11.abi
tqdm                        4.66.4
transformers                4.43.2
trl                         0.9.6
typing_extensions           4.12.2
tyro                        0.8.5
tzdata                      2024.1
urllib3                     2.2.2
wheel                       0.43.0
xxhash                      3.4.1
yarl                        1.9.4

Output:

-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is AI?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is AI?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term can also be applied to
lei-sun-intel commented 1 month ago

I updated to transformers 4.43.2 via pip install transformers==4.43.2 and re-ran the inference script; now I get a different error message, shown below. I have checked my environment: transformers==4.43.2 and trl==0.9.6 are the same as yours, while intel-extension-for-pytorch 2.1.20+git0e2bee2 is different. I will continue to diff the two environments. By the way, any hint on which package is the root cause?

(notebook-zone) intel@intel-Meteor-Lake-Client-Platform:~/.cache_dev_zone/notebooks/aigc_apps$ python _generate_llama31.py
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
2024-07-29 10:28:10,653 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 10.62it/s]
2024-07-29 10:28:11,378 - INFO - Converting the current model to sym_int4 format......
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128009 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Traceback (most recent call last):
  File "/home/intel/.cache_dev_zone/notebooks/aigc_apps/_generate_llama31.py", line 81, in <module>
    output = model.generate(input_ids,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1141, in forward
    outputs = self.model(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/models/llama.py", line 155, in llama_model_forward_4_38
    return llama_model_forward_4_38_internal(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/models/llama.py", line 1895, in llama_model_forward_4_38_internal
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds)
TypeError: _update_causal_mask() missing 3 required positional arguments: 'cache_position', 'past_key_values', and 'output_attentions'

lei-sun-intel commented 1 month ago

The root cause is the ipex-llm version. FYI, after pip install ipex-llm==2.1.0b2, it works with transformers==4.43.2. My previous version of ipex-llm, 2.1.0b20240610, DOES NOT work.
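For anyone landing on this issue: the working combination reported in this thread is a newer ipex-llm build together with transformers 4.43.2 (trl only if your script needs it), i.e. something along these lines (version numbers as quoted above; a recent nightly such as 2.1.0b20240725 from lzivan's environment should also work):

pip install ipex-llm==2.1.0b2
pip install transformers==4.43.2
pip install trl           #if needed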