intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Inference error when enabling the Llama-3.1-8B model on MTL, transformers==4.43.2 #11656

Closed lei-sun-intel closed 1 month ago

lei-sun-intel commented 1 month ago

HW: Intel MTL
OS: Ubuntu 22.04
Python: 3.9.19
transformers: 4.43.0
intel-extension-for-pytorch: 2.1.20+git0e2bee2
torch: 2.1.0.post0+cxx11.abi
torchvision: 0.16.0+fbb4cc5

Trying to reuse the following script for inference: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/HuggingFace/LLM/llama3/generate.py
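For context, the core of that example looks roughly like the sketch below (the model path is a placeholder, and prompt handling is simplified: the linked generate.py builds the Llama-3 prompt string manually, while this sketch uses the tokenizer's chat template):

# Minimal sketch of an ipex-llm Llama-3.1 inference run on an Intel GPU (xpu).
# model_path is a placeholder; point it at your local checkpoint.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/Meta-Llama-3.1-8B-Instruct"

# ipex-llm's patched from_pretrained quantizes the model (sym_int4) on load.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             trust_remote_code=True,
                                             use_cache=True)
model = model.half().to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
messages = [{"role": "user", "content": "What is AI?"}]
input_ids = tokenizer.apply_chat_template(messages,
                                          add_generation_prompt=True,
                                          return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=False))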

intel@intel-Meteor-Lake-Client-Platform:$ python generate_llama31.py
2024-07-25 14:03:42,349 - INFO - intel_extension_for_pytorch auto imported
Traceback (most recent call last):
  File "/home/intel/.cache_dev_zone/notebooks/aigc_apps/generate_llama31.py", line 60, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_path,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/unittest/mock.py", line 1336, in patched
    return func(*newargs, **newkeywargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/model.py", line 373, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/model.py", line 508, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3775, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1066, in __init__
    self.model = LlamaModel(config)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 845, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 845, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 306, in __init__
    self.rotary_emb = LlamaRotaryEmbedding(config=self.config)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 110, in __init__
    self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling["type"])
KeyError: 'type'

ElliottDyson commented 1 month ago

I had a more detailed error relating to the same version-compatibility issue:

ValueError: `rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
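(For illustration only: the mismatch is visible in the checkpoint's config.json. Llama-3.1 ships a rope_scaling dict keyed by rope_type, while older transformers releases validate for exactly the keys type and factor, hence this ValueError and the KeyError: 'type' above. A hypothetical local check, assuming the checkpoint directory path:)

import json, os

model_path = "/path/to/Meta-Llama-3.1-8B-Instruct"  # placeholder
with open(os.path.join(model_path, "config.json")) as f:
    rope_scaling = json.load(f)["rope_scaling"]

print(rope_scaling)
# e.g. {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0,
#       'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
print("old-style keys present:", {"type", "factor"} <= rope_scaling.keys())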
lzivan commented 1 month ago

Hi, we are trying to reproduce your issue.

lzivan commented 1 month ago

Hi @ElliottDyson @lei-sun-intel ,

Regarding this issue, please upgrade your transformers package. You may also need to install trl.

pip install --upgrade transformers
pip install trl           #if needed

Here is my pip list:

Package                     Version
--------------------------- ------------------
accelerate                  0.23.0
aiohttp                     3.9.5
aiosignal                   1.3.1
annotated-types             0.7.0
async-timeout               4.0.3
attrs                       23.2.0
bigdl-core-xe-21            2.5.0b20240725
bigdl-core-xe-addons-21     2.5.0b20240725
bigdl-core-xe-batch-21      2.5.0b20240725
certifi                     2024.7.4
charset-normalizer          3.3.2
datasets                    2.20.0
dill                        0.3.8
docstring_parser            0.16
eval_type_backport          0.2.0
filelock                    3.15.4
frozenlist                  1.4.1
fsspec                      2024.5.0
huggingface-hub             0.24.2
idna                        3.7
intel-cmplr-lib-ur          2024.2.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp                2024.2.0
ipex-llm                    2.1.0b20240725
Jinja2                      3.1.4
markdown-it-py              3.0.0
MarkupSafe                  2.1.5
mdurl                       0.1.2
mpmath                      1.3.0
multidict                   6.0.5
multiprocess                0.70.16
networkx                    3.2.1
numpy                       1.26.4
packaging                   24.1
pandas                      2.2.2
pillow                      10.4.0
pip                         24.0
protobuf                    5.28.0rc1
psutil                      6.0.0
py-cpuinfo                  9.0.0
pyarrow                     17.0.0
pyarrow-hotfix              0.6
pydantic                    2.8.2
pydantic_core               2.20.1
Pygments                    2.18.0
python-dateutil             2.9.0.post0
pytz                        2024.1
PyYAML                      6.0.2rc1
regex                       2024.7.24
requests                    2.32.3
rich                        13.7.1
safetensors                 0.4.3
sentencepiece               0.2.0
setuptools                  69.5.1
shtab                       1.7.1
six                         1.16.0
sympy                       1.13.1
tabulate                    0.9.0
tokenizers                  0.19.1
torch                       2.1.0a0+cxx11.abi
torchvision                 0.16.0a0+cxx11.abi
tqdm                        4.66.4
transformers                4.43.2
trl                         0.9.6
typing_extensions           4.12.2
tyro                        0.8.5
tzdata                      2024.1
urllib3                     2.2.2
wheel                       0.43.0
xxhash                      3.4.1
yarl                        1.9.4

Output:

-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is AI?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is AI?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term can also be applied to
lei-sun-intel commented 1 month ago

I updated to transformers 4.43.2 via pip install transformers==4.43.2 and re-ran the inference script; now I get a different error message, shown below. I have checked my environment: transformers==4.43.2 and trl==0.9.6 are the same as yours, while intel-extension-for-pytorch 2.1.20+git0e2bee2 is different. I will continue to diff the two environments. By the way, any hint on which package is the root cause?

(notebook-zone) intel@intel-Meteor-Lake-Client-Platform:~/.cache_dev_zone/notebooks/aigc_apps$ python _generate_llama31.py
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
2024-07-29 10:28:10,653 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 10.62it/s]
2024-07-29 10:28:11,378 - INFO - Converting the current model to sym_int4 format......
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128009 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Traceback (most recent call last):
  File "/home/intel/.cache_dev_zone/notebooks/aigc_apps/_generate_llama31.py", line 81, in <module>
    output = model.generate(input_ids,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/lookup.py", line 88, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/speculative.py", line 109, in generate
    return original_generate(self,
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1141, in forward
    outputs = self.model(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/models/llama.py", line 155, in llama_model_forward_4_38
    return llama_model_forward_4_38_internal(
  File "/home/intel/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages/ipex_llm/transformers/models/llama.py", line 1895, in llama_model_forward_4_38_internal
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds)
TypeError: _update_causal_mask() missing 3 required positional arguments: 'cache_position', 'past_key_values', and 'output_attentions'

lei-sun-intel commented 1 month ago

The root cause is the ipex-llm version. FYI, after pip install ipex-llm==2.1.0b2, it works with transformers==4.43.2. My previous version of ipex-llm, 2.1.0b20240610, DOES NOT work.
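For anyone landing on this issue: the working combination reported in this thread is a newer ipex-llm build together with transformers 4.43.2 (trl only if your script needs it), i.e. something along these lines (version numbers as quoted above; a recent nightly such as 2.1.0b20240725 from lzivan's environment should also work):

pip install ipex-llm==2.1.0b2
pip install transformers==4.43.2
pip install trl           #if needed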