Run llama2-chat-hf with transformers 4.38.1 failed #10249

Running llama2-chat-hf with transformers 4.38.1 gets the error below:
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
Keyword arguments {'add_special_tokens': False} not recognized.
Keyword arguments {'add_special_tokens': False} not recognized.
/home/arda/xin/BigDL-xin/python/llm/dev/benchmark/all-in-one/../benchmark_util.py:1295: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
2024-02-27 10:27:43,904 - ERROR -
****************************Usage Error************************
Attention mask should be of size (1, 1, 33, 33), but is torch.Size([1, 1, 4096, 4096])
2024-02-27 10:27:43,904 - ERROR -
****************************Call Stack*************************
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/arda/xin/BigDL-xin/python/llm/dev/benchmark/all-in-one/run.py", line 62, in run_model_in_thread
output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=out_len,
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/arda/xin/BigDL-xin/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 1563, in generate
return self.greedy_search(
File "/home/arda/xin/BigDL-xin/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 2385, in greedy_search
outputs = self(
File "/home/arda/xin/BigDL-xin/python/llm/dev/benchmark/all-in-one/../benchmark_util.py", line 533, in __call__
return self.model(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1168, in forward
outputs = self.model(
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1008, in forward
layer_outputs = decoder_layer(
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/bigdl/llm/transformers/models/llama.py", line 190, in llama_decoder_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/bigdl/llm/transformers/models/llama.py", line 1047, in llama_attention_forward_4_36
attn_output, attn_weights = native_sdp(query_states, key_states, value_states,
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/bigdl/llm/transformers/models/llama.py", line 1090, in native_sdp
invalidInputError(False,
File "/home/arda/anaconda3/envs/xin-llm/lib/python3.9/site-packages/bigdl/llm/utils/common/log4Error.py", line 32, in invalidInputError
raise RuntimeError(errMsg)
RuntimeError: Attention mask should be of size (1, 1, 33, 33), but is torch.Size([1, 1, 4096, 4096])
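
For reference, here is a minimal sketch of the load-and-generate path implied by the traceback above; the model path, prompt, and max_new_tokens value are placeholders rather than the exact all-in-one benchmark settings:

```python
# Minimal sketch reconstructed from the traceback; observed with
# transformers 4.38.1 and bigdl-llm. Model path, prompt, and
# max_new_tokens are placeholders, not the benchmark's actual config.
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # assumed llama2-chat-hf checkpoint

# Load the model with BigDL-LLM low-bit optimization
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)

prompt = "What is AI?"  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Same greedy generate call as run.py line 62 in the traceback
output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```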