intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Qwen-7B TypeError: qwen_attention_forward() got an unexpected keyword argument 'registered_causal_mask' #11103

Open · juan-OY opened this issue 4 months ago

juan-OY commented 4 months ago

The model is based on Qwen 1.0. It once worked, but it fails with the latest ipex-llm (2.1.0b20240521), installed by following this guide: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen#1-install

It fails with an unexpected keyword argument 'registered_causal_mask'; the same code worked with Qwen-7B-Chat before.

python generate_ipexllm.py
C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-22 21:34:06,278 - INFO - intel_extension_for_pytorch auto imported
2024-05-22 21:34:06,330 - WARNING - Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023. [The same warning repeated in Chinese: use the latest model and code, especially if you started using Qwen-7B before September 25, and be very careful not to use the wrong code or model.]
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
2024-05-22 21:34:06,331 - WARNING - Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
2024-05-22 21:34:06,720 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
  File "C:\multi-modality\cvte_qwen\ultra_test_code_and_data\benchmark_test2intel\generate_ipexllm.py", line 71, in <module>
    output = model.generate(input_ids,
  File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1330, in generate
    return super().generate(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 1588, in generate
    return self.sample(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 2642, in sample
    outputs = self(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1120, in forward
    transformer_outputs = self.transformer(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\ipex_llm\transformers\models\qwen.py", line 369, in qwen_model_forward
    outputs = block(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 653, in forward
    attn_outputs = self.attn(
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: qwen_attention_forward() got an unexpected keyword argument 'registered_causal_mask'

leonardozcm commented 4 months ago

Sorry, I cannot reproduce this issue on Qwen-7B-Chat:

(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ pip install ipex-llm==2.1.0b20240521
Collecting ipex-llm==2.1.0b20240521
  Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl.metadata (5.0 kB)
Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl (13.8 MB)
Installing collected packages: ipex-llm
  Attempting uninstall: ipex-llm
    Found existing installation: ipex-llm 2.1.0b20240522
    Uninstalling ipex-llm-2.1.0b20240522:
      Successfully uninstalled ipex-llm-2.1.0b20240522
Successfully installed ipex-llm-2.1.0b20240521
(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ python qwen.py 
/home/arda/miniforge3/envs/changmin-llm/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-23 09:36:35,438 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 8/8 [00:00<00:00, 22.53it/s]
2024-05-23 09:36:35,965 - INFO - Converting the current model to sym_int4 format......
-------------------- Prompt --------------------

<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
AI是什么? [What is AI?]
<|im_end|>
<|im_start|>assistant

-------------------- Output --------------------

system
You are a helpful assistant.

user
AI是什么? [What is AI?]

assistant
AI是人工智能的缩写,它是指模拟人类智能的技术和方法。它是研究如何让计算机像人一样思考、学习、理解和处理信息的
[Translation: AI is short for artificial intelligence; it refers to technologies and methods that simulate human intelligence, i.e., the study of how to make computers think, learn, understand, and process information like humans. The output is cut off mid-sentence by the token limit.]

leonardozcm commented 4 months ago

Fixed in https://github.com/intel-analytics/ipex-llm/pull/11110.
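
The traceback suggests the custom checkpoint's bundled modeling_qwen.py still passes registered_causal_mask into the attention forward, while the official Qwen-7B-Chat remote code no longer does; that would explain why the issue did not reproduce above (an inference from the logs, not confirmed in the thread). Once the fix has shipped, installing a nightly build newer than the pinned 2.1.0b20240521 should pick it up; a quick way to confirm which build is installed:

```python
# Print the installed ipex-llm build. Nightlies dated after the fix PR was
# merged should include the fix (an assumption based on ipex-llm's dated
# nightly versioning, not stated in the thread).
from importlib.metadata import version

print(version("ipex-llm"))
```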