Open juan-OY opened 4 months ago
Sorry, I cannot reproduce this issue on qwen-7b-chat:
(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ pip install ipex-llm==2.1.0b20240521
Collecting ipex-llm==2.1.0b20240521
Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl.metadata (5.0 kB)
Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl (13.8 MB)
Installing collected packages: ipex-llm
Attempting uninstall: ipex-llm
Found existing installation: ipex-llm 2.1.0b20240522
Uninstalling ipex-llm-2.1.0b20240522:
Successfully uninstalled ipex-llm-2.1.0b20240522
Successfully installed ipex-llm-2.1.0b20240521
(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ python qwen.py
/home/arda/miniforge3/envs/changmin-llm/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2024-05-23 09:36:35,438 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 22.53it/s]
2024-05-23 09:36:35,965 - INFO - Converting the current model to sym_int4 format......
-------------------- Prompt --------------------
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
AI是什么?
<|im_end|>
<|im_start|>assistant
-------------------- Output --------------------
system
You are a helpful assistant.
user
AI是什么?
assistant
AI是人工智能的缩写,它是指模拟人类智能的技术和方法。它是研究如何让计算机像人一样思考、学习、理解和处理信息的
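For anyone rerunning the reproduction above, here is a minimal sketch of how a prompt with the same layout as the "Prompt" section of the log could be built. The names `QWEN_PROMPT_FORMAT` and `build_qwen_prompt` are assumptions for illustration; the contents of `qwen.py` are not shown in this thread, so this only mirrors the printed layout.

```python
# Hypothetical helper: reproduces the prompt layout printed in the log above.
# The template name and function name are illustrative assumptions, not taken
# from qwen.py, whose source is not part of this thread.
QWEN_PROMPT_FORMAT = (
    "<|im_start|>system\n"
    "{system}\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "{user}\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_qwen_prompt(user, system="You are a helpful assistant."):
    """Fill the system and user turns and leave the assistant turn open."""
    return QWEN_PROMPT_FORMAT.format(system=system, user=user)


if __name__ == "__main__":
    print(build_qwen_prompt("What is AI?"))
```

The string returned here would then be tokenized and passed to `model.generate`, as in the failing script.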
The model is based on Qwen 1.0. It used to work, but it fails with the latest ipex-llm (2.1.0b20240521), which was installed by following this guide: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen#1-install
It fails with an unexpected keyword argument 'registered_causal_mask'. The same code worked with Qwen-7B-Chat.
python generate_ipexllm.py
C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2024-05-22 21:34:06,278 - INFO - intel_extension_for_pytorch auto imported
2024-05-22 21:34:06,330 - WARNING - Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
2024-05-22 21:34:06,331 - WARNING - Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
2024-05-22 21:34:06,720 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
File "C:\multi-modality\cvte_qwen\ultra_test_code_and_data\benchmark_test2intel\generate_ipexllm.py", line 71, in
output = model.generate(input_ids,
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1330, in generate
return super().generate(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 1588, in generate
return self.sample(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 2642, in sample
outputs = self(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1120, in forward
transformer_outputs = self.transformer(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\ipex_llm\transformers\models\qwen.py", line 369, in qwen_model_forward
outputs = block(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 653, in forward
attn_outputs = self.attn(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
TypeError: qwen_attention_forward() got an unexpected keyword argument 'registered_causal_mask'
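One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the block's forward in the checkpoint's `modeling_qwen.py` passes `registered_causal_mask` to `self.attn(...)`, while the replacement `qwen_attention_forward` does not accept that keyword. The sketch below reproduces that kind of mismatch in plain Python. Every function in it is a hypothetical stand-in; none of it is ipex-llm or Qwen code.

```python
import inspect


def model_attention_forward(hidden_states, registered_causal_mask=None, attention_mask=None):
    """Hypothetical stand-in for the attention forward shipped with the checkpoint."""
    return hidden_states


def patched_attention_forward(hidden_states, attention_mask=None):
    """Hypothetical stand-in for a replacement forward with a narrower signature."""
    return hidden_states


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def block_forward(attn, hidden_states):
    """Hypothetical caller that always passes registered_causal_mask."""
    return attn(hidden_states, registered_causal_mask=None)


def try_block(attn):
    """Call the block and return "ok", or the TypeError message on failure."""
    try:
        block_forward(attn, [0.1, 0.2])
        return "ok"
    except TypeError as err:
        return str(err)


if __name__ == "__main__":
    for fn in (model_attention_forward, patched_attention_forward):
        print(fn.__name__, accepts_keyword(fn, "registered_causal_mask"), try_block(fn))
```

A check like `accepts_keyword` could help confirm whether the forward in the cached `modeling_qwen.py` and the patched one agree on their keyword arguments before digging further.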