intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

[pvc 1550] got blank output for llama2 70B on pvc with int4 #9484

Open ZhaoqiongZ opened 10 months ago

ZhaoqiongZ commented 10 months ago

The script I use is https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py with the model Llama-2-70b-hf; the output is sometimes empty.
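For reference, the core of the linked generate.py is roughly the following (a sketch based on the BigDL GPU example; argument parsing, timing, and the exact defaults are elided and may differ between releases):

```python
# Rough sketch of what the linked generate.py does (details may differ
# slightly between BigDL releases).
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "Llama-2-70b-hf"  # --repo-id-or-model-path

# Load the checkpoint and convert it to symmetric int4 on the fly.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")
tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)

# The example wraps the question in the Llama-2 *chat* prompt template.
prompt = "[INST] <<SYS>>\n\n<</SYS>>\n\nWhat is AI? [/INST]"
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```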

Here are some output examples:

output1

(bigdl_xpu_py39) sdp@a4bf0192682f:~/zhaoqion$ python generate.py --repo-id-or-model-path Llama-2-70b-hf
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 26.47it/s]
2023-11-16 20:56:44,972 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
Inference time: 2.3499696254730225 s
-------------------- Prompt --------------------
<s>[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]

<[INST]>

<</INST>

What is AI? [/INST]

<[INST]>

output2

(bigdl_xpu_py39) sdp@a4bf0192682f:~/zhaoqion$ python generate.py --repo-id-or-model-path Llama-2-70b-hf
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 24.68it/s]
2023-11-16 21:07:21,979 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
Inference time: 2.3450160026550293 s
-------------------- Prompt --------------------
<s>[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]

AI is the next step in evolution. AI is the next step in our evolution. AI is the next step in our evolution. AI

output3

(bigdl_xpu_py39) sdp@a4bf0192682f:~/zhaoqion$ python generate.py --repo-id-or-model-path Llama-2-70b-hf
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/sdp/miniconda3/envs/bigdl_xpu_py39/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 23.88it/s]
2023-11-16 21:10:10,950 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
Inference time: 2.5589447021484375 s
-------------------- Prompt --------------------
<s>[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]

<[INST]

>

>

>

>

>

>

>

>
ZhaoqiongZ commented 10 months ago

I also got the same result on Arc 770 with Llama 2 7B:

(zzq_py39) a770@RPLP-A770:~/zhaoqion/zhaoqion$ python generate.py  --repo-id-or-model-path models--meta-llama--Llama-2-7b-hf/snapshots/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/
/home/a770/miniconda3/envs/zzq_py39/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Loading checkpoint shards: 100%|██████████████████████████████████████| 2/2 [00:00<00:00, 26.87it/s]
2023-11-17 14:34:15,348 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
Inference time: 0.612342119216919 s
-------------------- Prompt --------------------
<s>[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]

[/INST]

[/INST]

[/INST]

[/INST]

[/INST]
hkvision commented 10 months ago

@chtanch Please check if you can reproduce this issue.

chtanch commented 10 months ago

I reproduced the issue on Arc 770 with Llama-2-7b-hf.

For this script, please use the 'chat' versions, which are fine-tuned with the [INST] and <<SYS>> markers. E.g., use Llama-2-7b-chat-hf instead of Llama-2-7b-hf. Llama-2-7b-chat-hf works correctly in my tests.
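For context, those markers come from the Llama-2 chat prompt template, which only the chat-tuned checkpoints were trained on. A minimal sketch of the format, with an empty system prompt as in the logs above:

```python
# Sketch of the Llama-2 chat prompt format that the example script produces.
# Only chat-tuned checkpoints (e.g. Llama-2-7b-chat-hf) were trained on it;
# base checkpoints just echo or loop on these markers, as seen above.
def build_llama2_chat_prompt(user_message: str, system_prompt: str = "") -> str:
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

print(build_llama2_chat_prompt("What is AI?"))
# [INST] <<SYS>>
#
# <</SYS>>
#
# What is AI? [/INST]
```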

ZhaoqiongZ commented 10 months ago

Hi @chtanch, thanks for your advice! Will Llama-2-7b-hf work well with other scripts?

hkvision commented 10 months ago

Since the prompt templates are for chat models only and the base model seems to have no prompt structure, can you try removing the template to see the result?

chtanch commented 10 months ago

> Hi @chtanch, thanks for your advice! Will Llama-2-7b-hf work well with other scripts?

You can try scripts/code that do not add the special tokens [INST], [/INST], <<SYS>>, <</SYS>> to the prompt. Alternatively, remove them from the prompt as suggested by @hkvision.
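For example (a sketch reusing the `model` and `tokenizer` from the snippet above; the prompt wording is illustrative), the base model can simply be given plain text to continue:

```python
# Sketch: prompting the *base* (non-chat) model with plain text and no
# special markers. Base checkpoints do pure completion, so a prompt phrased
# as text to be continued usually works better than a bare question.
prompt = "AI, or artificial intelligence, is"
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```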