intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Baichuan2-7B-Chat doesn't support 2048-token input on A770 #9593

Open KiwiHana opened 11 months ago

KiwiHana commented 11 months ago

Test script: bigdl all-in-one/run-arc.sh, using model.half().to("xpu") instead of model.to("xpu"). Input prompt: a 2048-token .txt file; output: 1024 tokens.

32in/32out works fine, but at 2048in/1024out the script simply hangs for an hour with no output. Details below:
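
For reference, a minimal sketch of the loading path under test (assuming the standard bigdl-llm transformers API; the model path and prompt file name are placeholders, not the actual all-in-one benchmark wiring):

```python
# Minimal repro sketch (assumed bigdl-llm API, not the exact all-in-one script):
# load Baichuan2-7B-Chat in sym_int4, cast to fp16, and move it to the Arc GPU.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables the "xpu" device)
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "Baichuan2-7B-Chat"  # placeholder: local checkpoint path
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
model = model.half().to("xpu")    # the variant under test, vs. plain model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    prompt = open("2048.txt").read()  # placeholder: the ~2048-token input prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
    output = model.generate(input_ids, max_new_tokens=1024)
    torch.xpu.synchronize()           # surface any pending XPU errors or hangs
print(tokenizer.decode(output[0], skip_special_tokens=True))
```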

all-in-one$ ./run-arc.sh

:: initializing oneAPI environment ...
   run-arc.sh: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

Mon Dec 4 11:31:56 CST 2023
T01 Cap mem
T08   32in  32out
/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2023-12-04 11:31:58,157 - WARNING - Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
2023-12-04 11:32:49,214 - INFO - Converting the current model to sym_int4 format......
2023-12-04 11:32:55,035 - WARNING - Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
>> loading of model costs 57.58747953200003s
<class 'transformers_modules.Baichuan2-7B-Chat.modeling_baichuan.BaichuanForCausalLM'>
input length is:  torch.Size([1, 32])
model generate cost: 5.4736868800000025
actual_out_len 32
model.first_cost, model.rest_cost_mean 4.821632456999964 0.019813965774192772
input length is:  torch.Size([1, 32])
model generate cost: 0.6568430059999741
actual_out_len 32
model.first_cost, model.rest_cost_mean 0.11245485400002053 0.017537034548390985
input length is:  torch.Size([1, 32])
model generate cost: 0.6595462580000344
actual_out_len 32
model.first_cost, model.rest_cost_mean 0.11286746700000094 0.017612448419358332
input length is:  torch.Size([1, 32])
model generate cost: 0.6585891749999746
actual_out_len 32
model.first_cost, model.rest_cost_mean 0.11245271799998591 0.01759617270967569
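
For context, `first_cost` and `rest_cost_mean` in the log above are the first-token latency and the mean per-token latency of the remaining tokens. A rough sketch of how such numbers can be measured (a hypothetical helper, not the benchmark's actual implementation):

```python
# Hypothetical first-token vs. rest-token timing, approximating the metrics
# printed by the all-in-one benchmark (not its actual code).
import time
import torch

def measure_latency(model, input_ids, max_new_tokens):
    # First token: generate exactly one new token.
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1, min_new_tokens=1)
    torch.xpu.synchronize()
    first_cost = time.perf_counter() - t0

    # Full run: the mean cost of the remaining tokens is the total time minus
    # the (re-incurred) first-token cost, spread over the rest of the tokens.
    t1 = time.perf_counter()
    out = model.generate(input_ids, max_new_tokens=max_new_tokens,
                         min_new_tokens=max_new_tokens)
    torch.xpu.synchronize()
    total = time.perf_counter() - t1

    actual_out_len = out.shape[1] - input_ids.shape[1]
    rest_cost_mean = (total - first_cost) / max(actual_out_len - 1, 1)
    return first_cost, rest_cost_mean, actual_out_len
```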
qiuxin2012 commented 10 months ago

I can't reproduce this error either. It runs successfully on my test machine.

kevin-t-tang commented 10 months ago

Upgraded to bigdl-llm 2.5.0b20231205:

// Case 1: model.half().to("xpu")
model.first_cost, model.rest_cost_mean 0.7022517440000229 0.02676972233333193
input length is:  torch.Size([1, 2048])
model generate cost: 0.8681921279999187
actual_out_len 7

The output length seems too small; please help check.
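
One thing worth ruling out (a guess, not confirmed by the log): `actual_out_len 7` could simply be an early EOS stop rather than a long-input failure. Forcing a minimum generation length with the standard HF `generate` kwargs would distinguish the two:

```python
# Assumed diagnostic: ignore EOS until 1024 tokens are produced, to see whether
# the model stops early or genuinely misbehaves at 2048-token inputs.
output = model.generate(
    input_ids,              # the same 2048-token input as above
    max_new_tokens=1024,
    min_new_tokens=1024,    # suppress early EOS stopping
)
print("actual_out_len:", output.shape[1] - input_ids.shape[1])
```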