intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Evaluation of whether MiniCPM-2B-sft-bf16 needs model-based optimization in ipex-llm #11163

Open wluo1007 opened 4 months ago

wluo1007 commented 4 months ago

Below are the benchmark results on both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16, which show that chatglm3-6b has better rest-token throughput than MiniCPM-2B. Since MiniCPM-2B is a 2B model while chatglm3-6b is a 6B model, I'm not sure whether these results are normal or whether further optimization should be done for MiniCPM-2B.

Platform: Core Ultra 7 165H, 32GB*2 = 64GB DDR5 5600 MT/s, Ubuntu 22.04, ipex-llm 2.1.0b20240526


| Model | Data type | Batch | Input | Output | First token latency (ms) | Rest token latency (ms/token) |
| --- | --- | --- | --- | --- | --- | --- |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 32 | 32 | 468.38 | 46.98 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 128 | 3997.21 | 48.87 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 1024 | 3987.13 | 49.31 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 2048 | 1024 | 8607.79 | 50.25 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 32 | 32 | 258.62 | 44.72 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 128 | 2602.00 | 65.81 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 1024 | 2720.03 | 80.40 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 2048 | 1024 | 6910.26 | 112.05 |
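
For context, a minimal sketch of how such first/rest token latencies can be measured with ipex-llm's transformers-style API. The `sym_int4` low-bit load corresponds to the INT4-SYM rows above; the prompt and token counts are placeholders, the snippet assumes an XPU-enabled ipex-llm install (so `torch.xpu` is available), and in practice a warm-up run should precede timing:

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "openbmb/MiniCPM-2B-sft-bf16"  # or "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load the weights quantized to symmetric INT4 (the "INT4-SYM" rows above).
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_low_bit="sym_int4", trust_remote_code=True
).to("xpu")  # run on the Intel iGPU/dGPU via the XPU backend

prompt = "..."  # placeholder; pad/truncate to the desired input length (e.g. 1024 tokens)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    # First token latency: a single-step generate covers the whole prefill.
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1)
    torch.xpu.synchronize()  # wait for queued XPU kernels before reading the clock
    first_ms = (time.perf_counter() - t0) * 1000

    # Rest token latency: amortize a longer generation over the remaining tokens.
    n = 128
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=n)
    torch.xpu.synchronize()
    total_ms = (time.perf_counter() - t0) * 1000
    rest_ms = (total_ms - first_ms) / (n - 1)

print(f"first token: {first_ms:.2f} ms, rest: {rest_ms:.2f} ms/token")
```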

qiuxin2012 commented 4 months ago

We have done some optimizations for MiniCPM-2B-sft-bf16; could you try the latest ipex-llm 2.1.0b20240603?

wluo1007 commented 3 months ago

Platform: Core Ultra 7 165H, 32GB*2 = 64GB DDR5 5600 MT/s, Ubuntu 22.04, ipex-llm 2.1.0b20240606. Compared to the previous version, the performance improvement is obvious.


| model | 1st token avg latency (ms) | 2+ avg latency (ms/token) | input/output tokens | batch_size | low_bit |
| --- | --- | --- | --- | --- | --- |
| openbmb/MiniCPM-2B-sft-bf16 | 246.71 | 24.11 | 32-32 | 1 | sym_int4 |
| openbmb/MiniCPM-2B-sft-bf16 | 2493.34 | 27.85 | 1024-128 | 1 | sym_int4 |
| openbmb/MiniCPM-2B-sft-bf16 | 2626.22 | 29.96 | 1024-1024 | 1 | sym_int4 |
| openbmb/MiniCPM-2B-sft-bf16 | 6618.25 | 34.09 | 2048-1024 | 1 | sym_int4 |
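
As a quick sanity check on the gain, converting the 32-32 rest-token latencies from the two tables into decode throughput (plain arithmetic on the reported numbers):

```python
# tokens/s = 1000 / (rest-token latency in ms/token), using the 32-32 rows above
for version, ms in [("2.1.0b20240526", 44.72), ("2.1.0b20240606", 24.11)]:
    print(f"{version}: {1000 / ms:.1f} tokens/s")  # ~22.4 -> ~41.5 tokens/s
```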