mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] some Qwen-72B based llamafied models compatibility test #1893

Open alphaarea opened 7 months ago

alphaarea commented 7 months ago

🐛 Bug

I'm interested in using mlc-llm to try new models from the OpenLLM Leaderboard. Since Qwen does not yet support multi-GPU inference, I tried several Qwen-72B based llamafied models. I hope this helps mlc-llm development.

| Model | Date | Architecture | Template | Convert | Config | Compile | Inference | Speed (decode) | Note |
|---|---|---|---|---|---|---|---|---|---|
| Weyaxi/Qwen-72B-Llama | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 17.237 | bad output |
| CausalLM/72B-preview-llamafied-qwen-llamafy | legacy | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | ~16 | bad output |
| moreh/MoMo-72B-LoRA-V1.4 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.744 | bad output |
| abacusai/Smaug-72B-v0.1 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.818 | bad output |

Note: Weyaxi/Qwen-72B-Llama and CausalLM/72B-preview-llamafied-qwen-llamafy need their tokenizer files fixed manually following this method

These models pass all stages of the process without fatal errors, but produce confused text during inference.

All models emit an "Unused extern parameters" warning during weight conversion, like this:
[2024-03-05 13:58:11] WARNING utils.py:25: Unused extern parameters: model.layers.0.self_attn.k_proj.bias, model.layers.0.self_attn.o_proj.bias, model.layers.0.self_attn.q_proj.bias, model.layers.0.self_attn.v_proj.bias...
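The warning suggests the llamafied checkpoints still carry Qwen's attention bias tensors (`q_proj.bias`, `k_proj.bias`, `v_proj.bias`, `o_proj.bias`), which the `llama` model type has no parameter slots for, so they get dropped at conversion; that could plausibly contribute to the garbage output. A minimal sketch (not part of the issue; the index file here is a tiny stand-in built on the fly) of how one might list bias tensors from a checkpoint's `model.safetensors.index.json`:

```shell
# Stand-in for a real sharded checkpoint's model.safetensors.index.json
# (assumption: the checkpoint ships such an index; sharded HF checkpoints do)
INDEX=$(mktemp)
cat > "$INDEX" <<'EOF'
{"weight_map": {
  "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
  "model.layers.0.self_attn.q_proj.bias": "model-00001.safetensors",
  "model.layers.0.self_attn.k_proj.bias": "model-00001.safetensors",
  "model.layers.0.self_attn.o_proj.bias": "model-00001.safetensors"
}}
EOF
# List every tensor name ending in ".bias"; on a real checkpoint, point
# INDEX at "$MODEL_PATH/model.safetensors.index.json" instead
BIAS_TENSORS=$(grep -o '"[^"]*\.bias"' "$INDEX" | sort -u)
echo "$BIAS_TENSORS"
rm -f "$INDEX"
```

If this prints the same tensors the converter reports as unused, the biases are present in the weights but unmapped by the `llama` architecture definition.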

Please download the attached file for full log information

20240302 test log.zip

To Reproduce

# Paths and build options (adjust MODEL_PATH to your local checkout)
MODEL_PATH='/home/alphaarea/models/Smaug-72B-v0.1'
MLC_QUANT='q4f16_1'
MLC_DEV='cuda'
MODEL_ARCH='llama'
MODEL_TEMP='gpt2'
# Derive output names from the model path
MODEL_NAME=${MODEL_PATH##*/}
MODEL_OUTPUT="$MODEL_PATH-$MLC_QUANT"
MODEL_LIB="$MODEL_NAME-$MLC_QUANT-$MLC_DEV.so"

# Convert weights, generate config, compile, then benchmark decoding
mlc_chat convert_weight --quantization "$MLC_QUANT" --model-type "$MODEL_ARCH" --device "$MLC_DEV" --output "$MODEL_OUTPUT" "$MODEL_PATH"
mlc_chat gen_config --quantization "$MLC_QUANT" --model-type "$MODEL_ARCH" --conv-template "$MODEL_TEMP" --context-window-size 2048 --tensor-parallel-shards 4 --max-batch-size 1 --output "$MODEL_OUTPUT" "$MODEL_PATH"
mlc_chat compile --device "$MLC_DEV" --opt 'O0' --output "$MODEL_OUTPUT/$MODEL_LIB" "$MODEL_OUTPUT/mlc-chat-config.json"
mlc_chat bench --generate-length 512 --model-lib-path "$MODEL_OUTPUT/$MODEL_LIB" "$MODEL_OUTPUT"
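For clarity, the shell parameter expansions used above derive the output names like this (a standalone sketch with a placeholder path, not a real checkout):

```shell
MODEL_PATH='/home/user/models/Example-72B'   # placeholder path
MLC_QUANT='q4f16_1'
MLC_DEV='cuda'
MODEL_NAME=${MODEL_PATH##*/}                 # strip leading directories
MODEL_OUTPUT="$MODEL_PATH-$MLC_QUANT"
MODEL_LIB="$MODEL_NAME-$MLC_QUANT-$MLC_DEV.so"
echo "$MODEL_NAME"    # Example-72B
echo "$MODEL_OUTPUT"  # /home/user/models/Example-72B-q4f16_1
echo "$MODEL_LIB"     # Example-72B-q4f16_1-cuda.so
```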

Expected behavior

The model should output coherent text.

Environment

Additional context

Example of the confused output:

and there me definition is the as a system of a sign The the  that that and in the   since can a little give any or this the of2/ a is the., even the the, a an is argument of of a of a definition the35uringb to9 it of Definition&# of a remarkable about of definition the as3 and the on  of a definition of of  to all the. through this and4 and the of what the what the of a definition of only, and of. of only of real for the the about of week of to her, a a definition the , the ofia the.. and. of not need,a all sources9 of definition the- of a part between a noun to force who. and it her be his as this. and out a bit, a bit the the?
 a lot., one the he the of thought of a non of a time to what a Who of a puzzle of a lot , a year of the  the. the of why of  It’s a lot of the. What of of the of a bit of a month of the of her of a bit of a process of a bit of all not of ACH), and the2 a world of the of A. A. A. of A = A x of A > A9 A A
 A = A7 A6 A2 A,A A9 A - A, A > of A I A,
 A = A.
 A - A, A - A,
A = A2 A x - A, A - A,
A = A A - A - A, A,
A = A - A - A - A, A,
and of these A A = A - A - A, A,
A = A - A - A - A - A,
A = A - A - A A - A, A,
A = A - of a the of A - A9 of A = A - A of A - of a - of A = A - of A - of a lot of A = A A = A of a diagram of A - of a = A A A of A A of a set of A A A A = A of a set of A A =5 A1 of A A of a it,
 = a the of a A999 of A9 of a a9 of a A a of a of a = Aint,
 = A9 of a = ,
999
DiegoCao commented 7 months ago

Thanks, working on reproducing that.