mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] some Qwen-72B based llamafied models compatibility test #1893

Open alphaarea opened 7 months ago

alphaarea commented 7 months ago

🐛 Bug

I'm interested in using mlc-llm to try new models from the OpenLLM Leaderboard. Since Qwen does not yet support multi-GPU inference, I tried several Qwen-72B based llamafied models. I hope this helps mlc-llm development.

| Model | Date | Architecture | Template | Convert | Config | Compile | Inference | Speed (decode) | Note |
|---|---|---|---|---|---|---|---|---|---|
| Weyaxi/Qwen-72B-Llama | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 17.237 | bad output |
| CausalLM/72B-preview-llamafied-qwen-llamafy | legacy | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | ~16 | bad output |
| moreh/MoMo-72B-LoRA-V1.4 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.744 | bad output |
| abacusai/Smaug-72B-v0.1 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.818 | bad output |

Note: Weyaxi/Qwen-72B-Llama and CausalLM/72B-preview-llamafied-qwen-llamafy need their tokenizer files fixed manually following this method

These models pass all stages of the process without fatal errors, but produce confused text during inference.

All models emit an "Unused extern parameters" warning during weight conversion, like this:
[2024-03-05 13:58:11] WARNING utils.py:25: Unused extern parameters: model.layers.0.self_attn.k_proj.bias, model.layers.0.self_attn.o_proj.bias, model.layers.0.self_attn.q_proj.bias, model.layers.0.self_attn.v_proj.bias...
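The warning suggests the llamafied checkpoints still carry Qwen's attention bias tensors (`q_proj.bias`, `k_proj.bias`, `v_proj.bias`, `o_proj.bias`), which the `llama` model type has no parameter slots for, so they get dropped at conversion; that could plausibly contribute to the garbage output. A minimal sketch (not part of the issue; the index file here is a tiny stand-in built on the fly) of how one might list bias tensors from a checkpoint's `model.safetensors.index.json`:

```shell
# Stand-in for a real sharded checkpoint's model.safetensors.index.json
# (assumption: the checkpoint ships such an index; sharded HF checkpoints do)
INDEX=$(mktemp)
cat > "$INDEX" <<'EOF'
{"weight_map": {
  "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
  "model.layers.0.self_attn.q_proj.bias": "model-00001.safetensors",
  "model.layers.0.self_attn.k_proj.bias": "model-00001.safetensors",
  "model.layers.0.self_attn.o_proj.bias": "model-00001.safetensors"
}}
EOF
# List every tensor name ending in ".bias"; on a real checkpoint, point
# INDEX at "$MODEL_PATH/model.safetensors.index.json" instead
BIAS_TENSORS=$(grep -o '"[^"]*\.bias"' "$INDEX" | sort -u)
echo "$BIAS_TENSORS"
rm -f "$INDEX"
```

If this prints the same tensors the converter reports as unused, the biases are present in the weights but unmapped by the `llama` architecture definition.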

Please download the attached file for full log information

20240302 test log.zip

To Reproduce

# Paths and build options (adjust MODEL_PATH to your local checkout)
MODEL_PATH='/home/alphaarea/models/Smaug-72B-v0.1'
MLC_QUANT='q4f16_1'
MLC_DEV='cuda'
MODEL_ARCH='llama'
MODEL_TEMP='gpt2'
# Derive output names from the model path
MODEL_NAME=${MODEL_PATH##*/}
MODEL_OUTPUT="$MODEL_PATH-$MLC_QUANT"
MODEL_LIB="$MODEL_NAME-$MLC_QUANT-$MLC_DEV.so"

# Convert weights, generate config, compile, then benchmark decoding
mlc_chat convert_weight --quantization "$MLC_QUANT" --model-type "$MODEL_ARCH" --device "$MLC_DEV" --output "$MODEL_OUTPUT" "$MODEL_PATH"
mlc_chat gen_config --quantization "$MLC_QUANT" --model-type "$MODEL_ARCH" --conv-template "$MODEL_TEMP" --context-window-size 2048 --tensor-parallel-shards 4 --max-batch-size 1 --output "$MODEL_OUTPUT" "$MODEL_PATH"
mlc_chat compile --device "$MLC_DEV" --opt 'O0' --output "$MODEL_OUTPUT/$MODEL_LIB" "$MODEL_OUTPUT/mlc-chat-config.json"
mlc_chat bench --generate-length 512 --model-lib-path "$MODEL_OUTPUT/$MODEL_LIB" "$MODEL_OUTPUT"
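For clarity, the shell parameter expansions used above derive the output names like this (a standalone sketch with a placeholder path, not a real checkout):

```shell
MODEL_PATH='/home/user/models/Example-72B'   # placeholder path
MLC_QUANT='q4f16_1'
MLC_DEV='cuda'
MODEL_NAME=${MODEL_PATH##*/}                 # strip leading directories
MODEL_OUTPUT="$MODEL_PATH-$MLC_QUANT"
MODEL_LIB="$MODEL_NAME-$MLC_QUANT-$MLC_DEV.so"
echo "$MODEL_NAME"    # Example-72B
echo "$MODEL_OUTPUT"  # /home/user/models/Example-72B-q4f16_1
echo "$MODEL_LIB"     # Example-72B-q4f16_1-cuda.so
```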

Expected behavior

The model should output coherent text.

Environment

Additional context

Example of the confused output:

and there me definition is the as a system of a sign The the  that that and in the   since can a little give any or this the of2/ a is the., even the the, a an is argument of of a of a definition the35uringb to9 it of Definition&# of a remarkable about of definition the as3 and the on  of a definition of of  to all the. through this and4 and the of what the what the of a definition of only, and of. of only of real for the the about of week of to her, a a definition the , the ofia the.. and. of not need,a all sources9 of definition the- of a part between a noun to force who. and it her be his as this. and out a bit, a bit the the?
 a lot., one the he the of thought of a non of a time to what a Who of a puzzle of a lot , a year of the  the. the of why of  It’s a lot of the. What of of the of a bit of a month of the of her of a bit of a process of a bit of all not of ACH), and the2 a world of the of A. A. A. of A = A x of A > A9 A A
 A = A7 A6 A2 A,A A9 A - A, A > of A I A,
 A = A.
 A - A, A - A,
A = A2 A x - A, A - A,
A = A A - A - A, A,
A = A - A - A - A, A,
and of these A A = A - A - A, A,
A = A - A - A - A - A,
A = A - A - A A - A, A,
A = A - of a the of A - A9 of A = A - A of A - of a - of A = A - of A - of a lot of A = A A = A of a diagram of A - of a = A A A of A A of a set of A A A A = A of a set of A A =5 A1 of A A of a it,
 = a the of a A999 of A9 of a a9 of a A a of a of a = Aint,
 = A9 of a = ,
999
DiegoCao commented 7 months ago

Thanks, working on reproducing that.