I'm interested in using mlc-llm to try new models from the OpenLLM Leaderboard. Since Qwen does not yet support multi-GPU inference, I tried several llamafied Qwen-72B-based models. I hope this helps mlc-llm development.
| Model | Date | Architecture | Template | Convert | Config | Compile | Inference | Speed (decode) | Note |
|---|---|---|---|---|---|---|---|---|---|
| Weyaxi/Qwen-72B-Llama | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 17.237 | bad output |
| CausalLM/72B-preview-llamafied-qwen-llamafy | legacy | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | ~16 | bad output |
| moreh/MoMo-72B-LoRA-V1.4 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.744 | bad output |
| abacusai/Smaug-72B-v0.1 | 20240302 | llama | gpt2 | TRUE | TRUE | TRUE | FALSE | 16.818 | bad output |
Note: Weyaxi/Qwen-72B-Llama and CausalLM/72B-preview-llamafied-qwen-llamafy require fixing the tokenizer files manually, following this method.
All of these models pass every stage of the process without fatal errors, but produce garbled text at inference time.
All models emit an "Unused extern parameters" warning during model conversion, like this: `[2024-03-05 13:58:11] WARNING utils.py:25: Unused extern parameters: model.layers.0.self_attn.k_proj.bias, model.layers.0.self_attn.o_proj.bias, model.layers.0.self_attn.q_proj.bias, model.layers.0.self_attn.v_proj.bias...`
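The unused parameters are all attention-projection biases. Qwen's original architecture uses q/k/v (and here o) projection biases, which the plain LLaMA architecture does not, so a LLaMA-style loader would silently drop them, which may explain both the warning and the garbled output. A minimal sketch (plain Python; the function and the sample parameter names below are illustrative, taken from the warning above) of how such bias tensors get filtered out by a bias-free LLaMA-style loader:

```python
# Sketch: list the checkpoint tensors a bias-free LLaMA-style loader
# would ignore. Qwen-72B checkpoints carry attention biases; the "llama"
# architecture mlc-llm maps these models onto has none, matching the
# "Unused extern parameters" warning above.
def unused_bias_params(param_names):
    """Return attention-bias parameter names a LLaMA loader would drop."""
    return sorted(
        n for n in param_names
        if n.endswith(".bias") and ".self_attn." in n
    )

# Illustrative subset of a llamafied Qwen-72B state dict (layer 0 only).
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.q_proj.bias",
    "model.layers.0.self_attn.k_proj.bias",
    "model.layers.0.self_attn.v_proj.bias",
    "model.layers.0.self_attn.o_proj.bias",
    "model.layers.0.mlp.gate_proj.weight",
]
print(unused_bias_params(names))
# → ['model.layers.0.self_attn.k_proj.bias',
#    'model.layers.0.self_attn.o_proj.bias',
#    'model.layers.0.self_attn.q_proj.bias',
#    'model.layers.0.self_attn.v_proj.bias']
```

If this is the cause, dropping those biases (rather than folding them into the computation) would degrade every attention layer, which is consistent with output that is fluent-looking but incoherent.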
Please download the attached file for the full logs.
Environment

- How you installed MLC-LLM: `python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-chat-nightly-cu121 mlc-ai-nightly-cu121`
- How you installed TVM-Unity: `python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-chat-nightly-cu121 mlc-ai-nightly-cu121`
- Python version: 3.11
- GPU driver version: 545.23.08
- CUDA/cuDNN version: 12.1
- TVM Unity Hash Tag: see attached file
- Any other relevant information: on
Additional context
confused output example
and there me definition is the as a system of a sign The the that that and in the since can a little give any or this the of2/ a is the., even the the, a an is argument of of a of a definition the35uringb to9 it of Definition&# of a remarkable about of definition the as3 and the on of a definition of of to all the. through this and4 and the of what the what the of a definition of only, and of. of only of real for the the about of week of to her, a a definition the , the ofia the.. and. of not need,a all sources9 of definition the- of a part between a noun to force who. and it her be his as this. and out a bit, a bit the the?
a lot., one the he the of thought of a non of a time to what a Who of a puzzle of a lot , a year of the the. the of why of It’s a lot of the. What of of the of a bit of a month of the of her of a bit of a process of a bit of all not of ACH), and the2 a world of the of A. A. A. of A = A x of A > A9 A A
A = A7 A6 A2 A,A A9 A - A, A > of A I A,
A = A.
A - A, A - A,
A = A2 A x - A, A - A,
A = A A - A - A, A,
A = A - A - A - A, A,
and of these A A = A - A - A, A,
A = A - A - A - A - A,
A = A - A - A A - A, A,
A = A - of a the of A - A9 of A = A - A of A - of a - of A = A - of A - of a lot of A = A A = A of a diagram of A - of a = A A A of A A of a set of A A A A = A of a set of A A =5 A1 of A A of a it,
= a the of a A999 of A9 of a a9 of a A a of a of a = Aint,
= A9 of a = ,
999
Attachment: 20240302 test log.zip
To Reproduce
Expected behavior
The models should output normal, coherent text.