Closed: zhyncs closed this issue 3 months ago.
I wouldn't call this a bug: unlike other inference backends, which use the Hugging Face model ID as the default model name for the API server, the model name for the lmdeploy API server has to be one of the names under `lmdeploy list`, and AFAIK it cannot be user-defined when launching the server, either.
```
$ lmdeploy list
The older chat template name like "internlm2-7b", "qwen-7b" and so on are deprecated and will be removed in the future. The supported chat template names are:
baichuan2
chatglm
codellama
dbrx
deepseek
deepseek-coder
deepseek-vl
falcon
gemma
internlm
internlm2
llama
llama2
mistral
mixtral
puyu
qwen
solar
ultracm
ultralm
vicuna
wizardlm
yi
yi-vl
```
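As a sanity check, you can ask a running lmdeploy server which model name it expects via its OpenAI-compatible model listing (a minimal sketch, assuming the server is reachable on port 23333 as in the original command and exposes the standard `/v1/models` endpoint):

```python
import requests  # third-party; `pip install requests`

# Ask the lmdeploy OpenAI-compatible server which model names it serves.
resp = requests.get("http://localhost:23333/v1/models")
resp.raise_for_status()
print([model["id"] for model in resp.json()["data"]])
```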
The benchmark script already provides the flexibility to let users specify `--model`, which serves only two purposes:

1. the `model` field in the payload when calling the server via the OpenAI API;
2. the default value for `--tokenizer` when it is not specified.

I can make a PR to make this clearer if that helps.
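Roughly, the flow looks like this (a simplified sketch of the idea, not the actual vLLM source; the values are hypothetical):

```python
from argparse import Namespace

from transformers import AutoTokenizer

# Stand-in for the parsed CLI arguments (hypothetical values).
args = Namespace(model="llama2", tokenizer="/workdir/Llama-2-13b-chat-hf")

# Purpose 1: --model is sent verbatim as the "model" field of the OpenAI-style payload.
payload = {"model": args.model, "prompt": "Hello", "max_tokens": 128}

# Purpose 2: --model doubles as the tokenizer path when --tokenizer is not given.
tokenizer_id = args.tokenizer if args.tokenizer is not None else args.model
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
```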
@ywang96 In order to make this benchmark run as expected, perhaps we can add a parameter similar to `model_name` for the lmdeploy scenario. Do you have any suggestions?
> @ywang96 In order to make this benchmark run as expected, perhaps we can add a parameter similar to `model_name` for the lmdeploy scenario. Do you have any suggestions? Without changing anything or just adding some user instructions, the issue cannot be solved.
Wouldn't this work for `lmdeploy`? (Modified based on your original command in the issue.)
```bash
python3 benchmarks/benchmark_serving.py \
    --backend lmdeploy \
    --model llama2 \
    --tokenizer /workdir/Llama-2-13b-chat-hf \
    --dataset-name sharegpt \
    --dataset-path /workdir/ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 128 \
    --num-prompts 1000 \
    --port 23333
```
Perhaps I can make the intention of `--model` clearer in its argument help message.
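For example, the help text could be reworded along these lines (a sketch of one possible wording, not an actual patch):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model",
    type=str,
    required=True,
    help="Name sent as the 'model' field in the request payload; also used "
         "as the tokenizer path when --tokenizer is not set. For the lmdeploy "
         "backend this must be a chat template name from `lmdeploy list`, "
         "not a Hugging Face model ID.",
)
```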
Makes sense.
> I can make a PR to make this clearer if that helps.

It's ok.
### Your current environment

### 🐛 Describe the bug
Hi @ywang96, currently there is a small issue in `benchmarks/backend_request_func.py` when benchmarking LMDeploy with Llama-2-13b-chat-hf. I need to change `request_func_input.model` to `llama2` here:

https://github.com/vllm-project/vllm/blob/f3d0bf7589d6e63a691dcbb9d1db538c184fde29/benchmarks/backend_request_func.py#L222

After this manual modification the benchmark runs and produces the correct result; otherwise the result is incorrect because the model name is not matched correctly.
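For context, the gist of the change is where the lmdeploy payload gets its model name (a simplified stand-in for the real code at the linked line, with hypothetical field values):

```python
from dataclasses import dataclass


@dataclass
class RequestFuncInput:  # simplified stand-in for the real dataclass
    model: str
    prompt: str
    output_len: int


# "model" must be a chat template name from `lmdeploy list` (e.g. "llama2"),
# not the Hugging Face model ID, or the server will not match it correctly.
request_func_input = RequestFuncInput(model="llama2", prompt="Hello", output_len=128)
payload = {
    "model": request_func_input.model,
    "prompt": request_func_input.prompt,
    "max_tokens": request_func_input.output_len,
}
```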