DreamGenX opened this issue 6 months ago
I encountered the same problem. Model: qwen-72b-chat-int4, vLLM: 0.3.1.
I solved it. It was because I had passed in an empty prompt by mistake.
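If it helps anyone else: a minimal client-side guard can catch this before the request is even sent. This is just a sketch (`ensure_nonempty_prompt` is my own helper, not part of vLLM):

```python
def ensure_nonempty_prompt(prompt: str) -> str:
    # An empty (or whitespace-only) prompt tokenizes to zero tokens,
    # which is what triggered the crash in my case.
    if not prompt.strip():
        raise ValueError("prompt is empty; the server would see zero prompt tokens")
    return prompt
```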
I can confirm this issue exists even when the input is not empty.
The following is my payload:
@WoosukKwon
```bash
curl -X 'POST' \
  'https://xxxxxxxxxxxxx.net/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "test/7b",
    "prompt": "abc",
    "max_tokens": 16,
    "temperature": 1,
    "top_p": 0.36,
    "stream": false,
    "top_k": 20,
    "ignore_eos": false,
    "use_beam_search": false,
    "stop_token_ids": [0],
    "skip_special_tokens": true,
    "spaces_between_special_tokens": true,
    "repetition_penalty": 1,
    "min_p": 0,
    "include_stop_str_in_output": false,
    "length_penalty": 1
  }'
```
+1
Somehow `max_prompt_len` may be 0 in this code: https://github.com/vllm-project/vllm/blob/264017a2bf030f060ebad91eb9be9b4e0033edb9/vllm/worker/model_runner.py#L232
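Assuming that line computes something like `max_prompt_len = max(prompt_lens)`, a batch whose prompts all tokenize to zero tokens would yield 0, and the padded input tensors built from it downstream would have zero columns. A defensive check along these lines (a sketch only, not the actual vLLM code) would surface the bad batch earlier:

```python
def compute_max_prompt_len(prompt_lens: list[int]) -> int:
    # Hypothetical guard: an empty prompt contributes a length of 0,
    # and if every sequence in the batch is empty, max() returns 0,
    # which later produces zero-sized input tensors.
    if not prompt_lens or max(prompt_lens) == 0:
        raise ValueError(f"batch contains no prompt tokens: prompt_lens={prompt_lens}")
    return max(prompt_lens)
```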