vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug] Models generate whitespace-only output when temperature is in range [1e-5, 1e-4], regardless of model type #3063

Closed: saumya-saran closed this issue 1 month ago

saumya-saran commented 7 months ago

Setting the temperature within a particular range (roughly 1e-5 to 1e-4) causes vLLM to generate whitespace-only output. Values above or below this range work correctly. I have seen this with facebook/opt-125m, fine-tuned Mistral-7B models, CodeLlama-13B, and several other models, so it appears to be an issue in vLLM rather than in any particular model.

To reproduce, start the server:

```
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Then send a request, filling in a value for `<temperature>`:

```
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": <temperature>
  }'
```

Results by temperature:

- `1e-3`: generates `" great place to live. I"`
- `1e-4`: generates `"\\\\\\\"`
- `1e-5`: generates `"\\\\\\\"`
- `1e-6`: generates `" great place to live. I"`
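For convenience, here is a minimal sketch that sweeps the temperature values above against the server; it assumes the server from the reproduction step is running at `localhost:8000` and uses the `requests` package, neither of which is part of the original report:

```python
# Minimal sketch: sweep temperature values against a locally running
# vLLM OpenAI-compatible server (assumed at localhost:8000) and print
# each completion, making the whitespace-only range easy to spot.
import requests

URL = "http://localhost:8000/v1/completions"  # assumed local server

for temperature in [1e-3, 1e-4, 1e-5, 1e-6]:
    resp = requests.post(
        URL,
        json={
            "model": "facebook/opt-125m",
            "prompt": "San Francisco is a",
            "max_tokens": 7,
            "temperature": temperature,
        },
        timeout=30,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["text"]
    # repr() makes whitespace- or backslash-only output visible.
    print(f"temperature={temperature:g}: {text!r}")
```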