vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug] Models generate whitespace-only output when temperature is in range [1e-5, 1e-4], regardless of model type #3063

Closed: saumya-saran closed this issue 1 month ago

saumya-saran commented 7 months ago

Setting the temperature within a particular range (roughly 1e-5 to 1e-4) causes vLLM to generate whitespace-only output. Values above or below this range work correctly. I have seen this with facebook/opt-125m, fine-tuned Mistral-7B models, CodeLlama-13B, and several other models, so it appears to be an issue in vLLM rather than in any particular model.

To reproduce, start the server:

```
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Then send a request, filling in a value for `<temperature>`:

```
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": <temperature>
  }'
```

Results by temperature:

- `1e-3`: generates `" great place to live. I"`
- `1e-4`: generates `"\\\\\\\"`
- `1e-5`: generates `"\\\\\\\"`
- `1e-6`: generates `" great place to live. I"`
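For convenience, here is a minimal sketch that sweeps the temperature values above against the server; it assumes the server from the reproduction step is running at `localhost:8000` and uses the `requests` package, neither of which is part of the original report:

```python
# Minimal sketch: sweep temperature values against a locally running
# vLLM OpenAI-compatible server (assumed at localhost:8000) and print
# each completion, making the whitespace-only range easy to spot.
import requests

URL = "http://localhost:8000/v1/completions"  # assumed local server

for temperature in [1e-3, 1e-4, 1e-5, 1e-6]:
    resp = requests.post(
        URL,
        json={
            "model": "facebook/opt-125m",
            "prompt": "San Francisco is a",
            "max_tokens": 7,
            "temperature": temperature,
        },
        timeout=30,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["text"]
    # repr() makes whitespace- or backslash-only output visible.
    print(f"temperature={temperature:g}: {text!r}")
```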