vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Concurrent timeout #6009

Open luhairong11 opened 4 months ago

luhairong11 commented 4 months ago

Your current environment

Server start command: python -m vllm.entrypoints.openai.api_server --model /data/Qwen1.5-1.8B-Chat-GPTQ-Int4 --served-model-name Qwen1.5-1.8B-Chat-GPTQ-Int4 --quantization gptq --dtype float16 --gpu-memory-utilization 0.2 --tensor-parallel-size 1 --trust-remote-code --max-model-len 4096 --served-model-name qwen1.5-1.8b

🐛 Describe the bug

When testing with 50 concurrent requests, 4 of them fail with a connection timeout error (see the attached screenshot). How should this issue be resolved?
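
For reference, a minimal load-test sketch that sends 50 concurrent chat requests to the OpenAI-compatible server started above. It is not the reporter's original test client; it assumes the server listens on the default http://localhost:8000/v1 and uses the served model name qwen1.5-1.8b, and it raises the client-side timeout so slow-but-successful responses are not misreported as failures.

```python
import asyncio

import aiohttp

# Assumed default host/port for the OpenAI-compatible API server.
URL = "http://localhost:8000/v1/chat/completions"
PAYLOAD = {
    "model": "qwen1.5-1.8b",  # matches --served-model-name in the start command
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 64,
}


async def one_request(session: aiohttp.ClientSession, idx: int) -> str:
    try:
        async with session.post(URL, json=PAYLOAD) as resp:
            await resp.json()
            return f"request {idx}: HTTP {resp.status}"
    except asyncio.TimeoutError:
        return f"request {idx}: client-side timeout"


async def main() -> None:
    # A generous total timeout; many HTTP clients default to a much shorter
    # value, which can surface as connection/read timeouts under load.
    timeout = aiohttp.ClientTimeout(total=300)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(
            *(one_request(session, i) for i in range(50))
        )
    for line in results:
        print(line)


if __name__ == "__main__":
    asyncio.run(main())
```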

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!