Open luhairong11 opened 4 months ago
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
Start Service Command: python -m vllm.entrypoints.openai.api_server --model /data/Qwen1.5-1.8B-Chat-GPTQ-Int4 --served-model-name Qwen1.5-1.8B-Chat-GPTQ-Int4 --quantization gptq --dtype float16 --gpu-memory-utilization 0.2 --tensor-parallel-size 1 --trust-remote-code --max-model-len 4096 --served-model-name qwen1.5-1.8b
🐛 Describe the bug
When testing with 50 concurrent requests, there are 4 failures with a prompt of connection timeout. How should this issue be resolved?