Open momomobinx opened 1 month ago
We have a tracking issue (#5901) for this. Please provide more details there so we can better troubleshoot the underlying cause.
same question here https://github.com/vllm-project/vllm/issues/6363
Faced the same issue today. I was running a script that called the API concurrently, and the server raised the same error after roughly 15-20k requests.
Faced the same problem. The failure point varied: the lowest request count before the error was about 4,000, and the highest about 45,000. I was running the glm4-int8 model on two RTX 3080 GPUs.
Your current environment
🐛 Describe the bug
The error occurs under sustained load of 32 concurrent inference requests.
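For reference, the load pattern reporters describe (many requests sent with 32-way concurrency) can be sketched as below. This is a minimal illustration, not the reporters' actual script: the endpoint URL and payload are assumptions for a vLLM OpenAI-compatible server, and the network call is stubbed out so the concurrency structure can run anywhere.

```python
# Sketch of a 32-way concurrent load generator like the one described
# in the reports above. ``API_URL`` and the payload shape are assumed
# (vLLM's OpenAI-compatible completions endpoint); ``send_request`` is
# a stand-in for the real HTTP POST.
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

API_URL = "http://localhost:8000/v1/completions"  # assumed endpoint


def send_request(i: int) -> int:
    """Build and 'send' one completion request; returns its index.

    Replace the body with an actual HTTP POST (e.g. via
    urllib.request) when pointing at a live server."""
    payload = json.dumps({"prompt": f"request {i}", "max_tokens": 16})
    assert payload  # placeholder for the real network call
    return i


def run_load(total: int, concurrency: int = 32) -> int:
    """Fire ``total`` requests with ``concurrency`` workers; count completions."""
    done = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_request, i) for i in range(total)]
        for f in as_completed(futures):
            f.result()  # re-raises if a worker failed
            done += 1
    return done
```

In the reported failures the server begins erroring somewhere between a few thousand and ~45,000 requests into a run like this, which is what makes the bug hard to reproduce deterministically.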