Open mgoin opened 3 months ago
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
The output of python collect_env.py:
🐛 Describe the bug
It seems the front-end server can easily get overloaded when there are many pending requests (>1000 seems to roughly be the threshold).
Individual requests over the threshold begin failing with:
This quickly fills the server's output, as it throws an exception for each request.
Steps to replicate
Server command:
Benchmark command (needs more than 1000 pending prompts to trigger):
NOTE: The backend engine seems to continue running fine; it is just that new requests throw exceptions in the front-end.
Running with --disable-frontend-multiprocessing or downgrading to v0.5.3 resolves the issue.
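Since the failures only show up once roughly 1000 requests are pending, a client could also try to stay under that number by capping how many requests it has in flight. The following is a minimal sketch of that idea, not something verified against the server: `fake_send` is a hypothetical stand-in for a real HTTP call, `run_capped` and `MAX_PENDING` are illustrative names, and the value 1000 is only the rough threshold reported above. It also assumes that requests in flight on the client correspond to pending requests on the server.

```python
import asyncio

MAX_PENDING = 1000  # assumption: the rough threshold reported in this issue


async def fake_send(prompt: str) -> str:
    """Hypothetical stand-in for a real HTTP request to the server."""
    await asyncio.sleep(0.001)
    return f"done: {prompt}"


async def run_capped(prompts, send=fake_send, max_pending=MAX_PENDING):
    """Send every prompt while keeping at most max_pending requests in flight."""
    semaphore = asyncio.Semaphore(max_pending)
    in_flight = 0
    peak = 0

    async def one(prompt):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while max_pending requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send(prompt)
            finally:
                in_flight -= 1

    # gather preserves input order in its result list
    results = await asyncio.gather(*(one(p) for p in prompts))
    return results, peak


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(5000)]
    results, peak = asyncio.run(run_capped(prompts))
    print(len(results), "requests completed; peak in flight:", peak)
```

Note that a benchmark capped this way would no longer reproduce the bug, so this only makes sense as a client-side mitigation and not as a replacement for the reproduction steps.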