Closed: Cyberes closed this issue 1 year ago.
Let's leave this issue up as a thread so it can be referenced as the project works towards concurrency.
SQS solves this problem in the cloud, but for local use there's ElasticMQ, which uses the same message format: https://github.com/softwaremill/elasticmq
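For anyone curious, here's a minimal sketch of that approach: ElasticMQ speaks the SQS wire protocol, so the standard boto3 client works against it once you point `endpoint_url` at the local server. The queue name and message payload below are just placeholders, not anything from this project.

```python
import json
import boto3

# ElasticMQ exposes an SQS-compatible endpoint, typically started with:
#   docker run -p 9324:9324 softwaremill/elasticmq-native
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:9324",  # local ElasticMQ, not AWS
    region_name="elasticmq",               # any value works locally
    aws_access_key_id="x",                 # dummy credentials
    aws_secret_access_key="x",
)

# Create a queue to hold pending generation requests (name is arbitrary).
queue_url = sqs.create_queue(QueueName="generation-requests")["QueueUrl"]

# Producer: enqueue a prompt instead of calling the model directly.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"prompt": "Hello", "max_new_tokens": 64}),
)

# Consumer: a worker drains the queue and feeds the model.
resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    request = json.loads(msg["Body"])
    # ... run generation here ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```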
This issue has been closed due to 6 weeks of inactivity. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Hi guys, how do I make this work? My command line is `python server.py --model chinese-alpaca-2-7b-hf --listen --chat --load-in-8bit --multi-user --api --api-blocking-port 8080 --threads-batch 2000 --threads 1000`, but it doesn't work: multi-user concurrent requests are still blocked.
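For reference, a quick way to reproduce the blocking behaviour is to fire several requests at the API in parallel and time them; if the server handles one request at a time, the latencies stack up roughly linearly instead of overlapping. This sketch assumes the blocking API's `/api/v1/generate` endpoint is reachable on port 8080, matching the command above.

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/api/v1/generate"  # blocking API port from the command above

def generate(prompt: str) -> float:
    """Send one generation request and return how long it took."""
    start = time.time()
    requests.post(URL, json={"prompt": prompt, "max_new_tokens": 64}, timeout=300)
    return time.time() - start

# Fire 4 identical requests at once and compare the measured latencies.
with ThreadPoolExecutor(max_workers=4) as pool:
    latencies = list(pool.map(generate, ["hello"] * 4))

print(latencies)
```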
Hey, let's give this issue a bump. This could really bring oobabooga to the next level; it's a very important use case.
Description
Process multiple API requests at once. Currently, requests are blocked and processed one at a time. Being able to batch requests would turn this from a fun toy into a powerful LLM backend.
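To make the goal concrete, here's a rough sketch of the kind of server-side micro-batching this would need: incoming requests land in a queue, and a worker drains whatever has accumulated within a short window and runs it through the model as one batch. This isn't the project's implementation; `run_batch`, `MAX_BATCH`, and `WINDOW_S` are stand-ins for the actual model call and tuning knobs.

```python
import asyncio

MAX_BATCH = 8      # cap on prompts per forward pass (illustrative)
WINDOW_S = 0.02    # how long to wait for more requests to accumulate

async def handle_request(queue: asyncio.Queue, prompt: str) -> str:
    """Called per API request: enqueue the prompt and await its result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future

async def batch_worker(queue: asyncio.Queue, run_batch):
    """Collect requests for a short window, then run them as one batch."""
    while True:
        batch = [await queue.get()]
        deadline = asyncio.get_running_loop().time() + WINDOW_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        # One model call serves every request in the batch; a real server
        # would run this in an executor so it doesn't block the event loop.
        outputs = run_batch([prompt for prompt, _ in batch])
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    run_batch = lambda prompts: [p[::-1] for p in prompts]  # fake model call
    worker = asyncio.create_task(batch_worker(queue, run_batch))
    results = await asyncio.gather(
        *(handle_request(queue, f"prompt {i}") for i in range(5))
    )
    print(results)
    worker.cancel()

asyncio.run(main())
```

A production version would also need per-request sampling parameters and error handling; this only shows the queuing shape that replaces one-at-a-time processing.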
Additional Context
There's already been a lot of discussion on this topic:
https://github.com/oobabooga/text-generation-webui/issues/2568
https://github.com/oobabooga/text-generation-webui/issues/2475
https://github.com/oobabooga/text-generation-webui/pull/3048