oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Concurrent/Simultaneous API Requests Thread #3767

Closed Cyberes closed 1 year ago

Cyberes commented 1 year ago

Description

Process multiple API requests at once. Currently, requests are blocked and processed one at a time. Being able to batch requests would turn this from a fun toy into a powerful LLM backend.
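A minimal sketch of the request-batching idea described above, using only the Python standard library. Nothing here is the project's actual API: `fake_model_generate` is a hypothetical stand-in for a batched generation call, and `BatchingWorker` just illustrates the pattern of draining all pending requests into one model invocation instead of blocking on them one at a time.

```python
import queue
import threading

def fake_model_generate(prompts):
    # Hypothetical stand-in for a real batched forward pass over all prompts.
    return [p.upper() for p in prompts]

class BatchingWorker:
    """Collect concurrent requests and serve them as batches."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, prompt):
        # Called concurrently by API handlers; each gets a slot to wait on.
        slot = {"prompt": prompt, "done": threading.Event(), "result": None}
        self._requests.put(slot)
        return slot

    def _loop(self):
        while True:
            # Block for the first request, then drain anything else pending,
            # so simultaneous callers share a single model invocation.
            batch = [self._requests.get()]
            while True:
                try:
                    batch.append(self._requests.get_nowait())
                except queue.Empty:
                    break
            outputs = fake_model_generate([s["prompt"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["result"] = out
                slot["done"].set()

worker = BatchingWorker()
slots = [worker.submit(f"prompt {i}") for i in range(4)]
for s in slots:
    s["done"].wait(timeout=5)
results = [s["result"] for s in slots]
```

Real batched inference also has to handle per-request sampling parameters and uneven prompt lengths, which is where most of the implementation effort in the linked PRs went.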

Additional Context

There's already been a lot of discussion on this topic:
https://github.com/oobabooga/text-generation-webui/issues/2568
https://github.com/oobabooga/text-generation-webui/issues/2475
https://github.com/oobabooga/text-generation-webui/pull/3048

Cyberes commented 1 year ago

Let's leave this issue up as a thread so it can be referenced as the project works towards concurrency.

johndpope commented 1 year ago

SQS solves this problem in the cloud, but for a local setup there's ElasticMQ, which uses the same API format: https://github.com/softwaremill/elasticmq
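The queue-based decoupling suggested above can be sketched locally with just the standard library: many API clients enqueue jobs, while a single worker thread (standing in for the single-threaded model) consumes them one at a time. A broker like ElasticMQ would play the role of `jobs` across processes or machines; everything here (`model_worker`, the `echo:` output) is illustrative, not the project's code.

```python
import queue
import threading

jobs = queue.Queue()
results = {}
results_lock = threading.Lock()

def model_worker():
    # The one thread allowed to touch the (single-threaded) model.
    while True:
        job = jobs.get()
        if job is None:  # shutdown sentinel
            jobs.task_done()
            break
        job_id, prompt = job
        output = f"echo: {prompt}"  # stand-in for model.generate(prompt)
        with results_lock:
            results[job_id] = output
        jobs.task_done()

worker = threading.Thread(target=model_worker)
worker.start()

# Simulate several simultaneous API requests being enqueued.
for i in range(3):
    jobs.put((i, f"request {i}"))
jobs.put(None)
jobs.join()
worker.join()
```

This serializes access to the model safely, but unlike true batching it still processes one request per generation call; the two approaches are complementary.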

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

qianxifu commented 1 year ago

Hi guys, how do I make this work? My command line is `python server.py --model chinese-alpaca-2-7b-hf --listen --chat --load-in-8bit --multi-user --api --api-blocking-port 8080 --threads-batch 2000 --threads 1000`, but it doesn't work: multi-user concurrent requests are still blocked.

mercuryyy commented 9 months ago

Hey, let's give this issue a bump. This could really take oobabooga to the next level; it's a very important use case.