zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

[Enhancement]: concurrency option for gptcache server #523

Open a9raag opened 1 year ago

a9raag commented 1 year ago

What would you like to be added?

Issue: No web concurrency

GPTCache uses uvicorn as its ASGI server. However, the server implementation provides no option to set the number of worker processes, which is what enables concurrency. Although it is possible to set the WEB_CONCURRENCY environment variable, doing so might break the server.py implementation, since the cache object, which is initialized from the YAML configuration, won't be shared across forked worker processes.
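The root issue above can be demonstrated in isolation. The sketch below (hypothetical stand-in; the real GPTCache cache object is more complex) uses an explicit `fork` multiprocessing context to show that module-level state initialized in the parent is only copied into each worker, never shared back:

```python
import multiprocessing

# Module-level state, analogous to the cache object built from the YAML
# config at import time (hypothetical stand-in for illustration only).
cache = {"hits": 0}

def worker(q):
    # Each forked worker mutates its OWN copy-on-write copy of the module
    # state; the parent and sibling workers never observe this change.
    cache["hits"] += 1
    q.put(cache["hits"])

def main():
    # Use the "fork" start method explicitly, matching how uvicorn-style
    # pre-fork worker models behave on Linux.
    ctx = multiprocessing.get_context("fork")
    q = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(q,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = sorted(q.get() for _ in range(3))
    # Every worker sees hits == 1, and the parent still sees 0:
    # the state was duplicated per process, not shared.
    return results, cache["hits"]

if __name__ == "__main__":
    print(main())
```

This is why naively setting WEB_CONCURRENCY is not enough: each forked worker would need to (re)build its own cache, which motivates the per-process singleton proposed below.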

Possible enhancement:

Create a singleton that initializes the Cache object once per forked process; that instance can then be accessed from API handlers such as get, put, etc.
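One way the proposed per-process singleton could look is sketched below. This is not GPTCache's API, just a minimal illustration: `_build_cache` is a hypothetical placeholder for loading the YAML configuration and constructing the real Cache object, and the PID check ensures a stale parent-process instance is rebuilt after a fork:

```python
import os
import threading

class CacheSingleton:
    """Lazily builds one cache instance per worker process."""

    _instance = None
    _pid = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        pid = os.getpid()
        # Rebuild if never initialized, or if we inherited an instance
        # created in a different (parent) process via fork.
        if cls._instance is None or cls._pid != pid:
            with cls._lock:
                if cls._instance is None or cls._pid != pid:
                    cls._instance = cls._build_cache()
                    cls._pid = pid
        return cls._instance

    @staticmethod
    def _build_cache():
        # Hypothetical: in the real server this would parse the YAML
        # config and initialize the GPTCache Cache object.
        return {"initialized_in_pid": os.getpid()}
```

API handlers would then call `CacheSingleton.get()` instead of touching a module-level cache, so each uvicorn worker transparently gets its own fully initialized instance on first use.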

Why is this needed?

To improve the throughput and QPS of the GPTCache server implementation.

Anything else?

No response