zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

[Enhancement]: concurrency option for gptcache server #523

Open a9raag opened 1 year ago

a9raag commented 1 year ago

What would you like to be added?

Issue: No web concurrency

GPTCache uses uvicorn as its ASGI server. However, the server implementation provides no option to set the number of worker processes, which is what enables concurrency. Although it is possible to set the WEB_CONCURRENCY environment variable, doing so might break the server.py implementation, since the cache object, which is initialized from the YAML configuration, won't be shared across forked worker processes.
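The root issue above can be demonstrated in isolation. The sketch below (hypothetical stand-in; the real GPTCache cache object is more complex) uses an explicit `fork` multiprocessing context to show that module-level state initialized in the parent is only copied into each worker, never shared back:

```python
import multiprocessing

# Module-level state, analogous to the cache object built from the YAML
# config at import time (hypothetical stand-in for illustration only).
cache = {"hits": 0}

def worker(q):
    # Each forked worker mutates its OWN copy-on-write copy of the module
    # state; the parent and sibling workers never observe this change.
    cache["hits"] += 1
    q.put(cache["hits"])

def main():
    # Use the "fork" start method explicitly, matching how uvicorn-style
    # pre-fork worker models behave on Linux.
    ctx = multiprocessing.get_context("fork")
    q = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(q,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = sorted(q.get() for _ in range(3))
    # Every worker sees hits == 1, and the parent still sees 0:
    # the state was duplicated per process, not shared.
    return results, cache["hits"]

if __name__ == "__main__":
    print(main())
```

This is why naively setting WEB_CONCURRENCY is not enough: each forked worker would need to (re)build its own cache, which motivates the per-process singleton proposed below.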

Possible enhancement:

Create a singleton that initializes the Cache object once per forked process; that instance can then be accessed from API handlers such as get, put, etc.
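One way the proposed per-process singleton could look is sketched below. This is not GPTCache's API, just a minimal illustration: `_build_cache` is a hypothetical placeholder for loading the YAML configuration and constructing the real Cache object, and the PID check ensures a stale parent-process instance is rebuilt after a fork:

```python
import os
import threading

class CacheSingleton:
    """Lazily builds one cache instance per worker process."""

    _instance = None
    _pid = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        pid = os.getpid()
        # Rebuild if never initialized, or if we inherited an instance
        # created in a different (parent) process via fork.
        if cls._instance is None or cls._pid != pid:
            with cls._lock:
                if cls._instance is None or cls._pid != pid:
                    cls._instance = cls._build_cache()
                    cls._pid = pid
        return cls._instance

    @staticmethod
    def _build_cache():
        # Hypothetical: in the real server this would parse the YAML
        # config and initialize the GPTCache Cache object.
        return {"initialized_in_pid": os.getpid()}
```

API handlers would then call `CacheSingleton.get()` instead of touching a module-level cache, so each uvicorn worker transparently gets its own fully initialized instance on first use.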

Why is this needed?

To improve the throughput and QPS of the GPTCache server implementation.

Anything else?

No response