zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

[Enhancement]: Async function calls to improve the concurrency #415

Open vinvcn opened 1 year ago

vinvcn commented 1 year ago

What would you like to be added?

Hi, I did a brief study of the code base. It seems that most of the parts involving external I/O do not support async?

Given that the CPython implementation enforces the GIL, the main thread running the script is blocked by this I/O, which drastically reduces concurrency. To make it worse for users of this library, any call to retrieve cached results blocks at that point, no matter how carefully the user has structured their own code around async calls.
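For illustration, here is a minimal sketch of the problem and a stopgap: `lookup_cache` is a hypothetical stand-in for a synchronous cached lookup (not an actual GPTCache API), and `asyncio.to_thread` offloads it so the event loop is not stalled. A native async code path would avoid the thread overhead entirely.

```python
import asyncio
import time


def lookup_cache(query: str) -> str:
    # Hypothetical stand-in for a synchronous cache lookup
    # (e.g. embedding + vector search + scalar store access).
    # time.sleep mimics the blocking I/O.
    time.sleep(0.5)
    return f"cached answer for {query!r}"


async def handle_request(query: str) -> str:
    # Calling lookup_cache() directly here would block the whole event
    # loop for 0.5 s. Running it in a worker thread keeps other
    # coroutines responsive while we wait.
    return await asyncio.to_thread(lookup_cache, query)


async def main() -> None:
    # Ten concurrent requests finish in roughly 0.5 s instead of 5 s.
    answers = await asyncio.gather(*(handle_request(f"q{i}") for i in range(10)))
    print(answers)


asyncio.run(main())
```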

Why is this needed?

I/O constitutes the major part of this library, and Python only runs one thread at a time under the GIL. It is important to support async, IMHO.

Anything else?

Refer to the async Redis client as an example:

https://redis.com/blog/async-await-programming-basics-python-examples/
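As a rough sketch of the pattern the linked post describes, redis-py (4.2+) ships an asyncio client whose commands are awaitable, so cache reads and writes no longer block the event loop. Host, port, and key names below are placeholders.

```python
import asyncio

import redis.asyncio as redis  # asyncio client bundled with redis-py >= 4.2


async def main() -> None:
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Both calls are awaitable, so other coroutines keep running
    # while the network round trips are in flight.
    await r.set("gptcache:demo", "hello")
    value = await r.get("gptcache:demo")
    print(value)

    await r.close()


asyncio.run(main())
```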

SimFG commented 1 year ago

In the entire cache operation, it is not the DB access that takes the most time, but running the embedding and other models. That said, async support is indeed an optimization point, and we are currently planning it.

mingqxu7 commented 1 year ago

I would also be very keen on an async version. My chatbot currently uses chain.arun() calls, so it would be great if LangChainLLMs from gptcache.adapter.langchain_models supported acall(). Is there an ETA for this feature?
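Until native acall() support lands, one possible workaround is to wrap the synchronous GPTCache-adapted LLM in a worker thread. This is only a sketch: the LangChainLLMs construction loosely follows the GPTCache docs and may differ from your setup, and acall_workaround is a hypothetical helper, not part of either library.

```python
import asyncio

from langchain.llms import OpenAI
from gptcache.adapter.langchain_models import LangChainLLMs

# Synchronous GPTCache-wrapped LLM; assumes the cache has already been
# initialized elsewhere (see the GPTCache quick-start for details).
llm = LangChainLLMs(llm=OpenAI(temperature=0))


async def acall_workaround(question: str) -> str:
    # Run the blocking cached call in a worker thread so the rest of an
    # async chatbot (e.g. code using chain.arun()) stays responsive.
    return await asyncio.to_thread(llm, question)


async def main() -> None:
    print(await acall_workaround("What is GPTCache?"))


asyncio.run(main())
```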