Closed. andreibondarev closed this issue 6 days ago.
Shouldn't the cache layer be as simple as hashing the inputs to the API calls and caching the results under that key? GPTCache looks like it's solving for a semantic cache, which wouldn't be generally useful for all use cases (e.g. text extraction workflows where queries might be very semantically similar across runs).
It looks like GPTCache has that option, but it would be much simpler to implement exact hits only first.
@weilandia Yes, I think that would be a good order to build this out in: 1) exact inputs (hashed) as the key, then 2) a "fuzzier" cache based on semantic similarity.
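Step 1 could be sketched roughly like this. This is just an illustration, not Langchain.rb's actual API: the class and method names are hypothetical, and a real implementation would likely swap the in-memory hash for Redis or a file-backed store.

```ruby
require "digest"
require "json"

# Hypothetical sketch of an exact-match LLM cache (step 1).
class ExactLLMCache
  def initialize
    @store = {} # in-memory for illustration; use Redis/disk in practice
  end

  # Hash the full request payload (prompt, model, temperature, ...)
  # so any parameter change produces a different key. Sorting the
  # params first makes the key insensitive to argument order.
  def key_for(params)
    Digest::SHA256.hexdigest(JSON.generate(params.sort.to_h))
  end

  # Return the cached response for these params, or run the block
  # (the real LLM call) and memoize its result.
  def fetch(params)
    key = key_for(params)
    @store.fetch(key) { @store[key] = yield }
  end
end
```

Usage would look like `cache.fetch(prompt: "Hi", model: "gpt-4") { llm.complete(...) }`, where the block only fires on a cache miss.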
Closing this for now, as it seems the LLM providers themselves will support this out of the box.
We need a mechanism to cache identical and similar requests to the LLM. The Python de facto solution is https://github.com/zilliztech/GPTCache.