Is your feature request related to a problem? Please describe.
We are creating multiple GPT Assistants to support our customer service operations. Currently, there is only support for caching results via ChatCompletion. We would like to see a GPT Assistant implementation of GPTCache.
GPT Assistants will mainly answer common, recurring queries, such as:
- Greeting a user/customer
- Generic FAQs: questions that have been answered previously (stored in the cache)
- Out-of-scope answers stored in the cache (using RAG, we limit the answers to the ones stored in the cache)
Over time, we build up a sizable cache of answers to user queries. These answers can be served directly from the cache instead of calling OpenAI on every request, which would save us a significant amount of tokens and latency.
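To make the cache-first flow above concrete, here is a minimal, self-contained sketch: common queries are answered from a local store, and OpenAI is only called on a miss. All names here (`AnswerCache`, the `fetch` callback) are hypothetical illustrations, not part of GPTCache, and the exact-match key is a stand-in for GPTCache's semantic similarity matching.

```python
class AnswerCache:
    """Hypothetical cache-first wrapper around an LLM call."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_answer(self, query, fetch):
        # Naive normalization; GPTCache would use embedding similarity instead.
        key = query.strip().lower()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        # Only on a cache miss do we spend tokens and latency on the API.
        answer = fetch(query)
        self._store[key] = answer
        return answer
```

With this shape, repeated greetings or FAQ queries cost one API call total; every later occurrence is a cache hit.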
Describe the solution you'd like.
I would like to see a solution that includes:
- A GPT Assistant adapter class in the OpenAI adapter
- Functionality to select an assistant via `assistant_id`
- Functionality to create and update the cache for a specific assistant
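As a rough sketch of the per-assistant behavior requested above (selecting by `assistant_id`, and creating/updating a cache scoped to one assistant), one cache namespace per assistant could look like this. `AssistantCacheManager` and its methods are hypothetical names for illustration, not an existing GPTCache API.

```python
class AssistantCacheManager:
    """Hypothetical per-assistant cache: one namespace per assistant_id."""

    def __init__(self):
        self._caches = {}  # assistant_id -> {query: answer}

    def cache_for(self, assistant_id):
        # Select the cache for a given assistant, creating it lazily.
        return self._caches.setdefault(assistant_id, {})

    def update(self, assistant_id, query, answer):
        self.cache_for(assistant_id)[query] = answer

    def lookup(self, assistant_id, query):
        # Returns None on a miss; a real adapter would then call the
        # Assistants API and store the result via update().
        return self.cache_for(assistant_id).get(query)
```

Scoping by `assistant_id` keeps answers from one assistant (e.g. a greeter) from leaking into another (e.g. an FAQ bot) that shares the same cache backend.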
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response