zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
6.96k stars 490 forks

[Enhancement]: Option to set context in request in GPTCache Server #534

Open l0calh0st8080 opened 10 months ago

l0calh0st8080 commented 10 months ago

What would you like to be added?

I am using the GPTCache server, primarily via its /put and /get endpoints. In my use case, multiple users share this server. I want to attach a context to every request (it could be anything, such as an id or request_id) so that /put stores and /get looks up entries scoped to it. For example, a /put body might look like this:

{
    "prompt": "hello",
    "answer" : "Hi there!",
    "id": "abc123"
}

The following /get would return the answer, because it was cached under the same id:

{
    "prompt": "hi",
    "id": "abc123"
}

The following /get would not return any answer, even though the prompt was cached, because the id is different:

{
    "prompt": "hi",
    "id": "xyz567"
}
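The requested semantics amount to one independent cache namespace per id. A minimal pure-Python sketch of that behaviour (class and method names are hypothetical, not GPTCache's actual API, and this uses exact matching where GPTCache would match semantically):

```python
from collections import defaultdict


class NamespacedCache:
    """Toy cache that scopes entries by a caller-supplied id (tenant key)."""

    def __init__(self):
        # one independent prompt -> answer store per id
        self._stores = defaultdict(dict)

    def put(self, prompt, answer, id):
        self._stores[id][prompt] = answer

    def get(self, prompt, id):
        # only entries cached under the same id are visible
        return self._stores[id].get(prompt)


cache = NamespacedCache()
cache.put("hello", "Hi there!", id="abc123")
print(cache.get("hello", id="abc123"))  # Hi there!
print(cache.get("hello", id="xyz567"))  # None (different id, cache miss)
```

In a real implementation the per-id store would be a separate vector index (or a filtered search over one index) rather than a dict, so that semantic matching still works within each namespace.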

Why is this needed?

My application uses the GPTCache server as-is, and it is multi-tenant. There can be multiple users/organisations/projects, and they should not share a cache between them.

Anything else?

No response

SimFG commented 10 months ago

good ideas!

michael19960921 commented 10 months ago

Agreed; I have run into this issue as well: right now the same content cannot be kept separate across multiple sessions.

Until a proper change is made, you can distinguish entries as follows:

Each time you add a cache entry, prepend an identifying ID to the content, and concatenate the same ID during queries. For example, add "{ID} Hello" and query "{ID} Hello".

l0calh0st8080 commented 10 months ago

> Agreed; I have run into this issue as well: right now the same content cannot be kept separate across multiple sessions.
>
> Until a proper change is made, you can distinguish entries as follows:
>
> Each time you add a cache entry, prepend an identifying ID to the content, and concatenate the same ID during queries. For example, add "{ID} Hello" and query "{ID} Hello".

I have tried this before: I cached prompt and response as "{user_id} {prompt}" and queried the same way. It produces too many false positives. For example, the prompts "132 Hello" and "133 Hello" matched the same response.

I think that because matching is vector-based (semantic), it cannot do strict matching, which results in false matches. I could be wrong, though.
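The false-positive problem can be illustrated with a toy similarity measure (character-bigram cosine, standing in for GPTCache's actual embedding model, which this is not): prepending a tenant id barely changes the text, so two different tenants' prompts stay highly similar while genuinely different prompts do not.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Character-bigram counts as a crude stand-in for an embedding."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = lambda v: sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))


# The tenant prefix is a tiny fraction of the text, so a cross-tenant
# lookup for the same prompt can still clear a similarity threshold.
sim_cross_tenant = cosine(bigrams("132 Hello"), bigrams("133 Hello"))
sim_other_prompt = cosine(bigrams("132 Hello"), bigrams("132 Goodbye"))
print(f"cross-tenant, same prompt:  {sim_cross_tenant:.2f}")  # 0.75
print(f"same tenant, other prompt: {sim_other_prompt:.2f}")  # 0.34
```

This is why prefix concatenation leaks across tenants: the similarity search rewards the shared prompt text far more than it penalises the differing id. A strict equality filter on the id (applied outside the vector search) avoids this entirely.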

l0calh0st8080 commented 10 months ago

We are maintaining our own fork and have added multi-tenancy there: https://github.com/NumexaHQ/GPTCache/pull/1/commits/41aae693ff6534523f3db4e423ccda5bf72efc12