zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
7.2k stars, 502 forks

[Feature] Cache consistency check for Chroma & Milvus #448

Closed wybryan closed 1 year ago

wybryan commented 1 year ago

What is it: For GPTCache to work correctly, the cache store and the vector store must stay in sync with each other: any discrepancy between the two causes erroneous cache hits and, consequently, incorrect responses returned to the user. There are many ways the two stores can unintentionally fall out of sync in a live production environment, so the consistency between the cache store and the vector store should be checked explicitly rather than taken for granted. That way we add fault tolerance to the system by design.

This feature addresses that by adding a cache self-check to the SearchDistanceEvaluation class. When an inconsistency is found, the check not only prevents a wrong response from being sent to the user; the cache can also self-heal by replacing the problematic entries, so the two stores gradually converge back into sync over time.

How it works: we fetch the embedding from the vector store and the embedding from the cache store by the same ID; if both stores are in sync, the two should be identical, otherwise an out-of-sync entry has been detected. When out-of-sync is detected, we set the distance returned by the vector search to infinity to prevent a cache hit. At the same time, we replace the out-of-sync entry in the vector store using the ID and embedding from the cache store, so the next time around the out-of-sync entry is gone.
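The check-and-heal flow described above can be sketched roughly as follows. This is illustrative only, not GPTCache's real storage classes: `DictStore`, `check_and_heal`, and the single-ID `get_embedding`/`update_embedding` methods are all stand-ins for the purpose of the sketch.

```python
import numpy as np

class DictStore:
    """Minimal in-memory stand-in for a cache store or vector store
    (illustrative only; not GPTCache's real storage classes)."""
    def __init__(self):
        self.data = {}
    def get_embedding(self, entry_id):
        return self.data.get(entry_id)
    def update_embedding(self, entry_id, emb):
        self.data[entry_id] = emb

def check_and_heal(cache_store, vector_store, distance, entry_id):
    """If the two stores disagree on the embedding stored under `entry_id`,
    force a cache miss (distance -> infinity) and copy the cache store's
    embedding into the vector store so the next lookup is consistent."""
    truth = cache_store.get_embedding(entry_id)     # cache store is ground truth
    indexed = vector_store.get_embedding(entry_id)  # possibly stale
    if indexed is None or not np.array_equal(truth, indexed):
        vector_store.update_embedding(entry_id, truth)  # self-heal
        return float("inf")                             # prevent the cache hit
    return distance
```

A drifted entry is repaired on first detection, so the same lookup succeeds normally afterwards.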

How to use it: a unit test file is provided to showcase GPTCache's behaviour with and without this feature. The feature is turned off by default, so it is compatible with the existing codebase. To start with, it has been implemented for the Chroma and Milvus vector stores.

wybryan commented 1 year ago

/assign @cxie

xiaofan-luan commented 1 year ago

why not just simply delete the vector in case it can be searched again?

wybryan commented 1 year ago

why not just simply delete the vector in case it can be searched again?

I presume you mean how to deal with the cache being out of sync after it is detected, in other words, the self-heal part.

Deleting the vector is not ideal in my view, because the vector store and the cache store are still out of sync after the deletion, i.e., the ID column in the vector store no longer matches the ID column in the cache store. In addition, depending on the underlying vector database, we have better options: for example, Chroma has a native update API for this, and in Milvus it is done with two operations, deleting the problem entry first and then inserting with the same ID.

The distance manipulation is required to prevent a wrong answer from being sent back to the user, regardless of whether the delete operation is used or not.

To sum up: since we have better operations available and the goal is to keep the stores in sync, I don't think delete is the best way to handle it after detection.
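The two repair strategies described above could be sketched like this. `client` stands in for a Chroma collection or a Milvus collection; the method names loosely mirror the `update` and `delete`/`insert` calls of each SDK but are not verified against a live server here.

```python
def repair_entry(backend, client, entry_id, embedding):
    """Overwrite a stale vector-store entry with the embedding taken from
    the cache store. Sketch only: method names roughly follow the Chroma
    and Milvus client APIs."""
    if backend == "chroma":
        # Chroma exposes a native update on an existing ID.
        client.update(ids=[entry_id], embeddings=[embedding])
    elif backend == "milvus":
        # Milvus (without upsert): delete the stale row, then re-insert
        # the same ID with the corrected embedding.
        client.delete(expr=f"id in [{entry_id}]")
        client.insert(data=[[entry_id], [embedding]])
    else:
        raise ValueError(f"unsupported backend: {backend}")
```

Either way the entry keeps its original ID, which is what keeps the two stores' ID columns aligned.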

wybryan commented 1 year ago

I hope the above answers your question in terms of design rationale. In terms of code, it refers to the update_embeddings() method in the vector store.

In code, the following is implemented:

  1. get_embeddings() is used for consistency detection.
  2. update_embeddings() is used for consistency correction (I call it self-heal).

The entire logic is put together in cache_health_check() inside SimilarityEvaluation.evaluate().
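In outline, the two hooks named above could sit on the vector-store base class roughly like this. This is a sketch of the interface shape, not the exact GPTCache signatures; `VectorBaseSketch` and `InMemoryVectors` are hypothetical names.

```python
from abc import ABC, abstractmethod
import numpy as np

class VectorBaseSketch(ABC):
    """Illustrative slice of a vector-store interface showing the two
    consistency hooks: detection and correction."""

    @abstractmethod
    def get_embeddings(self, data_id):
        """Return the stored embedding for `data_id`, or None if missing
        (used for consistency detection)."""

    @abstractmethod
    def update_embeddings(self, data_id, emb):
        """Overwrite the embedding stored under `data_id`
        (used for consistency correction, i.e. self-heal)."""

class InMemoryVectors(VectorBaseSketch):
    """Trivial dict-backed implementation for demonstration."""
    def __init__(self):
        self._vecs = {}
    def get_embeddings(self, data_id):
        return self._vecs.get(data_id)
    def update_embeddings(self, data_id, emb):
        self._vecs[data_id] = np.asarray(emb)
```

Each concrete backend (Chroma, Milvus, ...) then supplies its own implementation of the two hooks.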

Please have a look and let me know your thoughts.

SimFG commented 1 year ago

ok, i will review the pr

SimFG commented 1 year ago

I have read the entire PR. Actually, I am curious to know if you have experienced this issue or if you believe there may be an underlying risk.

wybryan commented 1 year ago

I have read the entire PR. Actually, I am curious to know if you have experienced this issue or if you believe there may be an underlying risk.

Both, actually.

I hit this problem when deploying with the cache store and the vector store both on remote hosts. The TTL for each store was set differently, and the incorrect-response rate rose with no apparent reason until I tracked this down. The problem is stealthy in nature: there is no error or crash, but the result is completely wrong because of a subtle difference in configuration. Strictly speaking, it wasn't GPTCache that caused the stores to fall out of sync, but I would argue that the cache has all the information at hand to verify the response is correct before sending it back to the user.

It is also a big underlying risk from a system-design point of view: the cache relies on multiple storage backends, and each of these can be local or remote, so consistency should be part of the design. Many things can go wrong in real-world deployments (TTLs, clock sync, network issues and whatnot), and in my view sending an incorrect response to the user because of an out-of-sync cache is orders of magnitude worse than a cache miss. That's why this is implemented: to ensure it doesn't happen silently.

SimFG commented 1 year ago

Thank you very much for your detailed explanation. If that is the case, would it be better to put this verification logic before the similarity evaluation? Then whether the verification is enabled or not can be controlled by the cache config.

[image]
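The suggestion above, running the verification before similarity evaluation and gating it behind a cache-config flag, might look roughly like this. The flag name `enable_health_check` and the function names are hypothetical, not GPTCache's actual config fields.

```python
from dataclasses import dataclass

@dataclass
class CacheConfigSketch:
    # Hypothetical flag; off by default so existing behaviour is unchanged.
    enable_health_check: bool = False

def score_candidate(config, health_check, similarity_eval, query, candidate):
    """Run the (optional) consistency check first; only candidates that
    pass it reach the similarity evaluation."""
    if config.enable_health_check and not health_check(candidate):
        return 0.0  # treat an inconsistent entry as a guaranteed miss
    return similarity_eval(query, candidate)
```

Keeping the check outside the evaluation class means any SimilarityEvaluation implementation benefits from it, not just SearchDistanceEvaluation.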

wybryan commented 1 year ago

Thank you very much for your detailed explanation. If that is the case, would it be better to put this verification logic before the similarity evaluation? Then whether the verification is enabled or not can be controlled by the cache config.

[image]

You are right, that is a better way to implement this. Let me make the change to do it the way you suggested.

sre-ci-robot commented 1 year ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: **wybryan**. To complete the pull request process, please ask for approval from simfg after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/zilliztech/GPTCache/blob/dev/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.

wybryan commented 1 year ago

Hi @SimFG, I have refactored the code as you suggested, cheers.

SimFG commented 1 year ago

@wybryan Thank you, and please resolve the code conflict

wybryan commented 1 year ago

@wybryan Thank you, and please resolve the code conflict

DONE

wybryan commented 1 year ago

Hi @SimFG, the code has passed linting and the unit tests.

There seems to be a Milvus server startup issue: once one test has started a Milvus server and finished with it, any other test that tries to do the same fails. This is out of the scope of this PR; please have a look at the unit test log.

wybryan commented 1 year ago

Hi @SimFG, the code has passed linting and the unit tests.

There seems to be a Milvus server startup issue: once one test has started a Milvus server and finished with it, any other test that tries to do the same fails. This is out of the scope of this PR; please have a look at the unit test log.

Never mind, my last attempt fixed it. The normal tests all pass now.

codecov[bot] commented 1 year ago

Codecov Report

Merging #448 (5b2cb4a) into dev (e43160f) will decrease coverage by 0.05%. The diff coverage is 77.50%.

Additional details and impacted files ([full report](https://app.codecov.io/gh/zilliztech/GPTCache/pull/448)):

```diff
@@            Coverage Diff             @@
##              dev     #448      +/-   ##
==========================================
- Coverage   93.97%   93.92%   -0.05%
==========================================
  Files          90       90
  Lines        3335     3374      +39
==========================================
+ Hits         3134     3169      +35
- Misses        201      205       +4
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| gptcache/similarity_evaluation/distance.py | `100.00% <ø> (ø)` | |
| gptcache/manager/vector_data/chroma.py | `75.00% <33.33%> (-12.10%)` | :arrow_down: |
| gptcache/manager/vector_data/base.py | `72.41% <60.00%> (-3.59%)` | :arrow_down: |
| gptcache/manager/vector_data/milvus.py | `87.50% <90.00%> (+0.32%)` | :arrow_up: |
| gptcache/adapter/adapter.py | `87.38% <100.00%> (+1.52%)` | :arrow_up: |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/zilliztech/GPTCache/pull/448/indirect-changes).
Laarryliu commented 3 months ago

@SimFG Excuse me, do you have any plan to release this PR? We have now encountered similar problems.

SimFG commented 3 months ago

@Laarryliu The latest release should have included this change. You can try the latest version.