Closed wybryan closed 1 year ago
/assign @cxie
why not just simply delete the vector in case it can be searched again?
why not just simply delete the vector in case it can be searched again?
I presume you meant how to deal with the cache out-of-sync after it is detected, in other words, the self-heal part.
Deleting the vector is not ideal in my view because the vector store and cache store is still out-of-sync after detection, i.e., ID column in vector store is out of sync with ID column in vector store. In addition, depending on the underlying vector store database, we have better option, for example, in Chroma, we have native update API to do this; in Milvus, it is done by 2 ops: deleting the trouble entry first then insert with the same ID.
The distance manipulation is required to prevent wrong answer sent back to user regardless of the delete ops is used or not.
To sum up, because we have better ops to implement this and the goal is to ensure stores are in-sync, therefore I think delete is not the best way to do it after the detection.
I hope above explains your question in terms of degisn rationale. In terms of the code, it refers to update_embeddings() method in vector store.
In code, the following is implemented:
the entire logic is put together at cache_health_check() inside SimilarityEvaluation.evaluate().
Please have a look and let me know your thoughts.
ok, i will review the pr
I have read the entire PR. Actually, I am curious to know if you have experienced this issue or if you believe there may be an underlying risk.
I have read the entire PR. Actually, I am curious to know if you have experienced this issue or if you believe there may be an underlying risk.
both actually.
I had problem when deploying on cache and vector store both on remote host. And the TTL for each store was set differently then the incorrect response rate rises with no apparent reason until I found this. The problem of such is steath in nature, there is no error or crush, but the result is completely wrong because of the sutle difference in configuration. Strictly speaking, it wasn't GPTCache causing out-of-sync happened, but I would argue that the cache has all the information at hand to verify the response was incorrect before sending back to user.
It is also a big underlying risk from system design point of view, because the cache relies on multiple storage, and these storage can be either local or remote, so consistency should be taken into the design. Many things can go wrong in real-world deployment, TTL, clock sync, network issue and whatnot, so in my view sending incorrect response to user because of out-of-sync is orders of magnitude worse than cache miss. that's why this is implemented to ensure it is not happening sliently.
Thank you very much for your detailed explanation. If this is the case, would it be better to put this verification logic before the similarity evaluation? Then whether this verification is enabled or not is controlled by cache config
Thank you very much for your detailed explanation. If this is the case, would it be better to put this verification logic before the similarity evaluation? Then whether this verification is enabled or not is controlled by cache config
you are right, this is better way to implement this. Let me make chage to do this way as you suggested.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: wybryan To complete the pull request process, please ask for approval from simfg after the PR has been reviewed.
The full list of commands accepted by this bot can be found here.
Hi @SimFG, I have refactored the code as what you suggested, cheers.
@wybryan Thank you, and please resolve the code conflict
@wybryan Thank you, and please resolve the code conflict
DONE
Hi @SimFG, the code has passed the linting & unit test.
There seems to be a Milvus server starting issue which one test started Milvus server and done with it, any other tests try to do the same thing will fail. This is out of the scope of this PR, please have a look at the unit test log.
Hi @SimFG, the code has passed the linting & unit test.
There seems to be a Milvus server starting issue which one test started Milvus server and done with it, any other tests try to do the same thing will fail. This is out of the scope of this PR, please have a look at the unit test log.
never mind, my last attempts fixed it. Normal tests are all passed now.
Merging #448 (5b2cb4a) into dev (e43160f) will decrease coverage by
0.05%
. The diff coverage is77.50%
.
@SimFG Excuse me,do you have any plan on releasing this PR ? Now we have encountered similar problems.
@Laarryliu The latest release should have included this change. You can try the latest version.
What is it: For GPTCache to work correctly, the cache store and vector store must be in sync with each other, i.e., any discrepency between the two will cause erroneous cache hit and consequently return incorrect response back to user. There exists many reasons to cause both store out of sync with each other unintentionally in live production environment, therefore the consistency between cache store and vector store should be explictly checked instead of taking for granted, so that we add fault tolerance into the system by design.
This feature address this by adding cache self-check into SearchDistanceEvaluation Class. In the event of inconsistency is found, not only preventing wrong response sent to user, cache can also self-heal by replacing problematic entries such that the cache will achieve in-sync gradually over time.
How it works: we check embeddings from vector store and embeddings from cache store by same ID, they should be identical if both stores are in sync, otherwise out-of-sync is detected. When out-of-sync is detected, we manipute distance returned by vector search to infinity to prevent cache hit. At same time, we replace the out-of-sync entry in vector store by ID and embeddings from the cache store, so next timer around, we don't have this out-of-sync entry any more.
How to use it: unit test file is provided to showcase GPTCache bebaviour with & without this feature. This feature is turned off by default so it is compatible with existing codebase. For start, this feature has been implmented for Chroma & Milvus vector store respectively.