zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

some errors when i use milvus with GPTCache (cache : max_size) #633

Open KimMinSang96 opened 1 month ago

KimMinSang96 commented 1 month ago

I want to use gptcache with milvus. I have created the following code by referring to the example:

```python
data_manager = get_data_manager(
    CacheBase("sqlite"),
    VectorBase(
        "milvus",
        dimension=onnx.dimension,
        index_params=MILVUS_INDEX_PARAMS,
        search_params=SEARCH_PARAM[ann_type],
        local_mode=False,
    ),
    max_size=100,
    clean_size=10,
)

cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    config=Config(similarity_threshold=0.9, auto_flush=5),
)
```

With a cache size of 100, this code is supposed to evict 10 items once the number of cached entries exceeds 100. However, I'm not sure this is working correctly with Milvus. For debugging, I added the following code to milvus.py:

```python
def search(self, data: np.ndarray, top_k: int = -1):
    if top_k == -1:
        top_k = self.top_k
    search_result = self.col.search(
        data=data.reshape(1, -1).tolist(),
        anns_field="embedding",
        param=self.search_params,
        limit=top_k,
    )
    print(f"self.col.num_entities : {self.col.num_entities}")
    return list(zip(search_result[0].distances, search_result[0].ids))
```

I printed num_entities to check the size, and it reports more entries than the configured cache size. Can you tell me why this is happening? Also, is there a way to make it respect max_size?

SimFG commented 1 month ago

This is mainly because Milvus's num_entities only returns an approximate row count, which is not accurate. If you need the exact number of rows, you need to call the query interface. You can refer to the Milvus documentation: https://milvus.io/docs/get-and-scalar-query.md#Use-Advanced-Operators

```python
res = client.query(
    collection_name="collection_name",
    output_fields=["count(*)"],
)
```
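To illustrate the difference without a running Milvus instance, here is a toy model in plain Python: an approximate counter that, like Milvus's num_entities statistic, counts inserts but does not reflect deletions until compaction, versus an exact count(*)-style query over live rows. The class and method names are hypothetical, for illustration only:

```python
class ToyCollection:
    """Toy stand-in for a Milvus collection (hypothetical, for illustration).

    num_entities mimics Milvus's approximate statistic: it counts inserted
    rows but does not reflect deletions until a compaction runs.
    """

    def __init__(self):
        self._rows = {}     # live rows: id -> vector
        self._inserted = 0  # monotonically increasing insert counter

    def insert(self, row_id, vector):
        self._rows[row_id] = vector
        self._inserted += 1

    def delete(self, row_id):
        # Deletes are logical: num_entities does not change here.
        self._rows.pop(row_id, None)

    @property
    def num_entities(self):
        # Approximate: ignores deletions, like the Milvus statistic.
        return self._inserted

    def query_count(self):
        # Exact: analogous to query(output_fields=["count(*)"]).
        return len(self._rows)


col = ToyCollection()
for i in range(100):
    col.insert(i, [0.0])
for i in range(10):  # cache eviction deletes 10 entries
    col.delete(i)

print(col.num_entities)   # 100 -- approximate, still counts deleted rows
print(col.query_count())  # 90  -- exact live-row count
```

This is why the print in the patched search method can show more entries than max_size even when eviction is running correctly.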

As for eviction, the explanations in another issue may help deepen your understanding of this aspect.
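The max_size/clean_size contract described above can be sketched with a minimal in-memory cache. This is a simplified FIFO model, not GPTCache's actual implementation: once the entry count exceeds max_size, the clean_size oldest entries are evicted in one batch.

```python
from collections import OrderedDict


class EvictingCache:
    """Minimal FIFO sketch of max_size/clean_size semantics
    (simplified model, not GPTCache's actual implementation)."""

    def __init__(self, max_size=100, clean_size=10):
        self.max_size = max_size
        self.clean_size = clean_size
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        if len(self._data) > self.max_size:
            # Evict the clean_size oldest entries in one batch.
            for _ in range(self.clean_size):
                self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


cache = EvictingCache(max_size=100, clean_size=10)
for i in range(101):
    cache.put(i, i)
print(len(cache))  # 91: the 101st insert triggered a batch eviction of 10
```

Note that after eviction the scalar store holds 91 entries, while a statistic like num_entities on the vector side may still report the pre-eviction count until the deleted rows are compacted.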