zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
7.06k stars 500 forks source link

[Enhancement]: Remove vector data when the corresponding cache data expires in cache storage #632

Open Laarryliu opened 1 month ago

Laarryliu commented 1 month ago

What would you like to be added?

In adapter.py line 108, when trying to search for data in cache store according to vector data:

for search_data in search_data_list:
            cache_data = time_cal(
                chat_cache.data_manager.get_scalar_data,
                func_name="get_data",
                report_func=chat_cache.report.data,
            )(
                search_data,
                extra_param=context.get("get_scalar_data", None),
                session=session,
            )
            if cache_data is None:
                continue

here, if cache_data is Nonemeans no data in cache store could be found with data from vector store. Is it better to remove the corresponding data from vector store to avoid hit it again?

Why is this needed?

Since the data in cache store is no longer available (maybe due to a cache ttl strategy, etc.), removing vector data is necessary so that the expired cache data will no longer be accessed.

Anything else?

Please correct me if I missed some details

SimFG commented 1 month ago

I don't recommend cleaning data here, because when data becomes invalid using the evict strategy, all data will be deleted, including scalars and vectors. detail: https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/eviction_manager.py

Laarryliu commented 1 month ago

I don't recommend cleaning data here, because when data becomes invalid using the evict strategy, all data will be deleted, including scalars and vectors. detail: https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/eviction_manager.py

Thank you for the suggestion. After using the Redis evict strategy, I have another question: seems like there is a "deleted" field in eviction base, which eviction_manager.soft_evict() method will mark as -1, then eviction_manager.delete() method will permanently remove soft-evicted data from both scalar&vector store. However, looks like only Memory evict storage will try to do the soft_evict operation. When Redis evict strategy is used, is there a way to mark deleted? Seems that using TTL in Redis results in permanent deletion and the corresponding vector data will never be cleaned up.

SimFG commented 1 month ago

@Laarryliu For the relationship between eviction manager and evict base, you can refer to this https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/data_manager.py#L236. In fact, we have abstracted the evict base here, and the eviction manager deletion does not actually care about the strategy. image But I looked at this part and found that there is indeed a bug. Here, the on_evict internal attribute of the passed evict base should be assigned a value. Thanks a lot!

If you can, please help me create the pr when you have free time.

Laarryliu commented 1 month ago

@Laarryliu For the relationship between eviction manager and evict base, you can refer to this https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/data_manager.py#L236. In fact, we have abstracted the evict base here, and the eviction manager deletion does not actually care about the strategy. image But I looked at this part and found that there is indeed a bug. Here, the on_evict internal attribute of the passed evict base should be assigned a value. Thanks a lot!

If you can, please help me create the pr when you have free time.

Do you mean that if I init a SSDataManager with a RedisCacheEviction(or other implementations), an on_evict attribute should be init with _clear() method? Also, in MemoryCacheEviction implementation, on_evict function is used as a callback function when a key in memory cache is evicted. So I think it's also necessary to somehow trigger the on_evict function when a key in RedisCacheEviction evicted? Now I can't figure out how on_evict is triggered in RedisCacheEviction.

SimFG commented 1 month ago

yes