langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

langchain: enhance `LocalFileStore` to offer `update_atime` parameter that updates access times on read #20951

Closed chrispy-snps closed 1 week ago

chrispy-snps commented 2 weeks ago

Description: The LocalFileStore class can be used to create an on-disk CacheBackedEmbeddings cache. The number of files in such a cache can grow quite large over time (hundreds of thousands), because embeddings are computed for each new version of the content while the embeddings for old or deprecated content are never removed.

A least-recently-used (LRU) cache policy could be applied to the LocalFileStore directory to delete cache entries that have not been referenced for some time:

# delete files that have not been accessed in the last 90 days
find embeddings_cache_dir/ -type f -atime +90 -print0 | xargs -0 rm
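For environments where a shell one-liner is inconvenient, the same policy could be sketched in Python. This is only an illustration: the function name `prune_stale_entries` and its parameters are hypothetical and are not part of LangChain or of this pull request.

```python
import os
import time
from pathlib import Path


def prune_stale_entries(cache_dir: str, max_age_days: float = 90) -> list:
    """Delete regular files whose access time is older than max_age_days.

    Hypothetical helper for illustration; returns the paths it removed.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed = []
    # Materialize the listing first so that deleting does not disturb iteration.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

Like the `find` command, this relies entirely on the access times recorded by the filesystem, so it is only as accurate as those timestamps.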

However, most filesystems in enterprise environments disable access time modification on read to improve performance. As a result, the access times of these cache entry files are not updated when their values are read.

To resolve this, this pull request updates the LocalFileStore constructor to offer an update_atime parameter that causes access times to be updated when a cache entry is read.

For example,

file_store = LocalFileStore(temp_dir, update_atime=True)

The default is False, which retains the original behavior.
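To illustrate the idea (this is not the code in this pull request), a read path that refreshes the access time explicitly might look like the sketch below. The `TinyFileStore` class and its methods are hypothetical stand-ins; only `os.utime` and the `update_atime` flag name come from the standard library and the description above, respectively.

```python
import os
import time
from pathlib import Path
from typing import Optional


class TinyFileStore:
    """Hypothetical stand-in for a file-backed key-value store."""

    def __init__(self, root: str, update_atime: bool = False) -> None:
        self.root = Path(root)
        self.update_atime = update_atime

    def set(self, key: str, value: bytes) -> None:
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / key).write_bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        if not path.exists():
            return None
        value = path.read_bytes()
        if self.update_atime:
            # Set the access time to "now" explicitly and keep the
            # modification time unchanged. An explicit utime call does
            # not depend on the mount updating atime on read.
            os.utime(path, (time.time(), path.stat().st_mtime))
        return value
```

Because the timestamp is written explicitly, a later cleanup based on access time can tell recently read entries from stale ones even when the mount does not track reads.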

Testing: I updated the LocalFileStore unit tests to test the access time update.

eyurtsev commented 1 week ago

Thank you @chrispy-snps