Description:
The LocalFileStore class can be used to create an on-disk CacheBackedEmbeddings cache. The number of files in these embeddings caches can grow to be quite large over time (hundreds of thousands) as embeddings are computed for new versions of content, but the embeddings for old/deprecated content are not removed.
A least-recently-used (LRU) cache policy could be applied to the LocalFileStore directory to delete cache entries that have not been referenced for some time:
# delete files that have not been accessed in the last 90 days
find embeddings_cache_dir/ -type f -atime +90 -print0 | xargs -0 rm
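The same eviction policy can be sketched in Python for cases where a shell one-liner is inconvenient. This is a minimal sketch, not part of this pull request: the function name `prune_stale_entries` and its parameters are hypothetical, and it assumes the cache directory contains only cache entry files.

```python
import os
import time
from pathlib import Path


def prune_stale_entries(cache_dir: str, max_age_days: float = 90) -> list[str]:
    """Delete files last accessed more than max_age_days ago; return their paths.

    Hypothetical helper for illustration only.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

Like the `find` command, this relies entirely on the access times recorded by the filesystem, so it is only as accurate as those timestamps.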
However, most filesystems in enterprise environments disable access time updates on read (for example, by mounting with the noatime option) to improve performance. As a result, the access times of these cache entry files are not updated when their values are read.
To resolve this, this pull request updates the LocalFileStore constructor to offer an update_atime parameter that causes access times to be updated when a cache entry is read.
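One way such an update can work is to set the access time explicitly after each read, which takes effect even when the filesystem does not update access times on its own. The sketch below is a standalone illustration under that assumption, not the code in this pull request; `read_entry` is a hypothetical function, and only the `update_atime` name comes from the change described here.

```python
import os
import time


def read_entry(path: str, update_atime: bool = False) -> bytes:
    """Read a cache entry file, optionally marking it as just accessed.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        value = f.read()
    if update_atime:
        # Set the access time to now explicitly and keep the
        # modification time unchanged.
        os.utime(path, (time.time(), os.stat(path).st_mtime))
    return value
```

With the constructor parameter described above, the corresponding call would presumably look like `LocalFileStore("embeddings_cache_dir", update_atime=True)`.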
The default is False, which retains the original behavior.
Testing: I updated the LocalFileStore unit tests to test the access time update.