[Open] RustyNova016 opened 7 months ago
Update: DiskCacheBuilder::set_sync_to_disk_on_cache_change doesn't change anything.
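For reference, a minimal sketch of one way that flag can be set through the builder (assuming the DiskCacheBuilder API; the cache name here is a placeholder and this is not necessarily how the reporter wired it up):

use cached::DiskCache;

// One way to apply set_sync_to_disk_on_cache_change; the blob files in the
// cache directory keep accumulating with or without it.
fn build_cache() -> DiskCache<u32, Vec<u32>> {
    DiskCache::new("cached_disk")
        .set_sync_to_disk_on_cache_change(true)
        .build()
        .expect("failed to build disk cache")
}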
Running this test multiple times seems to reproduce the problem of accumulating files in the blobs folder of the cache:
use cached::proc_macro::io_cached;
use thiserror::Error;

// TestError is assumed here to be a thiserror-style error; only the variant
// referenced by map_error is needed for this reproduction.
#[derive(Error, Debug, PartialEq, Clone)]
enum TestError {
    #[error("disk cache error `{0}`")]
    DiskError(String),
}

#[io_cached(
    disk = true,
    time = 1000000,
    map_error = r##"|e| TestError::DiskError(format!("{:?}", e))"##,
    key = "u32",
    convert = r"{ n }",
)]
async fn cached_disk(n: u32) -> Result<Vec<u32>, TestError> {
    Ok((0..n).collect::<Vec<_>>())
}

#[tokio::test]
async fn test_cached_disk() {
    let n = 1_000_000;
    let expected = Ok((0..n).collect::<Vec<_>>());
    assert_eq!(cached_disk(n).await, expected);
    assert_eq!(cached_disk(n).await, expected);
}
Unfortunately, this issue makes DiskCache useless.
Thanks for pointing this out. I'm able to reproduce also. It looks like this is due to either a bug or the intended operation of the underlying sled::Db. Also looks like development has slowed down on that project. It would probably be best to change the backing store to something more reliable, like sqlite.
There is a potential ugly workaround until the storage backend is swapped. DiskCacheBuilder lets you set a specific disk directory with set_disk_directory, where each function's file cache will be stored. It looks like sled is continuously creating a new copy of the blob data, with the name being an incremental integer. You could scan this directory on startup: for each sub-dir (the name of a function in all caps + _v1), scan the blobs/ dir, sort the files, and delete all but the latest one (either keep the largest integer name or check the file metadata for the creation time).
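A rough sketch of that cleanup, assuming the directory layout described above (one sub-directory per cached function containing a blobs/ folder of integer-named files); treat it as a starting point rather than a tested fix:

use std::fs;
use std::path::{Path, PathBuf};

/// Best-effort cleanup: for every `<FUNCTION>_v1/blobs/` directory under
/// `cache_dir`, keep only the file with the highest integer name and delete
/// the rest.
fn prune_stale_blobs(cache_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(cache_dir)? {
        let blobs_dir = entry?.path().join("blobs");
        if !blobs_dir.is_dir() {
            continue;
        }
        // Collect (numeric name, path) pairs, ignoring non-numeric file names.
        let mut blobs: Vec<(u64, PathBuf)> = fs::read_dir(&blobs_dir)?
            .filter_map(|e| e.ok())
            .filter_map(|e| {
                let name = e.file_name().into_string().ok()?;
                Some((name.parse::<u64>().ok()?, e.path()))
            })
            .collect();
        // The largest integer name is assumed to be the newest copy.
        blobs.sort_by_key(|&(n, _)| n);
        if blobs.pop().is_some() {
            for (_, stale) in blobs {
                let _ = fs::remove_file(stale);
            }
        }
    }
    Ok(())
}

Run it once on startup, before the cached functions are first called, pointing it at the directory passed to set_disk_directory.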
Thanks for the prompt reply.
It seems to me that sled::Db is working fine in the sense that it does speed up the function look-up operations, yet it duplicates "blobs" as you mentioned. Regarding sqlite: it wouldn't be my first choice due to feature overkill for this purpose, and it isn't a cargo-friendly dependency.
I thought about something similar myself, but it is really an ugly workaround :-) I'll try.
RocksDB may be a better choice than sqlite.
Personally, I have switched away from sled at home and use a combination of Cacache for the storage and serde_rmp to convert to binary data.
Not sure if it's the best way though, but once this issue is sorted I'm definitely switching back.
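For anyone wanting to try the same stopgap, here is a minimal sketch of that combination, assuming the cacache crate's write_sync/read_sync helpers and the rmp_serde crate (which I take to be what the comment calls serde_rmp); the path, key, and struct are made up for the example:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct HeavyStruct {
    values: Vec<u32>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cache_dir = "./my-cache";
    let value = HeavyStruct { values: (0..1_000).collect() };

    // Serialize to MessagePack bytes with rmp_serde, then store them under a
    // key with cacache.
    let bytes = rmp_serde::to_vec(&value)?;
    cacache::write_sync(cache_dir, "heavy-struct", bytes)?;

    // Read the bytes back and deserialize.
    let raw = cacache::read_sync(cache_dir, "heavy-struct")?;
    let roundtrip: HeavyStruct = rmp_serde::from_slice(&raw)?;
    assert_eq!(roundtrip, value);
    Ok(())
}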
I am using a DiskCache with an inner vector, which creates files in the blobs folder of the cache. However, those blob files aren't removed or overwritten when I overwrite a cache entry. This leads to inflated storage usage that I'd rather not have, since those structs are heavy.
Here's an example program to test the bug: