kahing / catfs

Cache AnyThing filesystem written in Rust
Apache License 2.0

Does not follow stated behavior in README: "Entire file is cached if it's open for read" #67

Open hayk-skydio opened 2 years ago

hayk-skydio commented 2 years ago

I am using goofys + catfs and observing that if I read the first N bytes of a file, it seems to cache only those bytes and not the whole file as stated in the README.

Mount command:

goofys --region us-west-2 --file-mode 0440 -o allow_other --cache /tmp/test_dir_cache s3_bucket_name:test_dir /tmp/test_dir_mount

Example of reading the first N bytes (the same behavior occurs when reading from a Python script):

head --bytes 100 /tmp/test_dir_mount/large_file.mp4

The size of the large file here is 158M:

ls -halp /tmp/test_dir_mount/large_file.mp4

However, the size of the cached file after the read is 128K:

ls -halp /tmp/test_dir_cache/large_file.mp4

I have tested this with multiple files and reads from 10 bytes up to 100M, resulting in the same behavior of only caching the read part. If instead the last N bytes are read, the entire file is cached.
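For anyone who wants to reproduce this from a script, here is a minimal Python sketch of the check described above. It assumes the mount and cache paths from the commands in this report; the helper name `read_and_compare` and its `from_end` parameter are hypothetical and are not part of catfs or goofys. It compares apparent file sizes only (the same figure `ls` reports), so it would not detect a sparse cache file.

```python
import os


def read_and_compare(mount_path, cache_path, num_bytes, from_end=False):
    """Read num_bytes through the mount, then report source and cache sizes.

    Hypothetical helper for illustration; not part of catfs or goofys.
    """
    source_size = os.stat(mount_path).st_size
    with open(mount_path, "rb") as f:
        if from_end:
            # Read the last num_bytes instead of the first.
            f.seek(max(source_size - num_bytes, 0))
        data = f.read(num_bytes)

    # The cache file may be missing if nothing was cached at all.
    cached_size = os.stat(cache_path).st_size if os.path.exists(cache_path) else 0
    return {
        "bytes_read": len(data),
        "source_size": source_size,
        "cached_size": cached_size,
        "fully_cached": cached_size == source_size,
    }


if __name__ == "__main__":
    # Paths taken from the commands in this report; adjust for your setup.
    mount_file = "/tmp/test_dir_mount/large_file.mp4"
    cache_file = "/tmp/test_dir_cache/large_file.mp4"
    if os.path.exists(mount_file):
        print(read_and_compare(mount_file, cache_file, 100))
```

In my setup I would expect `fully_cached` to be false after a read from the start of the file and true after a read with `from_end=True`, matching the observations above.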

This seems to contradict the README, which states: "Entire file is cached if it's open for read, even if nothing is actually read."

My desired behavior would be to have a flag that toggles these two behaviors, and appropriate documentation. As it currently stands, one of our use cases is failing because it depended on caching the entire file on touching it.

Finally, please comment if this should be opened as a goofys issue instead.

System: