juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0
10.06k stars 888 forks

Dedup prefetch requests at the entrance to avoid massive read amplification #4947

Open polyrabbit opened 1 week ago

polyrabbit commented 1 week ago

We observed large read amplification when a user performs incremental but non-continuous sequential reads across a large number of files. After labeling object GET requests by purpose, we found that prefetching is the main cause of the read amplification.

It turns out that the deduplication in the prefetcher::do method does not work as expected, especially when there is only one prefetcher (the default behavior), which causes duplicate prefetching. It would be better to deduplicate before inserting into the pending queue.
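To make the proposal concrete, here is a minimal sketch of deduplicating at the entrance. It is not the actual JuiceFS code: the type `dedupPrefetcher`, its fields, and the `queueSize` parameter are hypothetical names chosen for illustration. The idea is that a key is recorded as busy at the moment it is accepted into the pending queue, so a repeated request for the same key is rejected before it ever occupies a queue slot, regardless of how many workers exist.

```go
package main

import "sync"

// dedupPrefetcher is a hypothetical illustration, not JuiceFS's real prefetcher.
type dedupPrefetcher struct {
	mu      sync.Mutex
	pending chan string         // queue of keys waiting to be prefetched
	busy    map[string]struct{} // keys that are queued or currently in flight
	op      func(key string)    // the actual prefetch operation
}

func newDedupPrefetcher(workers, queueSize int, op func(string)) *dedupPrefetcher {
	p := &dedupPrefetcher{
		pending: make(chan string, queueSize),
		busy:    make(map[string]struct{}),
		op:      op,
	}
	for i := 0; i < workers; i++ {
		go p.do()
	}
	return p
}

// fetch enqueues key unless it is already queued or in flight.
// It reports whether the key was accepted.
func (p *dedupPrefetcher) fetch(key string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.busy[key]; ok {
		return false // duplicate: dropped before reaching the queue
	}
	select {
	case p.pending <- key:
		p.busy[key] = struct{}{}
		return true
	default:
		return false // queue full: drop, prefetching is best effort
	}
}

// do drains the queue; the key stays marked busy until op returns.
func (p *dedupPrefetcher) do() {
	for key := range p.pending {
		p.op(key)
		p.mu.Lock()
		delete(p.busy, key)
		p.mu.Unlock()
	}
}
```

Because the busy set covers queued keys as well as in-flight ones, a single worker no longer processes the same key several times in a row. The non-blocking send under the lock keeps `fetch` cheap for callers on the read path.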
