sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Nice to have feature: skip sparse files (for deduplication) #567

Open fedy-cz opened 2 years ago

fedy-cz commented 2 years ago

I'm missing one feature (option) that would, in my mind, be pretty useful: skipping all sparse files during the search for duplicates.

Use case: Files are often pre-allocated as sparse files, and only once they are completely written (downloaded, generated, ...) do they become immutable and safe to deduplicate. While they are still incomplete (sparse), they can't safely be considered the same file, even if they are currently identical at the bit level. Reflinks are safe, but many filesystems don't support them. Skipping sparse files and using hardlinks/symlinks should be a relatively safe workaround.
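
To make the request concrete, here is a rough sketch of how such a filter could look outside of rmlint. It is not rmlint code and does not describe any existing rmlint option. It assumes a POSIX system where `st_blocks` is reported in 512-byte units, and the names `looks_sparse`, `is_probably_sparse` and `skip_sparse` are made up for illustration. The check is only a heuristic: filesystems with transparent compression or inline data can also report fewer allocated bytes than the apparent size, so some dense files might be skipped as well.

```python
import os

# Assumption: POSIX reports st_blocks in units of 512 bytes.
POSIX_BLOCK_UNIT = 512


def looks_sparse(size, blocks, unit=POSIX_BLOCK_UNIT):
    """Return True if fewer bytes are allocated than the apparent size."""
    return blocks * unit < size


def is_probably_sparse(path):
    """Heuristic check based on os.stat(); may misfire on compressed filesystems."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)


def skip_sparse(paths):
    """Keep only the paths that do not look sparse (hypothetical pre-filter)."""
    return [p for p in paths if not is_probably_sparse(p)]
```

A list filtered this way could then be handed to the duplicate search, so that a pre-allocated, still incomplete download would never get hardlinked to another file it happens to match at the moment.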