pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

Option to skip full checking (maybe extended checksums) #201

Closed: johnpyp closed this issue 1 year ago

johnpyp commented 1 year ago

I'm de-duplicating thousands of large files (3-15 GB each). If the size and the first and last checksums all match, there is a very high probability that the files are identical.

It only takes about a minute to get past the first/last checksum stages, but it would take hours to finish the full-contents scan.

Could fclones provide an option to stop at that stage and report the groups? And/or, could there be a "random sample" approach, where files of matching size deterministically hash, say, another 5 ranges of their contents to increase confidence, without coming anywhere near the cost of reading the entire file?
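
To make the "random sample" idea concrete, here is a minimal sketch (not fclones' actual algorithm, and `sampled_hash`, its parameters, and the choice of BLAKE2 are all illustrative assumptions): pick a fixed number of offsets spread evenly across the file, so two files of equal size are always sampled at identical positions, and hash only those chunks plus the length.

```python
import hashlib
import os

def sampled_hash(path, num_samples=5, chunk_size=64 * 1024):
    """Hash a deterministic sample of a file's contents.

    Hypothetical illustration of the issue's proposal, NOT fclones'
    implementation. Offsets depend only on the file size, so files of
    equal size are sampled at the same positions and remain comparable.
    """
    size = os.path.getsize(path)
    h = hashlib.blake2b()
    h.update(size.to_bytes(8, "little"))  # mix the length into the digest
    with open(path, "rb") as f:
        if size <= num_samples * chunk_size:
            # Small file: sampling would read it all anyway.
            h.update(f.read())
        else:
            # Evenly spaced offsets; the last chunk ends exactly at EOF.
            step = (size - chunk_size) // (num_samples - 1)
            for i in range(num_samples):
                f.seek(i * step)
                h.update(f.read(chunk_size))
    return h.hexdigest()
```

With the defaults this reads at most 320 KiB per file regardless of size, so a 15 GB file costs a few seeks instead of a full scan; the trade-off is that a difference confined entirely to an unsampled region would go undetected.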