markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

[question] using file size for preliminary filtering #179

Closed MayeulC closed 6 years ago

MayeulC commented 7 years ago

Hi,

I have read the documentation a bit, and was wondering if we couldn't use the file size to generate a list of duplicate candidates. This could give a nice speedup, as I do not see any situation where files of different sizes could have the same checksum (except for collisions, of course). File size can be considered a hash, after all, and a precalculated one at that.

Unless this is something duperemove is already doing, of course (hence the "question" tag in the issue title). What is your stance on this?
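The prefilter being proposed could be sketched roughly as follows. This is a hypothetical illustration, not duperemove's actual code: the function name, the use of SHA-256, and hashing whole files in one read are all assumptions made for clarity.

```python
# Hypothetical sketch of the size-based prefilter proposed above: group
# files by st_size first, then only checksum groups with more than one
# member. Names and hash choice are illustrative, not duperemove's.
import hashlib
import os
from collections import defaultdict

def duplicate_candidates(paths):
    """Return groups of paths that share both file size and content hash."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size can never have a whole-file duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

The `len(same_size) < 2` check is where the speedup would come from: files with a unique size are skipped without ever being read.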

nefelim4ag commented 7 years ago

@MayeulC, for fdupes mode that's true, and it's already done by fdupes. For the general dedupe mode it's not, because duperemove deduplicates blocks of data, not whole files.
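To illustrate the distinction: two files of different sizes can still share identical blocks, so a size prefilter would wrongly exclude them from block-level dedupe. A minimal sketch of block-level candidate detection, with an illustrative block size and hash (duperemove's real implementation differs):

```python
# Hypothetical sketch showing why a size prefilter doesn't apply to
# block-level dedupe: files of different sizes can share identical extents.
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # illustrative block size, not duperemove's default

def shared_blocks(paths):
    """Map each duplicated block hash to the (path, offset) pairs holding it."""
    by_hash = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                by_hash[hashlib.sha256(block).hexdigest()].append((path, offset))
                offset += len(block)
    return {h: locs for h, locs in by_hash.items() if len(locs) > 1}
```

Here a 4 KiB file and a 5 KiB file whose first blocks match would still be reported as sharing a block, even though their sizes differ.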

MayeulC commented 7 years ago

That makes sense, thank you for your answer. I think that issue can be closed, unless there is more to say on that topic, but I will let you decide.