sahib/rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

json has no hashsum? #462

Closed · ghost closed this issue 3 years ago

ghost commented 3 years ago

I am trying to deduplicate remote disks, following the approach suggested in https://github.com/sahib/rmlint/issues/329 and https://github.com/sahib/rmlint/issues/199.

I'm running `time rmlint -g -c json:unique -mkr // /home/`. When I browse the resulting JSON, I see no field for a hash sum. How would rmlint on a different machine, using `--replay`, find duplicates?
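For reference, the cross-machine workflow I'm aiming for looks roughly like this (the file names and scan paths below are made up for illustration):

```sh
# on each machine: scan and record all files, including uniques
rmlint -g -c json:unique -o json:remote.json /srv/data   # remote box
rmlint -g -c json:unique -o json:local.json  /home       # local box

# copy remote.json over, then merge both reports offline
rmlint --replay remote.json local.json
```

Without a hash sum in the JSON, that final `--replay` step has nothing to compare the files by.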

SeeSpotRun commented 3 years ago

Yes, in the interest of execution time we don't complete checksums on files that diverge in the first few kB or so. I could add an option `-c json:hash_uniques` as a workaround if you still need this.
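To illustrate the principle (this is only a sketch of the idea, not rmlint's actual code; `a.bin` and `b.bin` are hypothetical files):

```sh
# Two candidate files whose first few kB differ cannot be duplicates,
# so a full checksum is never computed -- and never lands in the JSON.
if cmp -s -n 4096 a.bin b.bin; then
    echo "first 4 kB match: keep reading and hashing"
else
    echo "diverged early: rejected without a full hash"
fi
```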

SeeSpotRun commented 3 years ago

Ok @lockywolf, two new options have now been added and merged into the develop branch (https://github.com/sahib/rmlint/tree/develop). With `--hash-uniques`, all found files get hashed. With `--hash-unmatched`, only size-twins get hashed, i.e. files that share their size with at least one other file. This is more efficient for dupe-finding: if you only have one file that is 4,635,235,654 bytes long, it can't have any duplicates.

Also, with either of these options specified, you no longer need `-c json:unique`.
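A minimal usage sketch, assuming a build from the develop branch (the scan path and output file names are examples):

```sh
# hash every file found, even those with no size-twin
rmlint --hash-uniques -o json:all.json /data

# hash only size-twins; cheaper, since a file with a unique
# size cannot have a duplicate anywhere in the scan
rmlint --hash-unmatched -o json:twins.json /data
```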

ghost commented 3 years ago

That's going to help, thank you!

Shall the issue be closed?

SeeSpotRun commented 3 years ago

Resolved by #479.