sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Fix --hash-unmatched behavior #615

Open intelfx opened 1 year ago

intelfx commented 1 year ago

The problem with --hash-unmatched is twofold.

First, the criterion in rm_shred_group_update_status() that is responsible for hashing partially hashed groups to the end is broken. It can never be fulfilled, because group->head_files is always empty at this point, and even if it were not, head->digest is always NULL because rm_shred_group_push_file() leaves it NULL.
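To make the dead branch concrete, here is a minimal toy model, not rmlint's actual code. It assumes the criterion amounts to "there is a queued head file and it carries a live digest"; ToyFile, ToyGroup, toy_wants_full_hash() and toy_self_check() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT rmlint's real structures. */
typedef struct ToyFile {
    const void *digest; /* NULL models "no digest attached to the file" */
} ToyFile;

typedef struct ToyGroup {
    ToyFile *head_files; /* files queued in the group */
    size_t n_head_files; /* 0 models "head_files is empty" */
} ToyGroup;

/* Assumed shape of the criterion: it needs a queued head file that
 * still carries a non-NULL digest. */
static bool toy_wants_full_hash(const ToyGroup *group, bool hash_unmatched) {
    if(!hash_unmatched || group->n_head_files == 0) {
        return false;
    }
    return group->head_files[0].digest != NULL;
}

/* Exercises the two states that, per the description above, are the
 * only ones that occur when the check runs. */
static void toy_self_check(void) {
    /* State 1: head_files is empty. */
    ToyGroup empty = {NULL, 0};
    assert(!toy_wants_full_hash(&empty, true));

    /* State 2: a head file exists, but its digest is NULL. */
    ToyFile pushed = {NULL};
    ToyGroup one = {&pushed, 1};
    assert(!toy_wants_full_hash(&one, true));
}
```

Under these assumptions the predicate is false in both reachable states, so a branch guarded by it never runs regardless of whether --hash-unmatched is set.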

The only reason --hash-unmatched works at all is that the --merge-directories condition above it is broken as well: it always fires for all single-file groups. (Consequently, --hash-unmatched without --merge-directories is broken in develop.) However, the same bug also means that all unique files always get hashed to the end if --merge-directories is set, which makes --hash-unmatched behave like --hash-uniques.

This function only works as designed if neither --hash-unmatched nor --hash-uniques is set, because in that case all single-file groups are filtered out early by the rm_shred_group_qualifies() check.

Fixes #614.