Open intelfx opened 1 year ago
I can make --hash-unmatched do what it says on the tin with this code, but it feels hacky:
I wonder if there is something else subtly wrong in the code.
It appears that when --hash-unmatched is used in an unmodified rmlint, this condition is responsible for hashing all the single-file groups:
Could someone please explain what exactly is being done here, and what the idea behind this special case is?
Disregard the comment above (the suggested fix is wrong), see proper analysis in the linked PR.
rmlint version
v2.10.1-281-g58d29ec1
gui/setup.py to fix #608
dataset
I have a 30-something TB dataset that consists of ~20 TB of uniques and ~11 TB of size twins:
actual behavior
Basic rmlint invocation without --hash-unmatched (ignore --without-fiemap, it's just there to speed up preprocessing; progress bars were also trimmed):
Control rmlint invocation with --hash-uniques:
Now, --hash-unmatched:
expected behavior
Isn't --hash-unmatched supposed to only scan size twins (i.e. 12 TB at most)?
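To make the expectation concrete, here is a rough Python sketch of what "only scan size twins" would mean. This is not rmlint code; the function names (split_by_size, bytes_to_hash) and the input shape are made up for illustration, and it assumes that a file is a size twin exactly when at least one other file has the same size.

```python
from collections import defaultdict


def split_by_size(files):
    """Group (path, size) pairs by size.

    Returns two dicts keyed by size: groups with two or more files
    ("size twins") and groups with exactly one file ("uniques").
    Hypothetical helper, not part of rmlint.
    """
    by_size = defaultdict(list)
    for path, size in files:
        by_size[size].append(path)

    twins = {size: paths for size, paths in by_size.items() if len(paths) > 1}
    uniques = {size: paths for size, paths in by_size.items() if len(paths) == 1}
    return twins, uniques


def bytes_to_hash(files):
    """Upper bound on bytes read if only size twins are hashed."""
    twins, _ = split_by_size(files)
    return sum(size * len(paths) for size, paths in twins.items())


# Toy dataset: "c" has a size nobody else shares, so it should never be read.
example = [("a", 10), ("b", 10), ("c", 7), ("d", 3), ("e", 3), ("f", 3)]
twins, uniques = split_by_size(example)
```

Under that assumption, the size-unique files never need to be read at all, so the amount of data hashed should be bounded by the total size of the size-twin groups rather than by the size of the whole dataset.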