sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Wrong number of duplicates displayed. #461

Open ghost opened 3 years ago

ghost commented 3 years ago

I am running rmlint like this: /root/rmlint-2.10.1/rmlint --limit-mem 2500M -mkgr -T dd /mnt/SEAGATE/ // /mnt/WD/

The output is the following:

# /root/rmlint-2.10.1/rmlint --limit-mem 2500M -mkgr -T dd /mnt/SEAGATE/ // /mnt/WD/
Traversing (1196900 usable files / 0 + 0 ignored files / folders)
Preprocessing (reduces files to 792057 / found 0 other lint)
Matching (88875 dupes of 49077 originals; 0 B to scan in 0 files, ETA: 14s)
Merging files into directories (stand by...)

==> In total 1196900 files, whereof 18446744073709524917 are duplicates in 6530 groups.
==> This equals 291.66 GB of duplicates which could be removed.
==> Scanning took in total  2h  1m 31.661s.

Wrote a json file to: /root/2020-12-18_sorting_hdd-running-rmlint-seagate-vs-wd/rmlint.json
Wrote a sh file to: /root/2020-12-18_sorting_hdd-running-rmlint-seagate-vs-wd/rmlint.sh

This cannot be correct: 18 446 744 073 709 524 917 is vastly larger than the total number of files scanned.

I am rmlinting two HDDs, one holding ~1.5 TB of files and the other ~400 GB.

SeeSpotRun commented 3 years ago

Sorry for the delayed response.

I'm guessing the duplicate-dirs (-T dd) file-counting logic is failing somewhere. The value 18446744073709524917 is actually -26699 expressed as an unsigned 64-bit integer.

If you run with -T df do you get more sensible results?

cebtenzzre commented 1 year ago

At first glance this looks like the issue fixed by https://github.com/cebtenzzre/rmlint/commit/3c969ee62bae4a6973548cd428e23f484696317c edit: probably not.

ghost commented 1 year ago

Ran the test on a few dirs and couldn't reproduce the wrap-around case. I can't close the issue since I can't test it any more, but whoever is satisfied with @cebtenzzre's fix may close it.

ghost commented 1 year ago

The issue is still there.

==> Note: Please use the saved script below for removal, not the above output.
==> In total 57 files, whereof 18446744073709551615 are duplicates in 0 groups.
==> This equals 125.16 MB of duplicates which could be removed.
==> Scanning took in total 1.779s.

Those directories are small and differ only by a single file, called Markor_2021-11-27T15-46-51 (2).jpg

Note the space in the name.