sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Preprocessing seems stuck - debugging tips? #561

Closed. james-cook closed this issue 2 years ago.

james-cook commented 2 years ago

In my current run, preprocessing seems to get stuck.

The HDD is working away with no spin-down, but the numbers in the preprocessing line do not change, even after many hours. This happens with master and with develop plus some patches.

The original command was:

rmlint --progress --xattr --keep-all-tagged --must-match-tagged '/srv/dev-disk-by-label-OMV2/shd2/from.ext.hdd.M-family.TOSH_SOURCE-HDD_ORPHANS/from.ext.hdd.F' // '/srv/dev-disk-by-label-OMV2/shd2/from.ext.hdd.F'

The first directory is "new" and contains no hardlinks of any kind (neither within the directory nor to files outside it). I expect that every file in dir1 also exists in dir2, and I am using rmlint to confirm this. The second directory is the product of many rmlint -c sh:hardlink runs and contains many hardlinked files (repeated inodes).
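For cross-checking the same question outside rmlint (is every file in the first directory also present, by content, in the second?), here is a minimal standalone Python sketch. It is not rmlint's algorithm. It assumes local, readable directories, compares files by size first and then by SHA-256 (an arbitrary choice), and hashes each (device, inode) pair in the reference tree only once, because hardlinked paths share their content. The function names `missing_from` and `count_paths_and_inodes` are hypothetical.

```python
import hashlib
import os
import stat


def iter_regular_files(root):
    """Yield (path, lstat result) for regular files under root.

    Symlinks, FIFOs, sockets, etc. are skipped, and os.walk does not
    descend into symlinked directories by default.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                yield path, st


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def missing_from(new_root, ref_root):
    """Return paths under new_root with no content-identical file under ref_root."""
    new_files = list(iter_regular_files(new_root))
    wanted_sizes = {st.st_size for _, st in new_files}

    # Only hash reference files whose size matches some new file, and hash
    # each (device, inode) pair once, since hardlinked paths share content.
    ref_keys = set()
    hashed_inodes = set()
    for path, st in iter_regular_files(ref_root):
        inode = (st.st_dev, st.st_ino)
        if st.st_size not in wanted_sizes or inode in hashed_inodes:
            continue
        hashed_inodes.add(inode)
        ref_keys.add((st.st_size, file_digest(path)))

    return [path for path, st in new_files
            if (st.st_size, file_digest(path)) not in ref_keys]


def count_paths_and_inodes(root):
    """Return (number of regular-file paths, number of distinct inodes)."""
    paths = 0
    inodes = set()
    for _, st in iter_regular_files(root):
        paths += 1
        inodes.add((st.st_dev, st.st_ino))
    return paths, len(inodes)
```

Calling `count_paths_and_inodes` on the second directory shows how many paths collapse onto shared inodes. Note that a check like this must still read every size-matching file in full, so on millions of files it can also take many hours.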

The preprocessing output gets "stuck" here:

Traversing (2434375 usable files / 10873 + 5196 ignored files / folders)
... reduces files to 1614949 / found 47955 other lint)

Suspecting that rmlint might be spending a long time on a very large file, I limited the maximum file size to 2GB with -s -2GB on the command line. This didn't change much; rmlint is stuck preprocessing at:

Traversing (2434336 usable files / 10873 + 5196 ignored files / folders)
... reduces files to 1614915 / found 47955 other lint)

I am now running the same job overnight with -vvv instead of --progress to see what emerges.

Any tips on how to find out whether, and why, preprocessing is stuck in a loop, e.g. a debug line in preprocess.c?
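One way to tell "slow" from "stuck" without restarting the job is to sample the process's kernel counters twice and see whether they grow. The Python sketch below assumes Linux and read access to `/proc/<pid>/io` and `/proc/<pid>/stat` (normally available for your own processes, otherwise run it as root); the helper names are hypothetical. It reports growth in `rchar` (bytes read, including from page cache), `read_bytes` (bytes actually fetched from storage), `syscr` (read syscalls) and CPU time (utime + stime).

```python
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into integers."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def parse_stat_cpu_ticks(text):
    """Return utime + stime (in clock ticks) from the contents of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only split the text after the last ')'.
    fields = text[text.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def sample(pid):
    """Read the I/O counters and CPU ticks of a running process once."""
    with open(f"/proc/{pid}/io") as f:
        io = parse_proc_io(f.read())
    with open(f"/proc/{pid}/stat") as f:
        cpu = parse_stat_cpu_ticks(f.read())
    return io, cpu


def progress_report(pid, interval=60.0):
    """Sample twice, `interval` seconds apart, and return how much each counter grew."""
    io1, cpu1 = sample(pid)
    time.sleep(interval)
    io2, cpu2 = sample(pid)
    report = {key: io2.get(key, 0) - io1.get(key, 0)
              for key in ("rchar", "read_bytes", "syscr")}
    report["cpu_ticks"] = cpu2 - cpu1
    return report
```

Usage: pass the PID from `pidof rmlint` to `progress_report`. Growing `read_bytes` means the disk activity really comes from rmlint; growing CPU time with no I/O suggests a CPU-bound phase. This is only a hint, not proof: a loop that keeps re-reading cached data would also show growing counters, which is where a backtrace helps.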

cebtenzzre commented 2 years ago

A backtrace would be helpful to see what step it's actually on. Otherwise I'd just be making guesses.
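A minimal sketch of taking such backtraces from the running process, assuming gdb is installed and you have ptrace permission (often root, e.g. when `kernel.yama.ptrace_scope` is 1). The gdb options `-p`, `-batch`, `-ex` and the command `thread apply all bt` are standard gdb; the Python wrapper and its names are hypothetical. Function names in the output are only meaningful if the rmlint binary has debug symbols, and gdb pauses the process for as long as it is attached.

```python
import shutil
import subprocess
import time


def gdb_backtrace_cmd(pid):
    """gdb arguments to attach to pid, print every thread's stack, then exit."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid, count=3, pause=10.0, timeout=120):
    """Take `count` backtrace snapshots, `pause` seconds apart.

    Stacks that change between snapshots suggest the process is moving
    between steps; the same frames in every snapshot show where it is
    spending its time.
    """
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    snapshots = []
    for i in range(count):
        if i:
            time.sleep(pause)
        result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True,
                                text=True, timeout=timeout)
        snapshots.append(result.stdout)
    return snapshots
```

Taking several snapshots rather than one makes it easier to tell a long but progressing phase from a genuine loop.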

james-cook commented 2 years ago

I'm very happy to report that there is NO problem with rmlint!

The run just took 14 hours, which was longer than I expected. Using -vvv let me confirm that something was happening. At some point I took a copy of rmlint.sh to check for loops. The resulting rmlint.sh is ca. 650MB and rmlint.json is 1.3GB.

The run completed:

DEBUG: Freeing device 2128 (pointer 0x456ec0)
Waiting for progress counters to catch up...Done
DEBUG: Remaining 0 bytes in 0 files
DEBUG: Dupe search finished at time 51109.383
 Dragan Aleksic _ Matthes & Seitz Berlin-00-Döbler,Katharina.mp3'

==> Note: Please use the saved script below for removal, not the above output.
==> In total 2434399 files, whereof 1157270 are duplicates in 454040 groups.
==> This equals 383.98 GB of duplicates which could be removed.
==> 278702 other suspicious item(s) found, which may vary in size.
==> Scanning took in total 14h 11m 49.387s.

Sorry about the false alarm.