sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

rmlint 2.10.1 crashes during preprocessing #475

Closed intelfx closed 1 year ago

intelfx commented 3 years ago

OS / Environment

Arch Linux x86_64, Linux 5.11.2

rmlint is being run on a btrfs filesystem.

Version

# rmlint --version
version 2.10.1 compiled: Nov 13 2020 at [03:46:10] "Ludicrous Lemur" (rev unknown)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support

rmlint was written by Christopher <sahib> Pahl and Daniel <SeeSpotRun> Thomas.
The code at https://github.com/sahib/rmlint is licensed under the terms of the GPLv3.

Problem

(ASCII-art progressbars trimmed)

$ sudo systemd-run -d --pty -pMemoryHigh=10G -pIOSchedulingClass=idle rmlint -T df -Dj -c sh:handler=reflink --progress --hidden --xattr --with-fiemap /mnt/data
Running as unit: run-u53.service
Press ^] three times within 1s to disconnect TTY.
Traversing (1657297 usable files / 0 + 0 ignored files / folders)
Preprocessing (reduces files to 1652314 / found 0 other lint)
ERROR:lib/shredder.c:752:rm_shred_group_free: assertion failed: (self->num_pending == 0)
Bail out! ERROR:lib/shredder.c:752:rm_shred_group_free: assertion failed: (self->num_pending == 0)

This reproduces less than 100% of the time, but I just hit it twice in a row (and directly before that, rmlint hung while writing out the result files).

Any ideas? Should I recompile with debug info and collect the backtrace somehow?

SeeSpotRun commented 3 years ago

Should I recompile with debug info...

Unlikely to help since the error message is from an internal assert and indicates a fault in the preprocessing logic.

Any ideas?

Is the filesystem being written to during rmlint scan?

Are you compiling off the master branch or develop? If the former then please compile develop branch and re-test.

SeeSpotRun commented 3 years ago

Nothing further heard; please re-open if further info available

intelfx commented 3 years ago

So, for some reason I wasn't able to reproduce this for a long time, but it did happen again today.

Is the filesystem being written to during rmlint scan?

Yes, but not the portion that I was rmlint-ing today.

Are you compiling off the master branch or develop? If the former then please compile develop branch and re-test.

I was running rmlint 2.10.1 from my distribution's packages. I built current develop and the bug was not there. I did a reverse bisect between v2.10.1 and develop:

$ git bisect log
git bisect start '--term-old=buggy' '--term-new=fixed'
# buggy: [2a4443d1b8129736adee5edc354e80e5f12be598] bump version to 2.10.1 and adjust CHANGELOG
git bisect buggy 2a4443d1b8129736adee5edc354e80e5f12be598
# fixed: [bdb591f4bc124fad6da1035ffb2b513826e9d64f] Merge pull request #512 from SeeSpotRun/blake3_local
git bisect fixed bdb591f4bc124fad6da1035ffb2b513826e9d64f
# fixed: [71f1dafde42dffff8cf91ab6620a1231ac94a544] travis: migrate to travis-ci.com
git bisect fixed 71f1dafde42dffff8cf91ab6620a1231ac94a544
# fixed: [1c8179eaf16a923c3ec9076649ed7b1e40da0d4e] replay: make --unmatched-basenames work
git bisect fixed 1c8179eaf16a923c3ec9076649ed7b1e40da0d4e
# buggy: [e6898533949bd750cbb978c4d793ffa19912befe] Make json-glib a hard dependency
git bisect buggy e6898533949bd750cbb978c4d793ffa19912befe
# fixed: [b9b3b5a27ffd2ff2b4546133b97262b9d962ca70] utilities: de-inline some IO-related syscalls (see https://www.tutorialspoint.com/when-to-use-inline-function-and-when-not-to-use-it-in-c-cplusplus)
git bisect fixed b9b3b5a27ffd2ff2b4546133b97262b9d962ca70
# fixed: [345fa37ecc7b62528a6f3c2cd2df5b3d4d9b039d] Merge branch 'glib-json' into hash_uniques
git bisect fixed 345fa37ecc7b62528a6f3c2cd2df5b3d4d9b039d
# buggy: [ec2f3cc47075cfee014bb8b46cc698006f77de9b] json: fix memory leaks
git bisect buggy ec2f3cc47075cfee014bb8b46cc698006f77de9b
# fixed: [1e58df6d211f06812bbe36862b1cdc9512f346ed] cmdline: add --hash-unmatched option (more efficient than --hash-uniques)
git bisect fixed 1e58df6d211f06812bbe36862b1cdc9512f346ed
# buggy: [dcdab3b8c25bc7b9bf67c3472c0afa1940c5d4ec] cmdline: deprecate --write-unfinished
git bisect buggy dcdab3b8c25bc7b9bf67c3472c0afa1940c5d4ec
# first fixed commit: [1e58df6d211f06812bbe36862b1cdc9512f346ed] cmdline: add --hash-unmatched option (more efficient than --hash-uniques)

The exact rmlint command line I used for bisecting was rmlint -T df -Dj -vv --hidden --xattr --with-fiemap /path/to/files.
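
For readers unfamiliar with a "reverse" bisect (searching for the commit that fixed a bug rather than the one that introduced it), here is a minimal sketch on a throwaway repository. The repository, the `state` and `counter` files, and the `grep` check are hypothetical stand-ins. In the real session the check was the rmlint command line quoted above, and whether it was run by hand or through `git bisect run` is not stated. Only the `--term-old=buggy --term-new=fixed` invocation is taken from the log.

```shell
#!/bin/sh
# Sketch only: a toy repository stands in for the rmlint tree.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper that commits tracked changes with a throwaway identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -am "$1"
}

# Commit 1 is known to be buggy (plays the role of v2.10.1).
echo buggy > state
echo 1 > counter
git add state counter
commit "commit 1 (buggy)"
first_buggy=$(git rev-parse HEAD)

# Commits 2..6; the hypothetical fix lands in commit 4.
expected=""
for i in 2 3 4 5 6; do
    echo "$i" > counter
    if [ "$i" -eq 4 ]; then
        echo fixed > state
    fi
    commit "commit $i"
    if [ "$i" -eq 4 ]; then
        expected=$(git rev-parse HEAD)
    fi
done

# Custom terms: "buggy" replaces good/old, "fixed" replaces bad/new.
git bisect start --term-old=buggy --term-new=fixed >/dev/null
git bisect buggy "$first_buggy" >/dev/null
git bisect fixed HEAD >/dev/null

# `git bisect run` maps exit status 0 to the old term ("buggy") and
# 1..127 (except 125) to the new term ("fixed"), so the check must
# succeed while the bug is still present.
git bisect run sh -c 'grep -q buggy state' >/dev/null

# With custom terms the result is stored under refs/bisect/<term-new>.
found=$(git rev-parse refs/bisect/fixed)
git bisect reset >/dev/null 2>&1

echo "first fixed commit: $found"
```

The inverted exit-status convention is the part that is easy to get wrong: with these terms, a check script has to return 0 when the crash still reproduces.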

I have no idea what fixed it or how, but you might want to do a point release or something.

SeeSpotRun commented 3 years ago

Thanks, looks like some faulty logic in https://github.com/sahib/rmlint/commit/1e58df6d211f06812bbe36862b1cdc9512f346ed#diff-3f33733d2d2fd1104183b65c147d55eac073ed3526f24e71d84a5b15314f59eeL846-L852, which has to do with pre-matching duplicates based on being hardlinks and/or having the same xattr checksum.

Will be fixed in next point release

intelfx commented 3 years ago

@SeeSpotRun current develop is being pretty crashy for me (for unrelated reasons). In the meantime (until the point release happens), which commit from the develop branch would you recommend to build and use? 1e58df6d211f06812bbe36862b1cdc9512f346ed?

SeeSpotRun commented 3 years ago

which commit from develop branch would you recommend to build and use?

I'd prefer to use the latest and fix whatever is causing:

current develop is being pretty crashy for me

I'd appreciate details if you can share them. Otherwise the commit you nominated is probably as good as any.

cebtenzzre commented 1 year ago

I'm going to leave this open until a fix is available in master. @intelfx could you confirm that your issue is fixed in develop? Your fix is a status check, but the relevant change in develop appears to be an added group->n_unhashed_clusters > 0 check. And if there are unrelated crashes in develop, could you please open a new issue?

cebtenzzre commented 1 year ago

I was able to reproduce this by using cp -rl on a medium-sized directory and running rmlint -T df,dd --xattr on both copies at once. After the checksums have been written, it crashes every time.
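
A rough sketch of those reproduction steps, assuming GNU `cp` and a scratch directory. The directory names, file contents, and the choice of two passes are assumptions (the second pass is there because the crash is described as happening once the checksums have been written). Only `cp -rl` and `rmlint -T df,dd --xattr` come from the comment above. A handful of tiny files may well be too small to trigger the assertion, so a real attempt should hardlink-copy a medium-sized directory instead.

```shell
#!/bin/sh
# Sketch only: builds a small tree, hardlink-copies it, and scans both copies.
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for "a medium-sized directory".
mkdir orig
for i in 1 2 3 4 5; do
    printf 'payload %s\n' "$i" > "orig/file$i"
done

# -l makes hard links instead of copying file data (GNU cp).
cp -rl orig copy

# Sanity check: both names should refer to the same inode.
linked=no
if [ orig/file1 -ef copy/file1 ]; then
    linked=yes
fi
echo "hardlinked: $linked"

if command -v rmlint >/dev/null 2>&1; then
    # Pass 1 writes the xattr checksums; pass 2 scans with them present.
    for pass in 1 2; do
        rmlint -T df,dd --xattr orig copy \
            || echo "pass $pass: rmlint exited with status $?"
    done
else
    echo "rmlint not found; skipping the scan"
fi
```

An exit status of 134 (SIGABRT) on the second pass would be consistent with the failed assertion reported at the top of this issue.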

intelfx commented 1 year ago

I got severely sidetracked and forgot about this issue, my apologies.

could you confirm that your issue is fixed in develop?

Frankly, not sure. Right now develop crashes during preprocessing with a g_path_get_dirname: assertion 'file_name != NULL' failed, whereas master sometimes crashes with this bug. Unfortunately I do not have a minimal reproducer for either issue. I'll try to post a report for the develop bug now.