sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Assertion failure, node->inode != RM_NO_INODE. Latest develop branch compiled on Raspberry Pi 4 #549

Closed james-cook closed 2 years ago

james-cook commented 2 years ago

Version: latest develop branch (--version shows 2.10.1)
Platform: Raspberry Pi 4 with 4GB RAM

rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
▕░░░░░░░░░░░░░░░░░░░░░░░░░▏                Traversing (25831 usable files / 4013 + 2 ignored files / folders)
**
ERROR:lib/pathtricia.c:80:rm_node_check_inode: assertion failed: (node->inode != RM_NO_INODE)
ERROR: Aborting due to a fatal error. (signal received: Aborted)
ERROR: Please file a bug report (See rmlint -h)

To check, I went back and recompiled 2.10.1 from master: it compiles and runs without error using the same command on the same directories and files.

Note: This is a placeholder for the failure and investigation. Hopefully I will have more time next week to recompile and run as advised here: https://github.com/sahib/rmlint/issues/547#issuecomment-1019547768

james-cook commented 2 years ago

Using information from:

Go ahead and open a separate issue for the assertion failure. It would be helpful if you could run rmlint in gdb (gdb --args rmlint ...) and print a backtrace. It seems like it should actually be impossible without some kind of corruption so building with ASAN would also be useful (CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address' scons DEBUG=1 ).

This is the run with the recompiled rmlint, using the same command on the same directories and files:

(gdb) run
Starting program: /usr/bin/rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb638c380 (LWP 5790)]
[New Thread 0xb59ff380 (LWP 5791)]
▕░░░░░░░░░░░░░░░░░░░░░░░░░▏                Traversing (25834 usable files / 4013 + 2 ignored files / folders)
[Thread 0xb638c380 (LWP 5790) exited]
**
ERROR:lib/pathtricia.c:80:rm_node_check_inode: assertion failed: (node->inode != RM_NO_INODE)

Thread 1 "rmlint" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0xb6a4df14 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xb6a39230 in __GI_abort () at abort.c:79
#2  0xb6cbc8a8 in g_assertion_message () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#3  0xb6cbc948 in g_assertion_message_expr () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#4  0x00020378 in rm_node_check_inode ()
#5  0x00020550 in rm_node_get_inode ()
#6  0x0002082c in rm_file_parent_inode ()
#7  0x00020850 in rm_file_cmp_samefile ()
#8  0x00020a4c in rm_file_cmp_samefile_full ()
#9  0xb6c8e01c in  () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
(gdb) bt full
#0  0xb6a4df14 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {0 <repeats 27 times>, 100, 1, 0, 57, 7}}
        pid = <optimized out>
        tid = <optimized out>
#1  0xb6a39230 in __GI_abort () at abort.c:79
        save_stage = 1
        act =
          {__sigaction_handler = {sa_handler = 0x10, sa_sigaction = 0x10}, sa_mask = {__val = {0, 0, 492432, 557472, 4246540800, 117, 492432, 3204439296, 557472, 117, 0, 509800, 1, 509800, 3066799548, 3067431920, 3070224744, 3067434900, 0, 3070224744, 93, 1962934272, 3066674540, 509800, 0, 0, 4246540800, 509800, 509800, 3067435364, 94, 3070224744}}, sa_flags = 344292, sa_restorer = 0xbeffdd64}
        sigs = {__val = {32, 0 <repeats 31 times>}}
#2  0xb6cbc8a8 in g_assertion_message () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#3  0xb6cbc948 in g_assertion_message_expr () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#4  0x00020378 in rm_node_check_inode ()
#5  0x00020550 in rm_node_get_inode ()
#6  0x0002082c in rm_file_parent_inode ()
#7  0x00020850 in rm_file_cmp_samefile ()
#8  0x00020a4c in rm_file_cmp_samefile_full ()
#9  0xb6c8e01c in  () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
(gdb)

ASAN: Compiling with the flags shown (sudo CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address' scons DEBUG=1 --prefix=/usr install) leads to an error when I run the program in gdb:

(gdb) run
Starting program: /usr/bin/rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
==7199==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
[Inferior 1 (process 7199) exited with code 01]
(gdb)

Head of the compile log:

Scons: Reading SConscript files ...
>> Appending custom build flags : -fsanitize=address
>> Appending custom link flags : -fsanitize=address
Checking whether the C compiler works... yes
Checking for git revision... (cached) yes
Checking for pkg-config... (cached) yes

cebtenzzre commented 2 years ago

ASAN generates its own reports so gdb isn't necessary. Seems like part of rmlint may have been built without ASAN. Try a clean rebuild with those flags:

$ scons -c
$ export CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address'
$ scons config
$ scons DEBUG=1
$ sudo -E scons DEBUG=1 --prefix=/usr install

If you get the same error you can work around the issue with LD_PRELOAD=/usr/lib/libasan.so rmlint .... If ASAN reports nothing besides leaks you could also try valgrind on a clean build without ASAN (valgrind rmlint ...).

james-cook commented 2 years ago

Investigating...

Just FYI, this is the output from the builds (with or without the sanitise flags):

scons DEBUG=1
s/timestamp.c
Compiling ==> lib/formats/uniques.c
Compiling ==> lib/fts/fts.c
Building manpage from rst...
Using sphinx-build binary: /usr/bin/sphinx-build
Linking Static Library ==> librmlint.a
Ranlib Library ==> librmlint.a
Linking Program ==> rmlint
/usr/bin/ld: librmlint.a(reflink.o): in function `rm_dedupe_main':
reflink.c:(.text+0x249c): warning: lchmod is not implemented and will always fail
Cannot import `sphinx_bootstrap_theme`; falling back to `nature`.
^ This is no error, will cause only slightly different html output.
Zipping manpage...
scons: done building targets.

Not sure if the rm_dedupe_main - reflink.c - lchmod warning is important.

The Raspberry Pi does not have libasan at the location you mentioned.

I found it at /usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so (assuming this is the correct libasan for gcc 8; it is the only libasan.so under /usr).

Is it OK just to preload the dynamic lib and run as shown below, or must I install libasan5 explicitly?

LD_PRELOAD=/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
=================================================================
==10047==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb2a03b80 at pc 0x00082e00 bp 0xb05fc20c sp 0xb05fc204
READ of size 8 at 0xb2a03b80 thread T2 (pool)
    #0 0x82dff in rm_file_new (/usr/bin/rmlint+0x82dff)
    #1 0x4d017 in rm_traverse_file (/usr/bin/rmlint+0x4d017)
    #2 0x4f503 in rm_traverse_directory (/usr/bin/rmlint+0x4f503)
    #3 0x36ad7 in rm_mds_factory (/usr/bin/rmlint+0x36ad7)

0xb2a03b80 is located 8 bytes to the right of 88-byte region [0xb2a03b20,0xb2a03b78)
allocated by thread T2 (pool) here:
    #0 0xb6a8bbbb in __interceptor_malloc (/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so+0xe1bbb)
    #1 0x79c57 in fts_alloc (/usr/bin/rmlint+0x79c57)

Thread T2 (pool) created by T0 here:
    #0 0xb69f59c7 in pthread_create (/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so+0x4b9c7)
    #1 0xb66be523  (/lib/arm-linux-gnueabihf/libglib-2.0.so.0+0x9c523)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/bin/rmlint+0x82dff) in rm_file_new
Shadow bytes around the buggy address:
  0x36540720: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540730: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540740: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 05
  0x36540750: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540760: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
=>0x36540770:[fa]fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
  0x36540780: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
  0x36540790: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407a0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==10047==ABORTING

This doesn't look like the original error though (?) - I don't see the initial traversal output on screen. And, as you mention, these are only leaks.

cebtenzzre commented 2 years ago

I am able to reproduce the heap-buffer-overflow report with a 32-bit x86 build, but not on x86_64, so I suspect an error in the RM_PLATFORM_32-related code. Will debug further, thanks.

cebtenzzre commented 2 years ago

Try patching lib/config.h.in like this and rebuilding:

diff --git a/lib/config.h.in b/lib/config.h.in
index e9a5a3c0..30fda4e2 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -57,6 +57,7 @@
 #define LLI G_GINT64_FORMAT

+#include <stdint.h> /* for UINTPTR_MAX */
 #define RM_PLATFORM_32 (UINTPTR_MAX == 0xffffffff)
 #define RM_PLATFORM_64 (UINTPTR_MAX == 0xffffffffffffffff)

james-cook commented 2 years ago

I can confirm that the patch fixes the assertion failure on my platform. Thanks :)

james-cook commented 2 years ago

Closing. Please re-open if needed.