hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0
272 stars 25 forks source link

rar2fs coredump - Signal 6 (ABRT) #160

Closed hrxcodes closed 3 years ago

hrxcodes commented 3 years ago

Issue

rar2fs is randomly stopped, I've managed to get the core dump of one of these failures. rar2fs has been working fine for the last 12+months before this issue started happening unrelated to any updates as far as I could tell.

Does the core dump tell you anything interesting? Memory at the time was as follows:

Total in system: 31.21GiB
Used: 9.37GiB
Cache + Buffer: 21.55GiB
Free: 293.58MiB

Since I'm seeing malloc in the core dump I started to suspect it's memory related but there is plenty of cache to take from. I'm kind of stuck so any ideas are welcome 😃 Thank you for your work on rar2fs!

Versions

ldd

ldd /usr/local/bin/rar2fs
    linux-vdso.so.1 (0x00007ffd94bf6000)
    libfuse.so.2 => /usr/lib/libfuse.so.2 (0x00007f829aa0f000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f829aa08000)
    librt.so.1 => /usr/lib/librt.so.1 (0x00007f829a9fd000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f829a820000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f829a6db000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f829a6c1000)
    libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f829a69e000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f829a4d1000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f829aac7000)

Logs

dmesg

Mar 17 11:10:44.909223 host kernel: audit: type=1701 audit(1615975844.897:76578): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3753864 comm="rar2fs" exe="/usr/local/bin/rar2fs" sig=6 res=1
Mar 17 11:10:44.926067 host kernel: audit: type=1334 audit(1615975844.920:76579): prog-id=43 op=LOAD
Mar 17 11:10:44.926110 host kernel: audit: type=1334 audit(1615975844.920:76580): prog-id=44 op=LOAD
Mar 17 11:10:44.926128 host kernel: audit: type=1130 audit(1615975844.920:76581): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-2056350-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 17 11:10:45.256811 host kernel: audit: type=1131 audit(1615975845.250:76582): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-2056350-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

journal

Mar 16 08:04:35 host systemd[1]: media-storage-stuff-level1.automount: Got automount request for /media/storage/stuff/level1, triggered by 3753845 (ls)
Mar 17 11:10:44 host systemd[1]: media-storage-stuff-level1.automount: Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?
Mar 17 11:10:44 host systemd[1]: media-storage-stuff-level1.automount: Failed with result 'unmounted'.

coredump

           PID: 3753864 (rar2fs)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Wed 2021-03-17 11:10:44 CET (46min ago)
  Command Line: rar2fs /media/storage/stuff/level1 /media/storage/stuff/mount -o rw,allow_other,dev,suid
    Executable: /usr/local/bin/rar2fs
 Control Group: /system.slice/media-storage-stuff-mount.mount
          Unit: media-storage-stuff-mount.mount
         Slice: system.slice
       Boot ID: 03494dfcd5f54869a9f63d8d387dcaae
       Storage: /var/lib/systemd/coredump/core.rar2fs.0.03494dfcd5f54869a9f63d8d387dcaae.3753864.1615975844000000.zst
       Message: Process 3753864 (rar2fs) of user 0 dumped core.

                Stack trace of thread 3950980:
                #0  0x00007f1be01d7ef5 raise (libc.so.6 + 0x3cef5)
                #1  0x00007f1be01c1862 abort (libc.so.6 + 0x26862)
                #2  0x00007f1be0219f38 __libc_message (libc.so.6 + 0x7ef38)
                #3  0x00007f1be0221bea malloc_printerr (libc.so.6 + 0x86bea)
                #4  0x00007f1be02236c8 _int_free (libc.so.6 + 0x886c8)
                #5  0x00007f1be0226ca8 __libc_free (libc.so.6 + 0x8bca8)
                #6  0x0000560ddf2d62d8 __resolve_dir (rar2fs + 0x1d2d8)
                #7  0x0000560ddf2d6a3e readdir_scan (rar2fs + 0x1da3e)
                #8  0x00007f1be06e7d9b n/a (libfuse.so.2 + 0xed9b)
                #9  0x00007f1be06ef6b2 n/a (libfuse.so.2 + 0x166b2)
                #10 0x00007f1be06f07c4 n/a (libfuse.so.2 + 0x177c4)
                #11 0x00007f1be06ed3f0 n/a (libfuse.so.2 + 0x143f0)
                #12 0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #13 0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 3753865:
                #0  0x00007f1be037d9ba __futex_abstimed_wait_common64 (libpthread.so.0 + 0x159ba)
                #1  0x00007f1be0379b98 __new_sem_wait_slow64.constprop.0 (libpthread.so.0 + 0x11b98)
                #2  0x00007f1be06ed639 fuse_session_loop_mt (libfuse.so.2 + 0x14639)
                #3  0x00007f1be06f31ab fuse_loop_mt (libfuse.so.2 + 0x1a1ab)
                #4  0x0000560ddf2cf3fb work_task (rar2fs + 0x163fb)
                #5  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #6  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 3753866:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 3780051:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 3754421:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 3753864:
                #0  0x00007f1be0262125 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc7125)
                #1  0x00007f1be0267357 __nanosleep (libc.so.6 + 0xcc357)
                #2  0x00007f1be026728e sleep (libc.so.6 + 0xcc28e)
                #3  0x0000560ddf2d0224 work (rar2fs + 0x17224)
                #4  0x0000560ddf2c9b73 main (rar2fs + 0x10b73)
                #5  0x00007f1be01c2b25 __libc_start_main (libc.so.6 + 0x27b25)
                #6  0x0000560ddf2c9fde _start (rar2fs + 0x10fde)

                Stack trace of thread 3753867:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)

                Stack trace of thread 2056332:
                #0  0x00007f1be037a87c read (libpthread.so.0 + 0x1287c)
                #1  0x00007f1be06ecd12 n/a (libfuse.so.2 + 0x13d12)
                #2  0x00007f1be06ee8ae n/a (libfuse.so.2 + 0x158ae)
                #3  0x00007f1be06ed372 n/a (libfuse.so.2 + 0x14372)
                #4  0x00007f1be0371299 start_thread (libpthread.so.0 + 0x9299)
                #5  0x00007f1be029a053 __clone (libc.so.6 + 0xff053)
hasse69 commented 3 years ago

Does not really look good, does it? ☹️ I thought I caught all these bugs already but apparently not. Any idea how to recreate this crash or is it simply random?

hasse69 commented 3 years ago

Good news is that is seems to be isolated to the __resolve_dir() function which is then easier to track down than some asynchronous event and it seems to be a call to free() that is bad. There are a few of those in this function and the one I suspect the most is related to an error case/branch in the code. If this would be a bug in the "normal" flow I think we would have caught it already.

When you try to reproduce. can you please run rar2fs in the foreground using the -f flag? That way we can catch any output from rar2fs to stdout/stderr which might give us a clue to what path is not handled properly.

hasse69 commented 3 years ago

Or even better, run it through gdb if you know how to do that? If you do please run the binary you compiled and not the one you install since the latter would become stripped from any symbols. Note that the -f flag is needed also in this case.

hrxcodes commented 3 years ago

Yeah it showed up from nowhere and happens quite regularly. So while I don't have instructions on how to reproduce, its hows up after a few hours or days (can happen multiple times a day, or work for a few days).

I will let you know as soon as I have something from gdb.

When it stops, do you want just a bt of the current thread or also things like info locals and info threds ?

Notes to self (and possibly others in the future):

sudo gdb ./rar2fs
handle SIG32 pass nostop noprint
r <rar2fs-arguments>
hasse69 commented 3 years ago

A stack trace and possibly a dump of some of the symbols/variables in the context of the crash would be enough for now. The value of the pointer being freed would be good. I suspect it to be a rogue pointer not initialized properly. Otherwise I would have expected some double free or corruption error. Calling free() with NULL is always ok.

hasse69 commented 3 years ago

In case you can reproduce, also try this attached patch on top of master/HEAD. I have my suspicions that patch might solve this issue.

issue160.patch.txt

hrxcodes commented 3 years ago

So far it has been running smooth, but I suspect not for much longer. It's always like this when you want it to happen it doesn't. I'll let you know once it crashes and then try and update to the patch you supplied and run it in gdb again in case it would not solve the issue.

hasse69 commented 3 years ago

No news here? Still no crash to work with?

hrxcodes commented 3 years ago

Annoying as it is it's suddenly working as expected, no crashes so far since I started it. I'm keeping an eye on it every day but it's solid so far even without the patch. I suggest we keep this open another week and if it does not happen we can close it. The longest I had it running for before was 12 days (been 11 days since I started it in gdb).

It's just annoying it can't be reproduced from having issues multiple times a day/week -> creates issue -> can't reproduce 😞 I will keep you updated as soon as there is something happening on my end

hasse69 commented 3 years ago

Let's just keep it open then.

When I studied the actual crash signature there are not many places it would blow up like this. I think the patch provided would cover that. It would be a rather rare situation for this to happen so you must have been very unlucky or lucky to spot it. Never the less, the problem is/was there. Just need to wait for it.

hrxcodes commented 3 years ago

Still working perfectly fine curiously enough. I'll keep it running in gdb until it either crashes on me or I need to reboot the box.

I will get back when/if something bad happens, you may close the issue in the meanwhile if you want. Thank you for your work on rar2fs, it's great! 🍻

hasse69 commented 3 years ago

Despite the fact you have not been able to reproduce the issue I have chosen to merge the patch since it does no harm really. Let's close this issue but feel free to re-open when/if needed.