Transport endpoint is not connected

magma1447 commented 3 years ago

The below seems to happen to me at least once per day. I can just umount and mount it again though. This started about two weeks ago for me, before that I have been running this awesome software for years without issues.

ls: cannot access '/mnt/media/movies': Transport endpoint is not connected

After I started to have issues I upgraded from 1.27.0 to 1.29.3 (latest).

I found and read through Issue #151 and received some ideas of what I can do to provide proper feedback. Tonight it crashed while I had it running as a foreground process, the backtrace looks like this:

*** Error in `rar2fs': corrupted size vs. prev_size: 0x00007f04c87fbc10 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f04d5c47bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f04d5c4dfc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x773b8)[0x7f04d5c4e3b8]
/lib/x86_64-linux-gnu/libc.so.6(+0x78dfa)[0x7f04d5c4fdfa]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f04d5c51f64]
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_doallocate+0x72)[0x7f04d5c3da62]
/lib/x86_64-linux-gnu/libc.so.6(_IO_doallocbuf+0x46)[0x7f04d5c4ba76]
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_seekoff+0x173)[0x7f04d5c48e13]
/lib/x86_64-linux-gnu/libc.so.6(fseeko64+0x79)[0x7f04d5c472b9]
rar2fs(+0xfb4d)[0x556cd3a97b4d]
/lib/x86_64-linux-gnu/libfuse.so.2(fuse_fs_read_buf+0x1d2)[0x7f04d709ce12]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xcfd2)[0x7f04d709cfd2]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x15a3e)[0x7f04d70a5a3e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x165a1)[0x7f04d70a65a1]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x12d48)[0x7f04d70a2d48]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f04d5f7d4a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f04d5cbfd0f]

The process was started like this: rar2fs -f -oro,allow_other --seek-length=1 <src> /mnt/media/movies

I don't know what's triggering it yet. It seems to always crash during the night and I suspected updatedb, but /mnt is included in my PRUNEPATHS.

For the record, there is a fuse.mergerfs below it. Four md-raids are mounted in /mnt/md?, then those are merged into /data, and then a subdir of /data is mounted with rar2fs.

I will recompile rar2fs with --enable-debug=1 and should have an update with debug data within 24 hours.

hasse69 commented 3 years ago

Thanks for the issue report. Surely does not look good, there is some heap corruption happening here. Judging from your stack trace it happens while reading from a file inside one of your archives (_fuse_fs_readbuf) and it is an archive in store mode since fseeko() is called (fseeko64) and which is crashing on us. I will re-run valgrind again to see if I can make it spot something. I am rather sure I have already done that but I might have missed some corner case of course.

hasse69 commented 3 years ago

Also, if it is not too much trouble, I think it might be a good idea to reduce the scope of the potential changes that might be causing this new issue. The delta between 1.27.0 and 1.29.3 is rather huge so I think it would be helpful in the troubleshooting to see when exactly the problems started by upgrading in lesser steps.

magma1447 commented 3 years ago

Also, if it is not too much trouble, I think it might be a good idea to reduce the scope of the potential changes that might be causing this new issue. The delta between 1.27.0 and 1.29.3 is rather huge so I think it would be helpful in the troubleshooting to see when exactly the problems started by upgrading in lesser steps.

Maybe you misunderstood me. I started having issues with 1.27.0 (which I have been using for a long time). I then upgraded to 1.29.3 to see if that fixed the issue, it didn't. Therefore I don't see any point in downgrading again, though I definitely could if you wish that. It's done in a minute (or two).

I suspect that it's something external that is causing this, like broken rars for example. On the other hand, I don't know the code base at all, so I am just guessing wildly.

hasse69 commented 3 years ago

Yes I definitely misunderstood you :( So we have a case here that you had a stable running environment using 1.27.0 that suddenly stopped working and an upgrade did not help. Without having more details this looks very suspiciously similar to issue #98. The problems discovered in that issue were supposed to have been resolved in v1.27.1. Maybe some change made there accidentality broke by some later commits. If I would try anything it would be to upgrade to 1.27.1 and confirm that the problem still persists. If it does then I am rather sure we have bug here that has been playing hide-and-seek for a long time.

magma1447 commented 3 years ago

I have now downgraded to 1.27.1. It's not a bad idea since the crashes I experience with 1.27.0 vs 1.29.3 could be different issues as well. And if so, we are getting fooled by the fact that it was working for years.

Regardless I hope that we will get some more information upon next crash since I am now running it with --configure-debug=1 and also with -d in the command line. The best case would be if I could manually reproduce it after knowing when it crashes.

hasse69 commented 3 years ago

What we can hope for is that the crash signature is very similar each time and that the extra debug information you are now collecting can give us a hint if it points towards a specific archive or anything else that we can continue to work with. I will also continue from my end to see if I can spot something obvious.

magma1447 commented 3 years ago

So it died tonight again, as expected. Command line used was: /usr/local/bin/rar2fs -d -oro,allow_other --seek-length=1 <src> /mnt/media/movies/ ; echo ; date With that I also got a timestamp: Sat Jan 9 02:07:35 CET 2021

The output and backtrace:

rar2_getattr()
   NODEID: 459782
   unique: 8508980, success, outsize: 144
unique: 8508981, opcode: OPEN (14), nodeid: 459782, insize: 48, pid: 4988
open flags: 0x8000 /my/censored/filename.flv
rar2_open()
   open[139675967973376] flags: 0x8000 /my/censored/filename.flv
   unique: 8508981, success, outsize: 32
unique: 8508982, opcode: READ (15), nodeid: 459782, insize: 80, pid: 4988
read[139675967973376] 131072 bytes from 0 flags: 0x8000
rar2_read()
get_vname()
   read[139675967973376] 131072 bytes from 0
   unique: 8508982, success, outsize: 131088
unique: 8508983, opcode: READ (15), nodeid: 459782, insize: 80, pid: 4988
read[139675967973376] 69632 bytes from 1210671104 flags: 0x8000
rar2_read()
get_vname()
*** Error in `/usr/local/bin/rar2fs': corrupted size vs. prev_size: 0x00007f08d004acc0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f08ebd8ebfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f08ebd94fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x773b8)[0x7f08ebd953b8]
/lib/x86_64-linux-gnu/libc.so.6(+0x78dfa)[0x7f08ebd96dfa]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f08ebd98f64]
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_doallocate+0x72)[0x7f08ebd84a62]
/lib/x86_64-linux-gnu/libc.so.6(_IO_doallocbuf+0x46)[0x7f08ebd92a76]
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_seekoff+0x173)[0x7f08ebd8fe13]
/lib/x86_64-linux-gnu/libc.so.6(fseeko64+0x79)[0x7f08ebd8e2b9]
/usr/local/bin/rar2fs(+0xd766)[0x55d739e89766]
/usr/local/bin/rar2fs(+0x15783)[0x55d739e91783]
/lib/x86_64-linux-gnu/libfuse.so.2(fuse_fs_read_buf+0x1d2)[0x7f08ecedfe12]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xcfd2)[0x7f08ecedffd2]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x15a3e)[0x7f08ecee8a3e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x165a1)[0x7f08ecee95a1]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x12d48)[0x7f08ecee5d48]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f08ec0c44a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f08ebe06d0f]

The file triggering this exists within a rar. rar t on that/those rar's work (no errors).

If remounting rar2fs I can reproduce the issue actually. A ls works fine, a md5sum on the file crashes the rar2fs process so that it ends with:

rar2_open()
   open[139898963204992] flags: 0x8000 /my/censored/filename.flv
   unique: 13, success, outsize: 32
unique: 14, opcode: READ (15), nodeid: 5, insize: 80, pid: 4549
read[139898963204992] 65536 bytes from 0 flags: 0x8000
rar2_read()
get_vname()
   read[139898963204992] 65536 bytes from 0
   unique: 14, success, outsize: 65552
unique: 15, opcode: READ (15), nodeid: 5, insize: 80, pid: 4549
read[139898963204992] 131072 bytes from 65536 flags: 0x8000
rar2_read()
   read[139898963204992] 131072 bytes from 65536
   unique: 15, success, outsize: 131088
unique: 16, opcode: READ (15), nodeid: 5, insize: 80, pid: 4549
read[139898963204992] 131072 bytes from 196608 flags: 0x8000
rar2fs: malloc.c:2406: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted

I then did a cp -ar on that data into a new directory. Then it works. If it hadn't worked like that, I could probably have sent you the data so that you could reproduce it locally. That it doesn't work makes me wonder. Maybe I should run a fsck.ext4? I have read the rar files multiple times without issues now though.

magma1447 commented 3 years ago

I also found this in dmesg today. But from multiple crashes there is only one occurrence.

[1855630.978895] traps: rar2fs[4325] trap stack segment ip:7f86915ea4ea sp:7f8690a69fd0 error:0
[1855630.978900]  in libc-2.24.so[7f8691571000+195000]

hasse69 commented 3 years ago

I think we need to run this through gdb to get a better stack trace or check if you can "unlimit" your core dump size so there is a core file generated. It looks weird and I really doubt it would have anything to do with your file system integrity. Also please check your stack size limit. The trap you got looks related to stack somehow.

magma1447 commented 3 years ago

I am heading out, so I just did some fast changes.

# ulimit -a |grep -i stack
stack size              (kbytes, -s) 8192
# ulimit -a |grep -i core
core file size          (blocks, -c) 0
# ulimit -c unlimited
# ulimit -s 16384

Too me it looks pretty much like before. Still crashed, got this dump:

*** Error in `/usr/local/bin/rar2fs': double free or corruption (out): 0x00007f8744050ed0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f8751bc3bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f8751bc9fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f8751bca80e]
/lib/x86_64-linux-gnu/libc.so.6(fclose+0x132)[0x7f8751bb9c42]
/usr/local/bin/rar2fs(+0xd67b)[0x55840e62867b]
/usr/local/bin/rar2fs(+0x15783)[0x55840e630783]
/lib/x86_64-linux-gnu/libfuse.so.2(fuse_fs_read_buf+0x1d2)[0x7f8752d14e12]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xcfd2)[0x7f8752d14fd2]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x15a3e)[0x7f8752d1da3e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x165a1)[0x7f8752d1e5a1]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x12d48)[0x7f8752d1ad48]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f8751ef94a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8751c3bd0f]

I'll test with gdb once I get home again. It's a long time ago since I ran gdb. I can just run gdb rar2fs ... ? Then r as in run in the gdb console or something like that right?

hasse69 commented 3 years ago

Well interesting, because now you got something more "heap" related. *** Error in/usr/local/bin/rar2fs': double free or corruption (out): 0x00007f8744050ed0 ***`

Yes, you start rar2fs using

# gdb rar2fs

Then from the gdb prompt you add all your options

(gdb) run <all options to rar2fs>

And later when it halts

(gdb) bt full

hasse69 commented 3 years ago

What we can also try is to run it through valgrind now that you can easily reproduce this.

# sudo valgrind --log-file=valgrind.log --leak-check=full ./src/rar2fs <options>

I cannot find anything in my tests but there might be something in your setup making the code take a different path. If we have heap corruption valgrind should be able to spot it.

hasse69 commented 3 years ago

I think you should also bump to master/HEAD before doing that since I fixed a few minor memory leaks yesterday.

magma1447 commented 3 years ago

First test (valgrind):

Switched to v1.29.3 since I didn't know how to compile the raw source (no configure script anymore).
Mounted it with Valgrind.
md5sum on the same file as before works.
unmounted and mounted without valgrind
It crashed again. However it managed to read the whole file this time (I got the md5-sum, didn't get that before). So the delay randomly a bit delayed.
Mounted with valgrind again, and it works again.

Second test (gdb):

unique: 7, opcode: READ (15), nodeid: 5, insize: 80, pid: 461
read[140737018596560] 65536 bytes from 0 flags: 0x8000
rar2_read()
get_vname()
   read[140737018596560] 0 bytes from 0
   unique: 7, success, outsize: 16
*** Error in `/usr/local/bin/rar2fs': free(): invalid next size (fast): 0x00007fffe8010c90 ***
[New Thread 0x7fffeeffe700 (LWP 462)]
unique: 8, opcode: FLUSH (25), nodeid: 5, insize: 64, pid: 461
flush[140737018596560]
rar2_flush()
lflush()
   unique: 8, success, outsize: 16
unique: 9, opcode: RELEASE (18), nodeid: 5, insize: 64, pid: 0
release[140737018596560] flags: 0x8000
rar2_release()
======= Backtrace: =========
Fatal error: glibc detected an invalid stdio handle
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffff6752bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7ffff6758fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7ffff675980e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xd04e)[0x7ffff7ba804e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x15a3e)[0x7ffff7bb0a3e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x165a1)[0x7ffff7bb15a1]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x12d48)[0x7ffff7badd48]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7ffff6a884a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffff67cad0f]

...

Thread 5 "rar2fs" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffeeffe700 (LWP 462)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {16391, 0 <repeats 15 times>}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007ffff671642a in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 14 times>, 52, 
              140737203133248}}, sa_flags = 52, sa_restorer = 0x7fffeeffda60}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007ffff6752c00 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6846374 "%s")
    at ../sysdeps/posix/libc_fatal.c:175
        ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7fffeeffdb50, reg_save_area = 0x7fffeeffdae0}}
        fd = 7
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007ffff6752c42 in __GI___libc_fatal (message=message@entry=0x7ffff6847f10 "Fatal error: glibc detected an invalid stdio handle\n")
    at ../sysdeps/posix/libc_fatal.c:185
No locals.
#4  0x00007ffff67533c6 in _IO_vtable_check () at vtables.c:71
        flag = <optimized out>
#5  0x00007ffff67551c5 in IO_validate_vtable (vtable=0x10000) at libioP.h:942
        section_length = <optimized out>
        ptr = 0x10000 <error: Cannot access memory at address 0x10000>
        offset = <optimized out>
#6  _IO_new_file_close_it (fp=fp@entry=0x7fffe8010bd0) at fileops.c:180
        write_status = 0
        close_status = <optimized out>
#7  0x00007ffff6748ccf in _IO_new_fclose (fp=0x7fffe8010bd0) at iofclose.c:58
        _IO_acquire_lock_file = 0x7fffe8010bd0
        status = <optimized out>
#8  0x0000555555568c55 in rar2_release (path=0x0, fi=0x7fffeeffdd00) at rar2fs.c:4415
        op = 0x7fffe4000cf0
        __func__ = "rar2_release"
#9  0x00007ffff7ba7582 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#10 0x00007ffff7ba9e33 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#11 0x00007ffff7bb08dc in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#12 0x00007ffff7bb15a1 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#13 0x00007ffff7badd48 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#14 0x00007ffff6a884a4 in start_thread (arg=0x7fffeeffe700) at pthread_create.c:456
        __res = <optimized out>
        pd = 0x7fffeeffe700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737203136256, -2633863115444502460, 140737219914862, 140737219914863,
                140737186357248, 3, 2633900497759453252, 2633847267168567364}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
---Type <return> to continue, or q <return> to quit---
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#15 0x00007ffff67cad0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.

PS! The system is a Debian oldstable. Fully upgraded besides three packages which aren't relevant.

hasse69 commented 3 years ago

I cannot make much sense out of this :( It seems gdb did not stop when I expected it to, and thus it is rather difficult to see in what context the free() is failing, Can you please recompile using debug level 3 instead.

That it later crashes in release is probably due to that a file handle is closed that for some reason seems to be closed already, maybe related to the first error.

Also can you please try mounting using the -s flag to see if the crash still happens, or at least that gdb might get less confused due to multiple threads.

hasse69 commented 3 years ago

Also the valgrind log would be nice if you could attach. When you mount using valgrind you still must make sure that rar2fs is executed in the foreground using -d or -f so that you can easily break out using CTRL-C. That in order to allow for valgrind to terminate and create the log file.

hasse69 commented 3 years ago

From what I can tell the only free() that could fail with this error is the one called after calling get_vname(). The latter is creating a copy of the input string and if something has gone wrong in the filename analysis I guess this can be a source of error leading to buffer overrun etc. It it thus vital that we can see a trace from a compile using --enable-debug=3.

magma1447 commented 3 years ago

Here is some output from when using debug=3. Interesting enough I don't know why it's touching the mentioned file, that's not the one I was hashing.

Opening /source-path/some-dirs/another-file-than-the-one-I-was-hashing.rar
SEEK src_off = 89, VOL_REAL_SZ = 49999980
size = 65536, chunk = 49999891
Read 65536 bytes from vol=0, base=0
   read[139838699343056] 0 bytes from 0
   unique: 7, success, outsize: 16
*** Error in `/usr/local/bin/rar2fs': free(): invalid next size (fast): 0x00007f2eb8010c90 ***
unique: 8, opcode: FLUSH (25), nodeid: 5, insize: 64, pid: 17360
flush[139838699343056]
rar2_flush()
(17351) FLUSH [0x7f2ebc000cf0  ][called from 17360]
lflush()
   unique: 8, success, outsize: 16
======= Backtrace: =========
unique: 9, opcode: RELEASE (18), nodeid: 5, insize: 64, pid: 0
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f2ec819dbfb]
release[139838699343056] flags: 0x8000
rar2_release()
(17351) RELEASE [0x7f2ebc000cd0    ]
Closing file handle 0x7f2eb8010bd0
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f2ec81a3fc6]
Fatal error: glibc detected an invalid stdio handle
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f2ec81a480e]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xd04e)[0x7f2ec95f304e]
Aborted (core dumped)

I then ran it with gdb again, and added -s. The backtrace might look a bit different.

unique: 9246, opcode: FLUSH (25), nodeid: 5, insize: 64, pid: 18297
flush[140737220159312]
rar2_flush()
(18289) FLUSH [0x7ffff0041920  ][called from 18297]
lflush()
   unique: 9246, success, outsize: 16
unique: 9247, opcode: RELEASE (18), nodeid: 5, insize: 64, pid: 0
release[140737220159312] flags: 0x8000
rar2_release()
(18289) RELEASE [0x7ffff003a750    ]
Closing file handle 0x7ffff0031a70
(18289) FREE [0x7ffff003a750    ]
*** Error in `/usr/local/bin/rar2fs': free(): invalid next size (fast): 0x00007ffff0031b30 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffff6752bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7ffff6758fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7ffff675980e]
/usr/local/bin/rar2fs(filecache_freeclone+0x1b)[0x55555555bde3]
/usr/local/bin/rar2fs(+0x1558a)[0x55555556958a]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xc582)[0x7ffff7ba7582]
/lib/x86_64-linux-gnu/libfuse.so.2(+0xee33)[0x7ffff7ba9e33]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x158dc)[0x7ffff7bb08dc]
/lib/x86_64-linux-gnu/libfuse.so.2(+0x165a1)[0x7ffff7bb15a1]
/lib/x86_64-linux-gnu/libfuse.so.2(fuse_session_loop+0xbc)[0x7ffff7bad94c]
/lib/x86_64-linux-gnu/libfuse.so.2(fuse_loop+0x208)[0x7ffff7ba5c08]
/usr/local/bin/rar2fs(+0x1742b)[0x55555556b42b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7ffff6a884a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffff67cad0f]

...

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 2319406791620833328, 2319389199435444272, 2314885530818453536, 2314885530818453536, 2314885530818453536, 
            6731583338252032800, 7378697629483820554, 3472328296331896422, 7378697629483806000, 3472609797883717222, 2337500343188860976, 
            3472328296227680304, 3467824696768081952, 2314885530818453536, 2314885530818453536}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007ffff671642a in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x6666370a20302030, sa_sigaction = 0x6666370a20302030}, sa_mask = {__val = {
              3472328529062356582, 3991990709698112816, 8223625903104026214, 3472328295963634989, 4195155967701168176, 
              3617569414307656034, 2314885530818453560, 2314885530818453536, 8444003148692987936, 8660248813382890099, 
              7596496373740942904, 3419760881481315694, 7162745918061503335, 8459008514813488751, 7521962881247831404, 140737327794672}}, 
          sa_flags = 95, sa_restorer = 0x7ffff66e09f0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007ffff6752c00 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff6847fd0 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
        ap = {{gp_offset = 40, fp_offset = 21845, overflow_arg_area = 0x7ffff66e0a00, reg_save_area = 0x7ffff66e0990}}
        fd = 5
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007ffff6758fc6 in malloc_printerr (action=3, str=0x7ffff68480e0 "free(): invalid next size (fast)", ptr=<optimized out>, 
    ar_ptr=<optimized out>) at malloc.c:5049
        buf = "00007ffff0031b30"
        cp = <optimized out>
        ar_ptr = <optimized out>
        ptr = <optimized out>
        str = 0x7ffff68480e0 "free(): invalid next size (fast)"
        action = 3
#4  0x00007ffff675980e in _int_free (av=0x7ffff0000020, p=0x7ffff0031b20, have_lock=0) at malloc.c:3905
        size = <optimized out>
        fb = <optimized out>
        nextchunk = <optimized out>
        nextsize = <optimized out>
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        errstr = <optimized out>
        locked = <optimized out>
        __func__ = "_int_free"
#5  0x000055555555bde3 in filecache_freeclone (dest=0x7ffff00419f0) at filecache.c:185
No locals.
#6  0x000055555556958a in rar2_release (path=0x0, fi=0x7ffff66e0bf0) at rar2fs.c:4449
        op = 0x7ffff0041920
        __func__ = "rar2_release"
#7  0x00007ffff7ba7582 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#8  0x00007ffff7ba9e33 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#9  0x00007ffff7bb08dc in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#10 0x00007ffff7bb15a1 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#11 0x00007ffff7bad94c in fuse_session_loop () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#12 0x00007ffff7ba5c08 in fuse_loop () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#13 0x000055555556b42b in work_task (data=0x7fffffffe330) at rar2fs.c:5189
        wdt = 0x7fffffffe330
#14 0x00007ffff6a884a4 in start_thread (arg=0x7ffff66e1700) at pthread_create.c:456
        __res = <optimized out>
        pd = 0x7ffff66e1700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737327798016, -598798447421738166, 140737488347518, 140737488347519, 
                140737311019008, 3, 598784274574816074, 598783780828033866}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#15 0x00007ffff67cad0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.

Does this help at all?

magma1447 commented 3 years ago

And regarding valgrind. Since it doesn't crash while using valgrind, do you really need a log from that? I was assuming it doesn't say anything, since it's not crashing.

hasse69 commented 3 years ago

This info is vital for further trouble shooting

Opening /source-path/some-dirs/another-file-than-the-one-I-was-hashing.rar

You need to provide the exact name here, actually I need the entire log. The call to get_vname might be a key to what is happening. Since you wish to hide actual file names here, please send the log in private mail to

hans.beckerus AT gmail.com

This is a difficult problem to solve but let's hope we can find something in the debug data you have provided already. I am a bit baffled though since this is logic that has not been touched in a very long time and that it should suddenly start to fail is unfortunate to say the least. Did you happen to upgrade your system and that is when problems started?

hasse69 commented 3 years ago

Valgrind logs are important, it might still tell/warn about things being abnormal even if it does not crash.

magma1447 commented 3 years ago

Email with additional data has been sent.

hasse69 commented 3 years ago

After some analysis of the logs the conclusion is that due to some unfortunate naming of the volumes in combination with the fact that they are all placed in the same directory fools the current logic in rar2fs which in the end causes memory corruption. The conditions that have to be met to hit this problem is not considered very common.

A patch has been created for this issue and is currently undergoing regression.

hasse69 commented 3 years ago

Please try this patch and report back.

patch153.txt

magma1447 commented 3 years ago

Please try this patch and report back.

patch153.txt

I have upgraded from the former patch you gave me to this one. No issues this far. But I just installed it and have only run a test with the known error causing files.

hasse69 commented 3 years ago

Thanks. Keep it running for a few days and report back if you see any issues. I plan to merge this patch next week. I have done a rather extensive regression on it since it involves major redesign of some parts.

hasse69 commented 3 years ago

Issue should now be resolved. Closing.

hasse69 / rar2fs

Transport endpoint is not connected #153