hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0
279 stars 27 forks source link

rar2fs crash with corrupted double linked list error #98

Closed Darkvater closed 6 years ago

Darkvater commented 6 years ago

Continuing from https://github.com/hasse69/rar2fs/issues/95

tomi@hattusa:~$ sudo rar2fs /mnt/sma/media/rar-movies /mnt/sma/media/.rar-movies.rar2fs -o rw,noatime,allow_other --seek-length=2 -o max_readahead=512K -d 2> smamovie
*** Error in `rar2fs': corrupted double-linked list: 0x0000007f7000a2c0 ***        
Aborted            

Actually managed to get two core dumps

1

Darkvater commented 6 years ago

Update 1: My mount combines multiple shares into a single share using unionfs. Eg movies and rar-movies into /movies which I can easily serve up as a single folder without having to worry about the internals.

I am unable to reproduce the problem with just mounting the rar2fs point, but I can with mounting 1 rar2fs point and doing a unionfs.

*** Error in `rar2fs': corrupted size vs. prev_size: 0x0000007f78001ef0 ***       
Aborted
*** Error in `rar2fs': double free or corruption (out): 0x0000007f940033d0 ***    
Aborted         
Darkvater commented 6 years ago

Update 2: Ran it from gdb. But just as useless I think?

(gdb) run /mnt/sda/media/rar-tv-series/ /mnt/sda/media/.rar-tv-series.rar2fs/ -o rw,noatime,allow_other --seek-length=2 -o max_readahead=512K -d 2>sdatv2
Starting program: /usr/bin/rar2fs /mnt/sda/media/rar-tv-series/ /mnt/sda/media/.rar-tv-series.rar2fs/ -o rw,noatime,allow_other --seek-length=2 -o max_readahead=512K -d 2>sdatv2                                                                       
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".        
[New Thread 0x7fb04bb1e0 (LWP 11945)]
[New Thread 0x7fafc991e0 (LWP 11946)]                                             
[New Thread 0x7faf4771e0 (LWP 11947)]
[New Thread 0x7faec551e0 (LWP 11990)]                                             
[New Thread 0x7fae4331e0 (LWP 11991)]
*** Error in `/usr/bin/rar2fs': double free or corruption (out): 0x0000007f9c007770 ***
Thread 5 "rar2fs" received signal SIGABRT, Aborted.
[Switching to Thread 0x7faec551e0 (LWP 11990)]                                    
0x0000007fb7bf59fc in raise () from /lib/aarch64-linux-gnu/libc.so.6
(gdb) bt                                                                          
#0  0x0000007fb7bf59fc in raise () from /lib/aarch64-linux-gnu/libc.so.6
#1  0x0000007fb7bf6df4 in abort () from /lib/aarch64-linux-gnu/libc.so.6           
#2  0x0000007fb7c2e628 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
#3  0x0000007fb7ce0f08 in ?? () from /lib/aarch64-linux-gnu/libc.so.6              
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)              
hasse69 commented 6 years ago

Thanks for the issue report.

I think we should separate this from #95 at least until we cannot prove it is having the same root cause. Currently I do not think that is the case due to the very different crash signature and stack traces.

I really appreciate all the hard work you are putting into trying to collect more data for trouble shooting. We will need more of that since I am currently unable to reproduce anything similar.

There are a few things that are interesting to note here:

These kind of issues are very hard to troubleshoot, especially since we do not seem to be able to collect a decent stack trace. We also cannot out rule the fact that the source of the corruption could be located some where else, not in rar2fs itself. The crash always points to the main process, but there are several linked in libraries being used too. What strikes me as a bit odd is that you started to see this after a system upgrade and without unionsfs it also seems to behave.

From what I can tell so far is that the last thing you do see is some partial read of a file, right? What if the buffer provided by FUSE in the read call simply is not big enough? I am not saying that is the case, only saying that a corruption can have many different sources.

So, we need to narrow it down somehow. Would it be possible for you to downgrade rar2fs to , lets say, v1.24.1? Not that I think it will do much good but we need to try something.

Another thing you can try on the current version is mounting using the -s flag to force rar2fs/FUSE into single threaded mode.

Darkvater commented 6 years ago

Update 2 (sorry for being late, but a new addition to the family totally changed all plans):

Darkvater commented 6 years ago

Update:

hasse69 commented 6 years ago

Yea, I was sort of afraid of this... So, if you can confirm it is working fine using -s, then the only conclusion we can make from that is that there is some terrible thread synchronization bug left in the code somewhere. If that bug is in fact in rar2fs or some library it is using is very hard to tell without a proper stack trace. Actually, even with a stack trace it can be difficult to find since memory corruption can go on silently for a while until it becomes a problem big enough to cause a crash.

hasse69 commented 6 years ago

What you can try is to compile rar2fs using the maximum debug level, I believe it is --enable-debug=5 and then start it in the foreground using -d. Possibly we can at least see what is the last thing it tries before going down.

Darkvater commented 6 years ago

Tried that, but it is not really giving more information compared to --enable-debug which is level 4.

Now the interesting but is that the exception happens in thread 11892. However, this is not rar2fs. All activities in the log happen in 11871. Is that perhaps a clue?

Thread 5 "rar2fs.1.27.0-d" received signal SIGABRT, Aborted. [Switching to Thread 0x7faecdd1e0 (LWP 11892)] 0x0000007fb7c5b9fc in raise () from /lib/aarch64-linux-gnu/libc.so.6 (gdb) info threads Id Target Id Frame 1 Thread 0x7fb7ff5420 (LWP 11871) "rar2fs.1.27.0-d" 0x0000007fb7cc7764 in nanosleep () from /lib/aarch64-linux-gnu/libc.so.6 2 Thread 0x7fb05211e0 (LWP 11873) "rar2fs.1.27.0-d" 0x0000007fb7d84870 in do_futex_wait.constprop () from /lib/aarch64-linux-gnu/libpthread.so.0 3 Thread 0x7fafcff1e0 (LWP 11874) "rar2fs.1.27.0-d" 0x0000007fb7d85a28 in read () from /lib/aarch64-linux-gnu/libpthread.so.0 4 Thread 0x7faf4dd1e0 (LWP 11875) "rar2fs.1.27.0-d" 0x0000007fb7d85a28 in read () from /lib/aarch64-linux-gnu/libpthread.so.0

hasse69 commented 6 years ago

The thread id is a hint but not much more than that I am afraid. Since the stack cannot be traversed properly it is impossible to say which thread this really is. But this is somewhat strange:

size = 14288, chunk = 14851203                                                    
Read 14288 bytes from vol=16, base=0                                                 
   read[548212346464] 131072 bytes from 237502464                                    
   unique: 1853, success, outsize: 131088

The size read is 14288 bytes but still FUSE thinks we read 131072 bytes? Can you look a few lines up in the log to see if you can find the actual read request for 14288 bytes? Should be something like this

read[548212346464] 131072 bytes from 237764608 flags: 0x20000

but with a size of 14288 rather than 131072.

We can try to add some protection in the raw read function in case we are called from different threads using asynchronous reads. The -osync_read is hard-coded by rar2fs so that would surprise me though.

Unfortunately we dump the PID and not the TID in the log. We might have spotted if the read is coming from the thread that caught the exception. Can you please step up to latest version of rar2fs? Then it would be easier for me to attach a patch that you could try.

Also, am I right in that it seem to be different files each time being involved in the crash? What if you reduce your media library size? Can you still reproduce?

Darkvater commented 6 years ago

Okay, I had a look around and I think we might have some bit more information still running 1.27.0. Added some more gcc hardening flags like -fsanitize=address* and some others, and when things crash I have a lot more information 👍. I have also further narrowed down the cause, and can reproduce just by mounting the share and copying a single file.

The below should help, right? :) sdatv.txt

==31414==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fac102e63 at pc 0x55555b416c bp 0x7faa6fd810 sp 0x7faa6fd878
READ of size 26 at 0x7fac102e63 thread T2
    #0 0x55555b416b in __interceptor_strncmp.part.67 (/mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs+0x5f16b)
    #1 0x555567e16b in syncdir_scan /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2745
    #2 0x555567f72b in syncdir /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2982
    #3 0x555567ffb7 in rar2_getattr /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:3081
    #4 0x7fb7f89b6f  (/lib/aarch64-linux-gnu/libfuse.so.2+0xbb6f)

0x7fac102e63 is located 19 bytes inside of 48-byte region [0x7fac102e50,0x7fac102e80)
freed by thread T2 here:
    #0 0x5555621ce3 in free (/mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs+0xccce3)
    #1 0x555567e61f in syncdir_scan /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2790
    #2 0x555567f72b in syncdir /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2982
    #3 0x555567ffb7 in rar2_getattr /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:3081
    #4 0x7fb7f89b6f  (/lib/aarch64-linux-gnu/libfuse.so.2+0xbb6f)
    #5 0x7fb7f89d87  (/lib/aarch64-linux-gnu/libfuse.so.2+0xbd87)
    #6 0x7fb7f947e3  (/lib/aarch64-linux-gnu/libfuse.so.2+0x167e3)
    #7 0x7fb7f91607  (/lib/aarch64-linux-gnu/libfuse.so.2+0x13607)
    #8 0x7fb7d9e09f in start_thread (/lib/aarch64-linux-gnu/libpthread.so.0+0x709f)
    #9 0x7fb7c47eab  (/lib/aarch64-linux-gnu/libc.so.6+0xc7eab)

previously allocated by thread T2 here:
    #0 0x5555621f93 in __interceptor_malloc (/mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs+0xccf93)
    #1 0x7fb7c1843f  (/lib/aarch64-linux-gnu/libc.so.6+0x9843f)
    #2 0x55555ebc27 in scandir64 (/mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs+0x96c27)
    #3 0x555567dfef in syncdir_scan /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2734
    #4 0x555567f72b in syncdir /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:2982
    #5 0x555567ffb7 in rar2_getattr /mnt/sda/work/tmp/rar2fs-1.27.0/rar2fs.c:3081
    #6 0x7fb7f89b6f  (/lib/aarch64-linux-gnu/libfuse.so.2+0xbb6f)
    #7 0x7fb7f89d87  (/lib/aarch64-linux-gnu/libfuse.so.2+0xbd87)
    #8 0x7fb7f947e3  (/lib/aarch64-linux-gnu/libfuse.so.2+0x167e3)
    #9 0x7fb7f91607  (/lib/aarch64-linux-gnu/libfuse.so.2+0x13607)
    #10 0x7fb7d9e09f in start_thread (/lib/aarch64-linux-gnu/libpthread.so.0+0x709f)
    #11 0x7fb7c47eab  (/lib/aarch64-linux-gnu/libc.so.6+0xc7eab)

Update1:

[*] for future reference, add -fsanitize=address to CFLAGS/CPPFLAGS and -fsanitize=address -static-libasan to LDFLAGS

hasse69 commented 6 years ago

Thanks, will look into it.

Darkvater commented 6 years ago

updated post

hasse69 commented 6 years ago

If I comment out the fclose() in lread_raw() on rar2fs.c:1231 then the problems are gone. Obviously there should be a free there somewhere so not sure about that.

Yepp, you spotted a problem that has been there all the time I believe. Nice catch. I will look more closely into the problem and what can be done to resolve it. There are a couple of different approaches one might take here. It is remarkable how bugs can pass unnoticed sometimes.

In single threaded mode, or without unionfs the incorrect free of Update1 and doule _fclose()_of Update3 still happens, but doesn't seem to affect operations.

That is also what I suspect, this is probably a very silent bug hiding under the radar. Otherwise I think someone (including myself) would have seen this a long time ago. Again, nice catch.

hasse69 commented 6 years ago

Please try this patch. I primarily wish to see if it resolves the problem(s) you have observed. This is not necessarily the final solution.

issue98.patch.txt

Darkvater commented 6 years ago

Thanks, that seems to have solved it. AddressSanitizer does no longer complain and the few test cases I have thrown at it all work beautifully. Cheers!

hasse69 commented 6 years ago

Excellent. Please give it some more run-time and then report back. It is very possible this also solves #95. The unionfs must have triggered a rather unusual use-case for which multiple threads are accessing the same file but otherwise shares the same open request. If this is a common access pattern I think it should have been discovered much earlier. Sorry for all the trouble. Not sure why I have not considered this. My bad.

hasse69 commented 6 years ago

Made a push to master. Please try it and report back. If everything seems to work well we can close this issue. I will probably make an official patch release later this week.

Darkvater commented 6 years ago

Been running for the past six days and no issues detected. Finally my sbc is stable again - used to lock up & reboot due to rar2fs.

hasse69 commented 6 years ago

Thanks, I will make a release ASAP. Did not manage to get the time last week. I also have some ideas how this patch can be slightly improved so if possible I will ping you for a short test if that is ok. But I will release as is for now since it seems to be working fine.