trapexit / mergerfs

mergerfs crashes when creating zip file #369

Closed by dotreloaded 7 years ago

dotreloaded commented 7 years ago

mergerfs version: 2.19.0
FUSE library version: 2.9.7
fusermount version: 2.9.7
using FUSE kernel interface version 7.19

My mergerfs mount crashes when I create a zip file inside the mountpoint. It only happens sometimes, but I can provide a sample of files that always trigger a crash for me when zipped, if you'd like.

To reproduce, mount a local directory on an empty directory: mergerfs -o defaults ~/srcmnt ~/mountpoint

Then, if I create a zip file in ~/mountpoint with something like zip -r zipfile.zip files/*, mergerfs will crash. It also crashed when I used mergerfs to mount two external hard drives at ~/mountpoint. I've used mergerfs before without this problem, so this is a recent-ish issue.

The man page section on crashes only mentions earlier versions of libfuse as a problem, so I'm not sure what to do. Let me know if there's more info I can provide.
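
The reproduction steps above can be sketched as a small shell function. This is only an illustration under assumptions: the function name repro_zip_crash is hypothetical, the default paths and the files directory are the ones from the report, and the final stat check is just one simple way to notice that the mount stopped responding.

```shell
#!/bin/sh
# Hypothetical helper: mount a single branch with mergerfs, zip a directory
# inside the mountpoint, then report whether the mount is still reachable.
# Default paths are the ones used in the report above.
repro_zip_crash() {
    src=${1:-"$HOME/srcmnt"}
    mnt=${2:-"$HOME/mountpoint"}

    # Same mount command as in the report.
    mergerfs -o defaults "$src" "$mnt" || return 1

    # Same zip command as in the report, run from inside the mountpoint.
    if (cd "$mnt" && zip -r zipfile.zip files/*); then
        status=0
    else
        status=$?
    fi

    # If the mergerfs process died, accessing the mountpoint fails.
    if ! stat "$mnt" >/dev/null 2>&1; then
        echo "mount at $mnt is no longer reachable" >&2
    fi
    return "$status"
}
```

Calling repro_zip_crash with no arguments uses ~/srcmnt and ~/mountpoint. The mount can be removed afterwards with fusermount -u ~/mountpoint.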

trapexit commented 7 years ago

What do you mean by "crashed when I merger mounted two external hard drives"?

What OS and kernel are you using?

trapexit commented 7 years ago

Also... what are your mount options? How many files are you zipping? Approximately how large are they?

I need as much information as possible to attempt to reproduce the issue.

trapexit commented 7 years ago

I just ran the below for a few hours. No crashes.

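# Create 1000 files, each holding 1 MiB of random data.
mkdir -p files
for x in $(seq 1 1000)
do
  dd if=/dev/urandom of=files/$x bs=1M count=1
done

# Zip the files and delete the archive in an endless loop.
while true
do
  zip -r zipfile.zip files/*
  rm zipfile.zip
done

$ uname -a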
Linux ion 4.4.0-62-generic #83~14.04.1-Ubuntu SMP Wed Jan 18 18:10:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ mergerfs -v
mergerfs version: 2.19.0
FUSE library version: 2.9.4
fusermount version: 2.9.4
using FUSE kernel interface version 7.19

dotreloaded commented 7 years ago

Here are a few images that seem to always cause a crash when zipped. https://my.mixtape.moe/uqocab.zip Unzip it, put it inside mountpoint, zip it, and mergerfs crashes.

$ cat /proc/version
Linux version 4.9.6-1-ARCH (builduser@tobias) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017

I'm just using defaults as my mount options right now. By "crashed when I merger mounted two external hard drives", I just mean that this crash also occurred when I mounted my external hard drives with something like mergerfs -o defaults /mnt/* ~/mountpoint.

trapexit commented 7 years ago

So it's running as a regular user or root?

dotreloaded commented 7 years ago

Regular user

trapexit commented 7 years ago

I'm running zip -r file.zip someimages/ in a loop... nothing.

Do you have docker installed? Can you try installing mergerfs and doing this in an ubuntu:14.04 or 16.04 image? Or try in a VM? Or tell me what OS you're using so I can try?

dotreloaded commented 7 years ago

I'm running Arch Linux. I haven't used Docker before, but I'll try playing around with it.

mjmeehan commented 7 years ago

Are you getting a core from the crash? After it crashes does dmesg say why?

dotreloaded commented 7 years ago

dmesg doesn't say anything. Here is what zip outputs if it's at all helpful:

$ zip -r someimages.zip someimages/*
  adding: someimages/gogh-Old Vineyard with Peasant Woman.jpg (deflated 0%)
  adding: someimages/ibels-theatre libre.jpg (deflated 0%)
  adding: someimages/kunisada-Nihonbashi.jpg (deflated 0%)
  adding: someimages/ranson-tigre.jpg (deflated 0%)
  adding: someimages/roussel-deux petits centaures.jpg (deflated 0%)
  adding: someimages/vallotton-La paresse.jpg
zip warning: Transport endpoint is not connected
        zip warning: could not open for reading: someimages/vallotton-La paresse.jpg

zip warning: Not all files were readable
  files/entries read:  5 (18M bytes)  skipped:  1 (3.8M bytes)
zip I/O error: Transport endpoint is not connected
zip error: Temporary file failure (ziK1PblE)
*** Error in `zip': double free or corruption (!prev): 0x00000000019e74a0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x70c4b)[0x7f6646f25c4b]
/usr/lib/libc.so.6(+0x76fe6)[0x7f6646f2bfe6]
/usr/lib/libc.so.6(+0x777de)[0x7f6646f2c7de]
/usr/lib/libc.so.6(fclose+0x132)[0x7f6646f1bc92]
zip[0x409218]
zip[0x4083bb]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f6646ed5291]
zip[0x408759]
======= Memory map: ========
00400000-00433000 r-xp 00000000 fe:01 955584                             /usr/bin/zip
00632000-00633000 r--p 00032000 fe:01 955584                             /usr/bin/zip
00633000-00635000 rw-p 00033000 fe:01 955584                             /usr/bin/zip
00635000-00684000 rw-p 00000000 00:00 0
019bf000-01a01000 rw-p 00000000 00:00 0                                  [heap]
7f6640000000-7f6640021000 rw-p 00000000 00:00 0
7f6640021000-7f6644000000 ---p 00000000 00:00 0
7f6646c40000-7f6646c56000 r-xp 00000000 fe:01 964924                     /usr/lib/libgcc_s.so.1
7f6646c56000-7f6646e55000 ---p 00016000 fe:01 964924                     /usr/lib/libgcc_s.so.1
7f6646e55000-7f6646e56000 r--p 00015000 fe:01 964924                     /usr/lib/libgcc_s.so.1
7f6646e56000-7f6646e57000 rw-p 00016000 fe:01 964924                     /usr/lib/libgcc_s.so.1
7f6646eb5000-7f664704a000 r-xp 00000000 fe:01 920747                     /usr/lib/libc-2.24.so
7f664704a000-7f6647249000 ---p 00195000 fe:01 920747                     /usr/lib/libc-2.24.so
7f6647249000-7f664724d000 r--p 00194000 fe:01 920747                     /usr/lib/libc-2.24.so
7f664724d000-7f664724f000 rw-p 00198000 fe:01 920747                     /usr/lib/libc-2.24.so
7f664724f000-7f6647253000 rw-p 00000000 00:00 0
7f6647253000-7f6647262000 r-xp 00000000 fe:01 923767                     /usr/lib/libbz2.so.1.0.6
7f6647262000-7f6647461000 ---p 0000f000 fe:01 923767                     /usr/lib/libbz2.so.1.0.6
7f6647461000-7f6647463000 rw-p 0000e000 fe:01 923767                     /usr/lib/libbz2.so.1.0.6
7f6647463000-7f6647486000 r-xp 00000000 fe:01 920746                     /usr/lib/ld-2.24.so
7f664748b000-7f6647623000 r--p 00000000 fe:01 993983                     /usr/lib/locale/locale-archive
7f6647623000-7f6647627000 rw-p 00000000 00:00 0
7f6647684000-7f6647685000 rw-p 00000000 00:00 0
7f6647685000-7f6647686000 r--p 00022000 fe:01 920746                     /usr/lib/ld-2.24.so
7f6647686000-7f6647687000 rw-p 00023000 fe:01 920746                     /usr/lib/ld-2.24.so
7f6647687000-7f6647688000 rw-p 00000000 00:00 0
7fffa63c9000-7fffa63ea000 rw-p 00000000 00:00 0                          [stack]
7fffa63ec000-7fffa63ee000 r--p 00000000 00:00 0                          [vvar]
7fffa63ee000-7fffa63f0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

mjmeehan commented 7 years ago

Ok, that helps a bit. Could you follow these instructions and reproduce the issue with debug symbols on for glibc? Could also help to do the same for zip.

https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces

trapexit commented 7 years ago

It's possible that the zip error is due to the filesystem error but it could be related to mergerfs dying.

I've seen users who had random crashes and it turned out to be their RAM or CPUs being bad. Might want to run memtest86 or similarly overload the system without mergerfs and see if it crashes.
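
One quick way to tell mergerfs dying apart from other failures is to check whether the mountpoint still answers. When a FUSE daemon exits while its mount is still in place, accesses typically fail with ENOTCONN ("Transport endpoint is not connected"), which is the message zip printed above. The sketch below is only an illustration: the function name mount_state is hypothetical, and matching the C-locale error text printed by stat is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helper: print "ok", "disconnected", or "error" for a path.
# "disconnected" means stat failed with ENOTCONN, which is what a FUSE
# mountpoint usually reports after its daemon has exited.
mount_state() {
    dir=$1
    # Capture only stderr; LC_ALL=C keeps the error text in English.
    if err=$(LC_ALL=C stat "$dir" 2>&1 >/dev/null); then
        echo "ok"
    else
        case $err in
            *"Transport endpoint is not connected"*) echo "disconnected" ;;
            *) echo "error" ;;
        esac
    fi
}
```

For example, mount_state ~/mountpoint can be run right after zip fails. A disconnected mount can usually be cleared with fusermount -u ~/mountpoint before mounting again.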

dotreloaded commented 7 years ago

This is the trace gdb gave me after compiling glibc and zip with the debug option.

Thread 1 (process 10840):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
        set = {__val = {0, 8030813354868236905, 3966041993857872174, 7017844308454041190, 7378697426660503600, 3472328520475359078, 3472310980127322656, 7358934774648418352, 3472901073083054693, 2314885530819768882, 
            2314885530818453536, 2314885530818453536, 7091318039310988591, 3761119431852583983, 7378697426077446958, 3472328524770326374}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007ffff78603ca in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x3061666637666666, sa_sigaction = 0x3061666637666666}, sa_mask = {__val = {7378697628691542064, 2319406792480024119, 3472328228586663282, 3472310704041635888, 2314885599538982970, 
              2314885530818453536, 2314885530818453536, 6566283579056201760, 7365367112005805174, 3486743125717050982, 7378697628691542064, 2319406792496801335, 3472328228581748082, 7306562836794192434, 3616445721713127482, 
              140737488345712}}, sa_flags = 87, sa_restorer = 0x7fffffffda70}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007ffff789cb90 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff7991598 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
        ap = {{gp_offset = 40, fp_offset = 0, overflow_arg_area = 0x7fffffffda80, reg_save_area = 0x7fffffffda10}}
        fd = 3
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007ffff78a2f16 in malloc_printerr (action=3, str=0x7ffff7991610 "double free or corruption (!prev)", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5046
        buf = "00000000006ac7d0"
        cp = <optimized out>
        ar_ptr = <optimized out>
        ptr = <optimized out>
        str = 0x7ffff7991610 "double free or corruption (!prev)"
        action = 3
#4  0x00007ffff78a370e in _int_free (av=0x7ffff7bc4ae0 <main_arena>, p=0x6ac7c0, have_lock=0) at malloc.c:3902
        size = <optimized out>
        fb = <optimized out>
        nextchunk = <optimized out>
        nextsize = <optimized out>
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        errstr = <optimized out>
        locked = <optimized out>
        __func__ = "_int_free"
#5  0x00007ffff7892bd2 in _IO_new_fclose (fp=0x6ac7d0) at iofclose.c:84
        status = -1
#6  0x00000000004091b8 in ziperr (c=10, h=0x685670 "/home/dot/mountpoint/zis5wOkM") at zip.c:373
        error_level = 1
#7  0x0000000000407ef4 in main (argc=<optimized out>, argv=<optimized out>) at zip.c:5899
        d = <optimized out>
        e = <optimized out>
        f = <optimized out>
        i = <optimized out>
        kk = <optimized out>
        c = 19687931
        t = <optimized out>
        k = <optimized out>
        n = <optimized out>
        o = <optimized out>
        p = <optimized out>
        pp = <optimized out>
        r = <optimized out>
        s = <optimized out>
        csize = <optimized out>
        usize = 3995917
        tf = <optimized out>
        first_listarg = <optimized out>
        v = <optimized out>
        w = <optimized out>
        x = 0x0
        z = <optimized out>
        bad_open_is_error = <optimized out>
        zipbuf = <optimized out>
        comment_stream = 0x7ffff7bc48a0 <_IO_2_1_stdin_>
        all_current = <optimized out>
        filearg = <optimized out>
        option = <optimized out>
        argcnt = 8
        argnum = 8
        optchar = -5
        value = 0x0
        negated = 0
        fna = 8
        optnum = -1
        show_options = <optimized out>
        show_what_doing = 0
        show_args = <optimized out>
        seen_doubledash = <optimized out>
        key_needed = <optimized out>
        have_out = <optimized out>
        args = 0x684fe0

trapexit commented 7 years ago

Would you be able to try an older kernel? 4.8 or older?

dotreloaded commented 7 years ago

Problem seems to have started with 4.9. Doesn't occur in 4.8.14-1 and earlier, occurs in 4.9.0-1 and later.

trapexit commented 7 years ago

Ok. Thanks. I'll send an email to the FUSE developers and see if they can figure something out.

dotreloaded commented 7 years ago

I upgraded to 4.10.1-1 today and I don't have the problem anymore so I'll close this.
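
The kernel versions reported in this thread can be turned into a rough check. This is a sketch under stated assumptions, not a definitive test: the thread only reports the crash on 4.9.0-1 through 4.9.6-1, no crash on 4.8.14-1 and earlier, and no crash on 4.10.1-1, so treating every 4.9.x kernel as affected is an extrapolation. The function names are hypothetical, and sort -V (version sort, as in GNU coreutils) is assumed to be available.

```shell
#!/bin/sh
# Return 0 if version $1 is greater than or equal to version $2.
# Relies on version sort (sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if the given kernel release (default: the running kernel)
# falls in the 4.9.x series, where the crash was reported in this thread.
kernel_in_reported_range() {
    v=${1:-$(uname -r)}
    version_ge "$v" "4.9.0" && ! version_ge "$v" "4.10.0"
}
```

Running kernel_in_reported_range with no arguments checks the output of uname -r. A version string such as 4.9.6-1-ARCH can also be passed explicitly.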