nabnux commented 4 years ago

General description

I run mergerfs on a Debian 10 VM to pool a 12 TB and a 14 TB disk. The same box also runs a Deluge daemon with ~200 torrents and a Plex Media Server.

The problem is that the mergerfs process gets killed, seemingly at random. It seems to happen when there are a lot of reads on the filesystem.

Expected behavior

mergerfs process should not be killed when I trigger a storage recheck on my torrents, or when Plex scans new media.

Actual behavior

The mergerfs process gets killed by SIGABRT, after which any attempt to access the filesystem fails with "Transport endpoint is not connected". No relevant logs are available in dmesg or journalctl.
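
For anyone who wants to detect this state from a script (for example, to remount automatically), here is a minimal sketch. It assumes Python 3 is available; `mount_is_connected` is a hypothetical helper name, not something mergerfs or Deluge provides. It relies only on the fact that accessing a FUSE mount whose daemon has died fails with ENOTCONN.

```python
import errno
import os


def mount_is_connected(path):
    """Return False if stat() on path fails with ENOTCONN, True otherwise.

    ENOTCONN is the errno behind "Transport endpoint is not connected",
    which is what a FUSE mount reports once its userspace daemon is gone.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise  # other errors (ENOENT, EACCES, ...) are a different problem
    return True
```

Calling `mount_is_connected("/home/deluge")` on the affected box should return False after the crash and True while mergerfs is running.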

Precise steps to reproduce the behavior

The most consistent way I've found to trigger the crash is to force a storage recheck on a big torrent (tens of GB) in Deluge.
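
A possible way to approximate that load without Deluge is sketched below. It is only an assumption that this resembles the recheck closely enough to trigger the crash; it simply copies the I/O pattern visible in the deluged strace further down (sequential `preadv` calls with up to 256 buffers of 16384 bytes each). `read_with_preadv` is a hypothetical helper name, and `os.preadv` needs Python 3.7+ on Linux.

```python
import os

CHUNK = 16384    # iov_len seen in the deluged strace
IOV_COUNT = 256  # iovcnt of the larger preadv calls in the strace


def read_with_preadv(path):
    """Read a file sequentially with preadv and return the bytes read."""
    # One reusable set of buffers, mirroring the scatter read in the strace.
    buffers = [bytearray(CHUNK) for _ in range(IOV_COUNT)]
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # Read up to CHUNK * IOV_COUNT bytes starting at offset `total`.
            n = os.preadv(fd, buffers, total)
            if n == 0:  # end of file
                break
            total += n
    finally:
        os.close(fd)
    return total
```

Pointing `read_with_preadv` at a large file under the mergerfs mount generates a comparable stream of 128 KiB FUSE reads. Data already in the page cache may be served without reaching mergerfs at all, so a file that has not been read recently is the better test subject.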

System information

Please provide as much of the following information as possible:

[x] mergerfs version:

# mergerfs -V
mergerfs version: 2.29.0-9-g7e8635b
FUSE library version: 2.9.7-mergerfs_2.29.0
fusermount version: 2.9.7-mergerfs_2.29.0
using FUSE kernel interface version 7.31

This version was built from the latest git commit to get debug symbols, but I also encountered the issue with the packages from the Debian stable repo (2.24.2-4) and the testing repo (2.28.1-1).

[x] mergerfs settings: /mnt/data/vd* /home/deluge fuse.mergerfs allow_other,use_ino,cache.files=auto-full,func.getattr=newest,dropcacheonclose=true,fsname=mergerfs 0 0
[x] Linux version: 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
[x] Versions of any additional software being used: deluged 1.3.15-2 plexmediaserver 1.18.4.2171-ac2afe5f8

This is a virtual machine running under KVM; the hypervisor runs the same Debian version and kernel.

[x] List of drives, filesystems, & sizes:

# df -h
df: /home/deluge: Transport endpoint is not connected
Filesystem                 Size  Used Avail Use% Mounted on
udev                       2.0G     0  2.0G   0% /dev
tmpfs                      395M   16M  380M   4% /run
/dev/mapper/vg_vm-lv_root  1.8G  747M  1.1G  42% /
/dev/mapper/vg_vm-lv_usr   2.7G  1.8G  926M  66% /usr
tmpfs                      2.0G   24K  2.0G   1% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                      2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/mapper/vg_vm-lv_boot  922M   86M  811M  10% /boot
/dev/mapper/vg_vm-lv_tmp   922M  2.0M  894M   1% /tmp
/dev/mapper/vg_vm-lv_var   2.7G  420M  2.3G  16% /var
/dev/mapper/vg_vm-lv_plex   30G   18G   11G  63% /var/lib/plexmediaserver
/dev/vdb                    11T  6.0T  4.4T  58% /mnt/data/vdb
/dev/vdc                    13T  4.2T  7.9T  35% /mnt/data/vdc
tmpfs                      395M     0  395M   0% /run/user/1000

[x] deluged strace:

2455  23:14:32.022911 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.022945 epoll_wait(4, [{EPOLLIN|EPOLLOUT, {u32=3235666800, u64=139980514810736}}], 128, 0) = 1
2455  23:14:32.022996 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.023026 epoll_wait(4, [{EPOLLIN, {u32=3221233904, u64=139980500377840}}], 128, -1) = 1
2455  23:14:32.023263 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\3267\343J\20L\0\3 \0\201\2250\305=b\373\363A*\204\34wKO<"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.023318 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\326=\343J\20L\0\3 \0\201\2260\305\277\315V\320\247j\"k\277C\304Q"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.023358 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\326C\343J\20L\0\3 \0\201\2270\305\206\321\225\n+\360g\267(^\353\335"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.023397 recvmsg(13, {msg_namelen=28}, 0) = -1 EAGAIN (Resource temporarily unavailable)
2455  23:14:32.023431 sendmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=16, msg_iov=[{iov_base="!\0L\261q\261=\201\34\266g\33\0\17\357b0\307\201\227", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 20
2455  23:14:32.023550 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.023586 epoll_wait(4, [{EPOLLIN, {u32=3221233904, u64=139980500377840}}], 128, 0) = 1
2455  23:14:32.023627 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\326H\343J\20L\0\3 \0\201\2300\305[\273D\3545\317\357kyZ\360\343"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.023665 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\331\343\343J\17\337\0\3 \0\201\2310\305\307\247\215%M\v\r#\310a^S"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.023701 recvmsg(13, {msg_namelen=28}, 0) = -1 EAGAIN (Resource temporarily unavailable)
2455  23:14:32.023733 sendmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=16, msg_iov=[{iov_base="!\0L\261q\261>\257\34\266d\254\0\17\364\3540\307\201\231", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 20
2455  23:14:32.023795 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.023826 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.023859 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.023886 epoll_wait(4, [{EPOLLIN, {u32=3221233904, u64=139980500377840}}], 128, -1) = 1
2455  23:14:32.024233 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\331\355\343J\17\337\0\3 \0\201\2320\305bx\317<\252u\3\260\2b\7\314"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.024278 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\331\365\343J\17\337\0\3 \0\201\2330\305\317\217%\262\225\r\201\35e\261\302L"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.024317 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\207\343J\21=\0\3 \0\201\2340\3050\310\3058ux\351\351v\275\366+"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.024360 recvmsg(13, {msg_namelen=28}, 0) = -1 EAGAIN (Resource temporarily unavailable)
2455  23:14:32.024395 sendmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=16, msg_iov=[{iov_base="!\0L\261q\261AD\34\266_\231\0\17\357b0\307\201\234", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 20
2455  23:14:32.024461 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.024503 epoll_wait(4, [{EPOLLIN|EPOLLOUT, {u32=3235666800, u64=139980514810736}}], 128, 0) = 1
2455  23:14:32.024585 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.024616 futex(0x55f7fa9869e4, FUTEX_WAKE_PRIVATE, 2147483647) = 3
2458  23:14:32.024642 <... futex resumed> ) = 0
2457  23:14:32.024650 <... futex resumed> ) = 0
2458  23:14:32.024658 futex(0x55f7fa9869e8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2457  23:14:32.024664 futex(0x55f7fa9869e8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2458  23:14:32.024671 <... futex resumed> ) = 0
2457  23:14:32.024677 <... futex resumed> ) = 0
2456  23:14:32.024683 <... futex resumed> ) = 0
2455  23:14:32.024689 epoll_wait(4,  <unfinished ...>
2458  23:14:32.024703 futex(0x55f7fa9869e0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2457  23:14:32.024718 futex(0x55f7fa9869e0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2456  23:14:32.024725 futex(0x55f7fa9869e8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2455  23:14:32.024731 <... epoll_wait resumed> [], 128, 0) = 0
2456  23:14:32.024737 <... futex resumed> ) = 0
2455  23:14:32.024744 epoll_wait(4,  <unfinished ...>
2456  23:14:32.024750 futex(0x55f7fa9869e0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2455  23:14:32.025105 <... epoll_wait resumed> [{EPOLLIN, {u32=3221233904, u64=139980500377840}}], 128, -1) = 1
2455  23:14:32.025159 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\221\343J\21=\0\3 \0\201\2350\305w\3\302G\266\330>\24\213\330C\314"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025221 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\227\343J\21=\0\3 \0\201\2360\305\262D\317G]\311l\206M\17O\215"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025260 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\240\343J\17`\0\3 \0\201\2370\305:\360NAZ\n\316\253t\16\207\253"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025298 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\246\343J\17`\0\3 \0\201\2400\305x\312\30+Q\352\300\200\241\203\375\336"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025339 recvmsg(13, {msg_namelen=28}, 0) = -1 EAGAIN (Resource temporarily unavailable)
2455  23:14:32.025385 sendmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=16, msg_iov=[{iov_base="!\0L\261q\261E \34\266cL\0\17\351\3300\307\201\240", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 20
2455  23:14:32.025485 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.025519 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.025560 epoll_wait(4, [{EPOLLIN, {u32=3221233904, u64=139980500377840}}], 128, 0) = 1
2455  23:14:32.025593 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\253\343J\17`\0\3 \0\201\2410\305\177\256\344\336\323\252\266\2@\273\4\v"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025633 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\261\343J\17`\0\3 \0\201\2420\305\340\326G\340.5\177\"}k;\376"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025677 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\341\266\343J\17`\0\3 \0\201\2430\305X\330[v\314FG\235\17\343\27A"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025765 recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=28->16, msg_iov=[{iov_base="\1\0L\260T\372\3519\343J\22C\0\3 \0\201\2440\305xuC\325\5\3\17=\307\377'\252"..., iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1438
2455  23:14:32.025824 recvmsg(13, {msg_namelen=28}, 0) = -1 EAGAIN (Resource temporarily unavailable)
2455  23:14:32.025870 sendmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(51705), sin_addr=inet_addr("86.6.126.4")}, msg_namelen=16, msg_iov=[{iov_base="!\0L\261q\261G\5\34\266]\237\0\17\351\3300\307\201\244", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 20
2455  23:14:32.025965 epoll_ctl(4, EPOLL_CTL_MOD, 13, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3221233904, u64=139980500377840}}) = 0
2455  23:14:32.026006 epoll_wait(4, [], 128, 0) = 0
2455  23:14:32.026045 epoll_wait(4, [], 128, 0) = 0
2459  23:14:32.026082 <... preadv resumed> [{iov_base="k\351\6\34\327I\324D\230\304\315=\350J\353\300P\276\207\f\346\354\236A \23\255o\216+\212y"..., iov_len=16384}, {iov_base="5\251\220\363\324\357V\264\300\221\323\362\10\351\10\17\20\350m\364\375\272\351,\33\f\242:}\350}M"..., iov_len=16384}, {iov_base="\300\34\244\201\26\2248\222k\255\341H\264\277\237\324*\334u\232_\23\213?t\211A\303\317^oi"..., iov_len=16384}, {iov_base="\200f!\234\306\215\222\334\330\354\366\274\251hf\371\221Wu7^\256F\24\232?\37\271ak\24\t"..., iov_len=16384}, {iov_base="Q\337\2Pb\310\250\222\35737\341\201\354O\302\301z,d\32\3\340\306\206P,\205\206\vo\6"..., iov_len=16384}, {iov_base="\3*\33(\345\220}\377I\1N\207O\345'\346\220yX<6\34\334f\353\373\25;7\315\246+"..., iov_len=16384}, {iov_base="\0\263lp\233{\3L\341|\330oH\365\336\367V#kI\345\330\204\35m\302\246\205\377j\343\303"..., iov_len=16384}, {iov_base="laJ\356^\201\357\361\334\32\f.\0077\364\362\324\321\276\fd\312\233\vQ\200\237\2\177\327At"..., iov_len=16384}, {iov_base="\337MY\331\vjgA.\nVss\203Z\261\335~Q\347\227-\2*\304\261\333\316?]\231B"..., iov_len=16384}, {iov_base="RE\0\340\366\240q\4C\347#\2\275\332\0063\316\1\373\263P\374\233\r\16j\306\376\25\2G\203"..., iov_len=16384}, {iov_base="\347\322l\347\244\365'Xx\320\242H-\247#\327\230\214\32\202#\\0\254\17\371\210k\16\230'C"..., iov_len=16384}, {iov_base="&&\211\336V\247I\325\261*\27\313\272\277\203U\276V~\320'\276\0011\24{\210\302K\211\251\327"..., iov_len=16384}, {iov_base="O\370\326\0370\346v\310I\212\253}}\225\317\365\10\351\374\307\360\32\326\265\22<\3\205S\3720x"..., iov_len=16384}, {iov_base="(\326\330\225\5]\250\240\223\264/\23A\25321\306e:\324\243\240\376X\305i\303b\356.\367\327"..., iov_len=16384}, {iov_base="\n\2\240\330\267\271\354\362u@\377=\326|\374\2\374\300Y\341\374}c\300\373\3C\300\241\303\305\""..., iov_len=16384}, {iov_base="\326G\10\251K\255.5\316\355\255\377Jc9\212\360\332\244|5/Jd\323 \362\331\337\32\253c"..., iov_len=16384}, 
{iov_base="\3525\2425\354J\n\24{_\36181\306\21\213\336\256:\372S91y\266\367\310\206\246\350\230\377"..., iov_len=16384}, {iov_base="}\372.}\372\1\37\2\250k\317\345\266\362\n\211*\333\376\267\230\365\345O\353\262\305/C\2\327\270"..., iov_len=16384}, {iov_base="\317K\21\305\367\372\322\0232\"8\231\26\370\255\352v+\30\272\263\231\377+\326\0B\326G\310\22@"..., iov_len=16384}, {iov_base="M\374\316\2514{\16\374c\364,w\256\346``*Ft)szGI/\263\306b\t\214\204\301"..., iov_len=16384}, {iov_base="\343\312]\24\305\235\376\312\261\233\352\337(a9\311\347\16\277\301k\245\353\3030\26\210\362\356\252+\277"..., iov_len=16384}, {iov_base="\221e;\204\26A\377\1\315\217]\311\212\264\203FH\341(\267\344\376.Bmr\4u\330\330\376a"..., iov_len=16384}, {iov_base="\247\354\331\357\304C\346l^\233\306}B_D\255\277d\177\312\306\2\321\269A\256<f\244d\t"..., iov_len=16384}, {iov_base="\361K\373/\261B\316X(x\nQ\222\"\353s$\7\32 0\314\377\261\340\226^I\21]\2\5"..., iov_len=16384}, {iov_base="\6\300\304&2\364\0\217\257}\367l*\313~P\272\7\230Gi}\257\224:\217\230\242\225\6\3\315"..., iov_len=16384}, {iov_base="7\234\253aK\3607\274,\346\200D6\265\303\325\211\374\365\207P6\222\rE\5r\376\5%\241\246"..., iov_len=16384}, {iov_base="{\214\323\276\362\201.\336i\3752\244\373\346}\32\34_l`T\343\346g\33\206\246v63\270\217"..., iov_len=16384}, {iov_base="\377R\16\344\237e\232{\336`\330C\335\323\326\275\346RN\223g\240+=N\326~9N|\243\347"..., iov_len=16384}, {iov_base="\252bS\335g\342\20\270\326\255I\223\244\314\322\312(\363g\304\343\355Ix#\216\21\2442\0\363\\"..., iov_len=16384}, {iov_base="#'\343\346\22\34\226\217\335\n\233\357t\232\323\252Og\257\31(\357,\274\333p\265\250pD^Z"..., iov_len=16384}, {iov_base="\310B\266*\177\24)!6w\23=r\374LQ\265 \252\272\\\2347\335Q\272\237\227\221_\350\244"..., iov_len=16384}, {iov_base="\201\24\5\244\254\320\"\352+\273\220\256\26>\277Q0Vt\333vc\10\260\223\34m\255\225\20\321|"..., iov_len=16384}, ...], 256, 1925185536) = 3670016
2455  23:14:32.026160 epoll_wait(4,  <unfinished ...>
2459  23:14:32.026179 preadv(28, [{iov_base=0x7f4fa097d000, iov_len=0}, {iov_base=0x7f4fa097e000, iov_len=16384}, {iov_base=0x7f4fa0983000, iov_len=16384}, {iov_base=0x7f4fa0988000, iov_len=16384}, {iov_base=0x7f4fa098d000, iov_len=16384}, {iov_base=0x7f4fa0992000, iov_len=16384}, {iov_base=0x7f4fa0997000, iov_len=16384}, {iov_base=0x7f4fa099c000, iov_len=16384}, {iov_base=0x7f4fa09a1000, iov_len=16384}, {iov_base=0x7f4fa09a6000, iov_len=16384}, {iov_base=0x7f4fa09ab000, iov_len=16384}, {iov_base=0x7f4fa09b0000, iov_len=16384}, {iov_base=0x7f4fa09b5000, iov_len=16384}, {iov_base=0x7f4fa09ba000, iov_len=16384}, {iov_base=0x7f4fa09bf000, iov_len=16384}, {iov_base=0x7f4fa09c4000, iov_len=16384}, {iov_base=0x7f4fa09c9000, iov_len=16384}, {iov_base=0x7f4fa09ce000, iov_len=16384}, {iov_base=0x7f4fa09d3000, iov_len=16384}, {iov_base=0x7f4fa09d8000, iov_len=16384}, {iov_base=0x7f4fa09dd000, iov_len=16384}, {iov_base=0x7f4fa09e2000, iov_len=16384}, {iov_base=0x7f4fa09e7000, iov_len=16384}, {iov_base=0x7f4fa09ec000, iov_len=16384}, {iov_base=0x7f4fa09f1000, iov_len=16384}, {iov_base=0x7f4fa09f6000, iov_len=16384}, {iov_base=0x7f4fa09fb000, iov_len=16384}, {iov_base=0x7f4fa0a00000, iov_len=16384}, {iov_base=0x7f4fa0cee000, iov_len=16384}, {iov_base=0x7f4fa0cf3000, iov_len=16384}, {iov_base=0x7f4fa0cf8000, iov_len=16384}, {iov_base=0x7f4fa0cfd000, iov_len=16384}, ...], 33, 1928855552) = -1 ENOTCONN (Transport endpoint is not connected)
2459  23:14:32.026314 epoll_ctl(4, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLERR|EPOLLET, {u32=4203678440, u64=94523548961512}} <unfinished ...>
2455  23:14:32.026331 <... epoll_wait resumed> [{EPOLLIN|EPOLLOUT, {u32=3235666800, u64=139980514810736}}], 128, -1) = 1
2459  23:14:32.026344 <... epoll_ctl resumed> ) = 0
2455  23:14:32.026352 futex(0x55f7fa7bed68, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
2459  23:14:32.026362 futex(0x55f7fa7bed68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2455  23:14:32.026370 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
2459  23:14:32.026380 <... futex resumed> ) = 0
2455  23:14:32.026387 futex(0x55f7fa7bed68, FUTEX_WAKE_PRIVATE, 1) = 0
2455  23:14:32.026451 epoll_wait(4,  <unfinished ...>
2459  23:14:32.026462 preadv(28,  <unfinished ...>
2455  23:14:32.026470 <... epoll_wait resumed> [{EPOLLIN, {u32=4203678440, u64=94523548961512}}], 128, 0) = 1
2459  23:14:32.026490 <... preadv resumed> [{iov_base=0x7f4fa0102000, iov_len=16384}, {iov_base=0x7f4fa0157000, iov_len=16384}, {iov_base=0x7f4fa01ac000, iov_len=16384}, {iov_base=0x7f4fa0201000, iov_len=16384}, {iov_base=0x7f4fa0256000, iov_len=16384}, {iov_base=0x7f4fa0260000, iov_len=16384}, {iov_base=0x7f4fa029c000, iov_len=16384}, {iov_base=0x7f4fa0341000, iov_len=16384}, {iov_base=0x7f4fa0427000, iov_len=16384}, {iov_base=0x7f4fa0477000, iov_len=16384}, {iov_base=0x7f4fa04bd000, iov_len=16384}, {iov_base=0x7f4fa050d000, iov_len=16384}, {iov_base=0x7f4fa0c80000, iov_len=16384}, {iov_base=0x7f4fa0e0e000, iov_len=16384}, {iov_base=0x7f4fa0e6d000, iov_len=16384}, {iov_base=0x7f4fa0f6c000, iov_len=16384}, {iov_base=0x7f4fa0003000, iov_len=16384}, {iov_base=0x7f4fa00ad000, iov_len=16384}, {iov_base=0x7f4fa0058000, iov_len=16384}, {iov_base=0x7f4fa0292000, iov_len=16384}, {iov_base=0x7f4fa0297000, iov_len=16384}, {iov_base=0x7f4fa0f49000, iov_len=16384}, {iov_base=0x7f4fa0fbc000, iov_len=16384}, {iov_base=0x7f4fa13f6000, iov_len=16384}, {iov_base=0x7f4fa13fb000, iov_len=16384}, {iov_base=0x7f4fa0db8000, iov_len=16384}, {iov_base=0x7f4fa0dbd000, iov_len=16384}, {iov_base=0x7f4fa0fad000, iov_len=16384}, {iov_base=0x7f4fa0fb2000, iov_len=16384}, {iov_base=0x7f4fa03c3000, iov_len=16384}, {iov_base=0x7f4fa03c8000, iov_len=16384}, {iov_base=0x7f4fa054e000, iov_len=16384}, ...], 256, 1929379840) = -1 ENOTCONN (Transport endpoint is not connected)
2455  23:14:32.026543 epoll_wait(4,  <unfinished ...>
2459  23:14:32.026576 epoll_ctl(4, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLERR|EPOLLET, {u32=4203678440, u64=94523548961512}}) = 0
2455  23:14:32.026797 <... epoll_wait resumed> [{EPOLLIN, {u32=4203678440, u64=94523548961512}}], 128, -1) = 1
2455  23:14:32.026840 epoll_wait(4,  <unfinished ...>
2459  23:14:32.026876 preadv(28, [{iov_base=0x7f4fa01ac000, iov_len=16384}, {iov_base=0x7f4fa0201000, iov_len=16384}, {iov_base=0x7f4fa0256000, iov_len=16384}, {iov_base=0x7f4fa0260000, iov_len=16384}, {iov_base=0x7f4fa029c000, iov_len=16384}, {iov_base=0x7f4fa0341000, iov_len=16384}, {iov_base=0x7f4fa0427000, iov_len=16384}, {iov_base=0x7f4fa0477000, iov_len=16384}, {iov_base=0x7f4fa04bd000, iov_len=16384}, {iov_base=0x7f4fa050d000, iov_len=16384}, {iov_base=0x7f4fa0c80000, iov_len=16384}, {iov_base=0x7f4fa0e0e000, iov_len=16384}, {iov_base=0x7f4fa0e6d000, iov_len=16384}, {iov_base=0x7f4fa0f6c000, iov_len=16384}, {iov_base=0x7f4fa0003000, iov_len=16384}, {iov_base=0x7f4fa00ad000, iov_len=16384}, {iov_base=0x7f4fa0058000, iov_len=16384}, {iov_base=0x7f4fa0157000, iov_len=16384}, {iov_base=0x7f4fa0102000, iov_len=16384}, {iov_base=0x7f4fa0292000, iov_len=16384}, {iov_base=0x7f4fa0297000, iov_len=16384}, {iov_base=0x7f4fa0fbc000, iov_len=16384}, {iov_base=0x7f4fa0f49000, iov_len=16384}, {iov_base=0x7f4fa13f6000, iov_len=16384}, {iov_base=0x7f4fa13fb000, iov_len=16384}, {iov_base=0x7f4fa0fad000, iov_len=16384}, {iov_base=0x7f4fa0fb2000, iov_len=16384}, {iov_base=0x7f4fa03c3000, iov_len=16384}, {iov_base=0x7f4fa03c8000, iov_len=16384}, {iov_base=0x7f4fa054e000, iov_len=16384}, {iov_base=0x7f4fa0553000, iov_len=16384}, {iov_base=0x7f4fa0db8000, iov_len=16384}, ...], 256, 1933574144) = -1 ENOTCONN (Transport endpoint is not connected)
2459  23:14:32.026971 epoll_ctl(4, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLERR|EPOLLET, {u32=4203678440, u64=94523548961512}}) = 0
2455  23:14:32.027002 <... epoll_wait resumed> [{EPOLLIN, {u32=4203678440, u64=94523548961512}}], 128, -1) = 1
2455  23:14:32.027033 epoll_wait(4,  <unfinished ...>
2459  23:14:32.027061 preadv(28, [{iov_base=0x7f4fa0256000, iov_len=16384}, {iov_base=0x7f4fa0260000, iov_len=16384}, {iov_base=0x7f4fa029c000, iov_len=16384}, {iov_base=0x7f4fa0341000, iov_len=16384}, {iov_base=0x7f4fa0427000, iov_len=16384}, {iov_base=0x7f4fa0477000, iov_len=16384}, {iov_base=0x7f4fa04bd000, iov_len=16384}, {iov_base=0x7f4fa050d000, iov_len=16384}, {iov_base=0x7f4fa0c80000, iov_len=16384}, {iov_base=0x7f4fa0e0e000, iov_len=16384}, {iov_base=0x7f4fa0e6d000, iov_len=16384}, {iov_base=0x7f4fa0f6c000, iov_len=16384}, {iov_base=0x7f4fa0003000, iov_len=16384}, {iov_base=0x7f4fa00ad000, iov_len=16384}, {iov_base=0x7f4fa0058000, iov_len=16384}, {iov_base=0x7f4fa0157000, iov_len=16384}, {iov_base=0x7f4fa0102000, iov_len=16384}, {iov_base=0x7f4fa0201000, iov_len=16384}, {iov_base=0x7f4fa01ac000, iov_len=16384}, {iov_base=0x7f4fa0292000, iov_len=16384}, {iov_base=0x7f4fa0297000, iov_len=16384}, {iov_base=0x7f4fa0f49000, iov_len=16384}, {iov_base=0x7f4fa0fbc000, iov_len=16384}, {iov_base=0x7f4fa13f6000, iov_len=16384}, {iov_base=0x7f4fa13fb000, iov_len=16384}, {iov_base=0x7f4fa03c3000, iov_len=16384}, {iov_base=0x7f4fa03c8000, iov_len=16384}, {iov_base=0x7f4fa054e000, iov_len=16384}, {iov_base=0x7f4fa0553000, iov_len=16384}, {iov_base=0x7f4fa0db8000, iov_len=16384}, {iov_base=0x7f4fa0dbd000, iov_len=16384}, {iov_base=0x7f4fa0fad000, iov_len=16384}, ...], 256, 1937768448) = -1 ENOTCONN (Transport endpoint is not connected)

[x] mergerfs strace at the same time:

13957 23:14:32.022651 <... pread64 resumed> "\4>u\2\10c\3268\30p|p,\352\31\31Q@\375\227\252Fb\206R\f#\35\242\325\r\233"..., 131072, 1928200192) = 131072
13957 23:14:32.022689 writev(3, [{iov_base="\20\0\2\0\0\0\0\0\37\221\1\0\0\0\0\0", iov_len=16}, {iov_base="\4>u\2\10c\3268\30p|p,\352\31\31Q@\375\227\252Fb\206R\f#\35\242\325\r\233"..., iov_len=131072}], 2) = 131088
13957 23:14:32.022761 munmap(0x7f773802a000, 139264 <unfinished ...>
13958 23:14:32.022795 <... read resumed> "P\0\0\0\17\0\0\0!\221\1\0\0\0\0\08\0\0\0\0\0\0\0m\0\0\0s\0\0\0"..., 1052672) = 80
13957 23:14:32.022810 <... munmap resumed> ) = 0
13958 23:14:32.022826 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
13957 23:14:32.022841 read(3,  <unfinished ...>
13958 23:14:32.022856 <... mmap resumed> ) = 0x7f773802a000
13958 23:14:32.022928 pread64(15,  <unfinished ...>
13956 23:14:32.023235 <... pread64 resumed> "\30y\221\24J\34\315\206a\347u\226F\351\366Z\356\223\316\267\265i\2\242)\2\320\237\r\275\301\310"..., 131072, 1928331264) = 131072
13956 23:14:32.023264 writev(3, [{iov_base="\20\0\2\0\0\0\0\0 \221\1\0\0\0\0\0", iov_len=16}, {iov_base="\30y\221\24J\34\315\206a\347u\226F\351\366Z\356\223\316\267\265i\2\242)\2\320\237\r\275\301\310"..., iov_len=131072}], 2) = 131088
13956 23:14:32.023335 munmap(0x7f7738008000, 139264) = 0
13955 23:14:32.023374 <... read resumed> "P\0\0\0\17\0\0\0\"\221\1\0\0\0\0\08\0\0\0\0\0\0\0m\0\0\0s\0\0\0"..., 1052672) = 80
13956 23:14:32.023390 read(3,  <unfinished ...>
13955 23:14:32.023397 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7738008000
13955 23:14:32.023418 pread64(15,  <unfinished ...>
13958 23:14:32.024417 <... pread64 resumed> "\260?a-\226Y\341\373\215\321\270\225} F\211\231_u\326\"\360\244\225\321\252\30\0\256\241\251\257"..., 131072, 1928462336) = 131072
13958 23:14:32.024456 writev(3, [{iov_base="\20\0\2\0\0\0\0\0!\221\1\0\0\0\0\0", iov_len=16}, {iov_base="\260?a-\226Y\341\373\215\321\270\225} F\211\231_u\326\"\360\244\225\321\252\30\0\256\241\251\257"..., iov_len=131072}], 2 <unfinished ...>
13959 23:14:32.024645 <... read resumed> "P\0\0\0\17\0\0\0#\221\1\0\0\0\0\08\0\0\0\0\0\0\0m\0\0\0s\0\0\0"..., 1052672) = 80
13958 23:14:32.024713 <... writev resumed> ) = 131088
13959 23:14:32.024749 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
13958 23:14:32.024776 writev(2, [{iov_base="free(): invalid pointer", iov_len=23}, {iov_base="\n", iov_len=1}], 2 <unfinished ...>
13959 23:14:32.024804 <... mmap resumed> ) = 0x7f77324d5000
13958 23:14:32.024813 <... writev resumed> ) = 24
13955 23:14:32.024822 <... pread64 resumed> "\370\336\4x\361\361\265\275\351y\235<j\3\316.\305\271\276B:\304o\304\357\252\366\300x\236\362f"..., 131072, 1928593408) = 131072
13959 23:14:32.024837 pread64(15,  <unfinished ...>
13958 23:14:32.024846 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
13955 23:14:32.024857 writev(3, [{iov_base="\20\0\2\0\0\0\0\0\"\221\1\0\0\0\0\0", iov_len=16}, {iov_base="\370\336\4x\361\361\265\275\351y\235<j\3\316.\305\271\276B:\304o\304\357\252\366\300x\236\362f"..., iov_len=131072}], 2 <unfinished ...>
13958 23:14:32.024917 <... mmap resumed> ) = 0x7f77380a9000
13955 23:14:32.024929 <... writev resumed> ) = 131088
13958 23:14:32.024938 rt_sigprocmask(SIG_UNBLOCK, [ABRT],  <unfinished ...>
13955 23:14:32.024949 munmap(0x7f7738008000, 139264 <unfinished ...>
13958 23:14:32.024976 <... rt_sigprocmask resumed> NULL, 8) = 0
13955 23:14:32.025098 <... munmap resumed> ) = 0
13958 23:14:32.025129 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1],  <unfinished ...>
13955 23:14:32.025143 read(3,  <unfinished ...>
13958 23:14:32.025153 <... rt_sigprocmask resumed> [HUP INT QUIT TERM], 8) = 0
13954 23:14:32.025178 <... read resumed> "P\0\0\0\17\0\0\0$\221\1\0\0\0\0\08\0\0\0\0\0\0\0m\0\0\0s\0\0\0"..., 1052672) = 80
13958 23:14:32.025193 getpid()          = 13952
13958 23:14:32.025233 gettid()          = 13958
13959 23:14:32.025259 <... pread64 resumed> "\274\16.X\247\325Q\202Y\177H\t\253\0f\275_\375\321\265\301&\27\214bq\212\226M\314\205\23"..., 131072, 1928724480) = 131072
13958 23:14:32.025272 tgkill(13952, 13958, SIGABRT <unfinished ...>
13959 23:14:32.025282 writev(3, [{iov_base="\20\0\2\0\0\0\0\0#\221\1\0\0\0\0\0", iov_len=16}, {iov_base="\274\16.X\247\325Q\202Y\177H\t\253\0f\275_\375\321\265\301&\27\214bq\212\226M\314\205\23"..., iov_len=131072}], 2 <unfinished ...>
13958 23:14:32.025295 <... tgkill resumed> ) = 0
13954 23:14:32.025305 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
13958 23:14:32.025317 rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT TERM],  <unfinished ...>
13959 23:14:32.025332 <... writev resumed> ) = 131088
13958 23:14:32.025342 <... rt_sigprocmask resumed> NULL, 8) = 0
13959 23:14:32.025351 munmap(0x7f77324d5000, 139264 <unfinished ...>
13958 23:14:32.025359 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=13952, si_uid=0} ---
13954 23:14:32.025369 <... mmap resumed> ) = 0x7f7738008000
13959 23:14:32.025393 <... munmap resumed>) = ?
13957 23:14:32.025402 <... read resumed> <unfinished ...>) = ?
13958 23:14:32.025421 +++ killed by SIGABRT +++
13959 23:14:32.025432 +++ killed by SIGABRT +++
13957 23:14:32.025439 +++ killed by SIGABRT +++
13956 23:14:32.025444 <... read resumed> <unfinished ...>) = ?
13956 23:14:32.025560 +++ killed by SIGABRT +++
13955 23:14:32.025567 <... read resumed> <unfinished ...>) = ?
13955 23:14:32.025632 +++ killed by SIGABRT +++
13952 23:14:32.025645 <... futex resumed>) = ?
13954 23:14:32.025667 +++ killed by SIGABRT +++
13952 23:14:32.026078 +++ killed by SIGABRT +++
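
The short buffers in the trace above can be decoded to see which FUSE request was in flight when the abort happened. The sketch below assumes the standard FUSE wire layout (`struct fuse_in_header` and `struct fuse_out_header` from the kernel's fuse.h) and little-endian byte order, which holds on x86_64. The byte strings are copied from the `read(3, ...)` and `writev(3, ...)` calls of thread 13958.

```python
import struct

# struct fuse_out_header { uint32_t len; int32_t error; uint64_t unique; }
OUT_HEADER = struct.Struct("<IiQ")
# First 16 bytes of struct fuse_in_header:
# uint32_t len; uint32_t opcode; uint64_t unique;
IN_HEADER_PREFIX = struct.Struct("<IIQ")

FUSE_READ = 15  # opcode value from the kernel's fuse.h

# Request read by thread 13958 (first 16 bytes as shown by strace).
request = b"P\0\0\0\17\0\0\0!\221\1\0\0\0\0\0"
# Reply header written by thread 13958 just before the abort.
reply = b"\20\0\2\0\0\0\0\0!\221\1\0\0\0\0\0"

req_len, req_opcode, req_unique = IN_HEADER_PREFIX.unpack(request)
rep_len, rep_error, rep_unique = OUT_HEADER.unpack(reply)

print("request:", req_len, req_opcode, req_unique)
print("reply:  ", rep_len, rep_error, rep_unique)
```

The request decodes to an 80-byte FUSE_READ (opcode 15) with unique ID 102689, and the reply to a 131088-byte message (16-byte header plus 131072 bytes of data) with error 0 and the same unique ID. So thread 13958 had just finished writing a successful read reply when glibc printed "free(): invalid pointer", which agrees with the `free()` call inside `fuse_send_data_iov_fallback` in the backtrace below.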

[x] gdb backtrace when mergerfs receives SIGABRT:

Thread 2.5 "mergerfs" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff5e9d700 (LWP 15726)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    set = {__val = {16391, 0, 1, 140737352867800, 140737319092720,
        93824992294954, 7, 0, 0, 15816913384419514880, 1, 0, 140737319092288,
        93824992780960, 1, 140737319092256}}
    pid = <optimized out>
    tid = <optimized out>
    ret = <optimized out>
#1  0x00007ffff7ada535 in __GI_abort () at abort.c:79
    save_stage = 1
    act = {__sigaction_handler = {sa_handler = 0x7ffff5e941a0,
        sa_sigaction = 0x7ffff5e941a0}, sa_mask = {__val = {0, 93824992780960,
          2, 140737319092640, 93824992780944, 140737319092624, 93824992416360,
          140736817270896, 140737319092592, 140736951487472, 140736951484448,
          4096, 135232, 139264, 140737319092560, 140737319092816}},
      sa_flags = -169262768, sa_restorer = 0x1000}
    sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007ffff7b31508 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff7c3c28d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
    ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7ffff5e94260,
        reg_save_area = 0x7ffff5e941f0}}
    fd = 2
    list = <optimized out>
    nlist = <optimized out>
    cp = <optimized out>
    written = <optimized out>
#3  0x00007ffff7b37c1a in malloc_printerr (
str=str@entry=0x7ffff7c3a43b "free(): invalid pointer") at malloc.c:5341
No locals.
#4  0x00007ffff7b3942c in _int_free (av=<optimized out>, p=<optimized out>,
have_lock=<optimized out>) at malloc.c:4165
    size = 0
    fb = <optimized out>
    nextchunk = <optimized out>
    nextsize = <optimized out>
    nextinuse = <optimized out>
    prevsize = <optimized out>
    bck = <optimized out>
    fwd = <optimized out>
    __PRETTY_FUNCTION__ = "_int_free"
#5  0x0000555555599658 in fuse_send_data_iov_fallback (len=<optimized out>,
buf=<optimized out>, iov_count=2, iov=<optimized out>, ch=<optimized out>,
f=<optimized out>) at lib/fuse_lowlevel.c:482
    mem_buf = {count = 1, idx = 0, off = 0, buf = {{size = 131072,
          flags = (unknown: 0), mem = 0x7ffff437c000, fd = -1, pos = 0}}}
    mbuf = <optimized out>
    res = 0
    mem_buf = <optimized out>
    mbuf = <optimized out>
    res = <optimized out>
#6  fuse_send_data_iov (f=0x5555555db250, ch=0x5555555daad0,
iov=iov@entry=0x7ffff5e94420, iov_count=<optimized out>, iov_count@entry=1,
buf=<optimized out>, flags=flags@entry=8) at lib/fuse_lowlevel.c:758
    res = <optimized out>
    len = <optimized out>
    out = <optimized out>
    llp = <optimized out>
    splice_flags = <optimized out>
    pipesize = <optimized out>
    total_fd_size = <optimized out>
    idx = <optimized out>
    headerlen = <optimized out>
    pipe_buf = {count = 1, idx = 0, off = 0, buf = {{size = 131072,
          flags = (unknown: 0), mem = 0x0, fd = -1, pos = 0}}}
#7  0x000055555559c4d8 in fuse_reply_data (req=req@entry=0x7fffe0006200,
bufv=<optimized out>, flags=flags@entry=FUSE_BUF_SPLICE_MOVE)
at lib/fuse_lowlevel.c:785
    iov = {{iov_base = 0x7ffff5e94410, iov_len = 16}, {
        iov_base = 0x7ffff437c000, iov_len = 131072}}
    out = {len = 131088, error = 0, unique = 729820}
    res = <optimized out>
#8  0x0000555555593035 in fuse_lib_read (req=0x7fffe0006200, ino=590, size=131072,
off=9980739584, fi=0x7ffff5e944f0) at lib/fuse.c:3282
    f = 0x5555555daf50
    buf = 0x7fffe0008610
    path = 0x0
    res = <optimized out>
#9  0x000055555559b201 in do_read (req=<optimized out>, nodeid=<optimized out>,
inarg=<optimized out>) at lib/fuse_lowlevel.c:1258
    fi = {flags = 294912, writepage = 0, direct_io = 0, keep_cache = 0,
      flush = 0, nonseekable = 0, flock_release = 0, cache_readdir = 0,
      auto_cache = 0, padding = 0, fh = 140737219947136, lock_owner = 0}
    arg = <optimized out>
#10 0x000055555559b7f7 in fuse_ll_process_buf (data=0x5555555db250,
buf=0x7ffff5e94660, ch=<optimized out>) at lib/fuse_lowlevel.c:2556
    f = 0x5555555db250
    bufv = {count = 1, idx = 0, off = 0, buf = {{size = 80,
          flags = (unknown: 0), mem = 0x7ffff5e9e010, fd = 0, pos = 0}}}
    tmpbuf = {count = 1, idx = 0, off = 0, buf = {{size = 80,
          flags = (unknown: 0), mem = 0x0, fd = -1, pos = 0}}}
    in = <optimized out>
    inarg = <optimized out>
    req = 0x7fffe0006200
    mbuf = 0x0
    err = <optimized out>
    res = <optimized out>
#11 0x0000555555598649 in fuse_do_work (data=0x5555555fba40)
at lib/fuse_loop_mt.c:93
    ch = 0x5555555daad0
    fbuf = {size = 80, flags = (unknown: 0), mem = 0x7ffff5e9e010, fd = 0,
      pos = 0}
    res = 80
    w = 0x5555555fba40
    mt = 0x7fffffffe520
#12 0x00007ffff7c80fa3 in start_thread (arg=<optimized out>)
at pthread_create.c:486
    ret = <optimized out>
    pd = <optimized out>
    now = <optimized out>
    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737319130880,
            1860540964874121567, 140737488347934, 140737488347935,
            140737319130880, 93824992918080, -1860527653457333921,
            -1860523523526594209}, mask_was_saved = 0}}, priv = {pad = {0x0,
          0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
    not_first_call = <optimized out>
#13 0x00007ffff7bb14cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Tell me if you need more info or context. Thanks.

trapexit commented 4 years ago

Thank you for the thorough report. Unfortunately this might be tough to track down. Many many people, including myself, are using mergerfs 24/7 under heavy load. I've over 1000 torrents running at 40MB/s+ constantly.

It might be worth changing the makefiles (in main and libfuse dirs) to be -O0 vs -O2. Maybe change some settings. Turn off caching. Maybe enable splice_read / splice_write.
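A minimal sketch of the -O2 to -O0 change, assuming the optimization flag appears literally as `-O2` in each makefile (the helper name `relax_optimization` and the example paths are hypothetical, not something the mergerfs build provides):

```shell
# Hypothetical helper: rewrite every literal -O2 in a makefile to "-O0 -g".
# Assumes the flag is spelled exactly -O2; a .bak copy is kept next to the file.
relax_optimization() {
    makefile=$1
    [ -f "$makefile" ] || { echo "no such file: $makefile" >&2; return 1; }
    sed -i.bak 's/-O2/-O0 -g/g' "$makefile"
}

# Example (paths are assumptions based on "main and libfuse dirs"):
#   relax_optimization Makefile
#   relax_optimization libfuse/Makefile
```

After the edit a full rebuild is needed so that every object file is compiled with the new flags.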

nabnux commented 4 years ago

Thanks for your answer. I know that this may be very specific to my setup.

While rebuilding with -O0 did not help or give more info in the debug trace, disabling the cache with cache.files=off seems to solve the issue: I'm currently force rechecking all my torrents without problems so far. Do you think this option could cause issues with mmap and Deluge, as stated in the FAQ?
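For reference, the fstab entry from the original report with only the caching option changed would look roughly like this. Everything except `cache.files=off` is copied from the reported settings; whether the remaining options are still appropriate with caching disabled is not something this thread establishes.

```text
/mnt/data/vd* /home/deluge fuse.mergerfs allow_other,use_ino,cache.files=off,func.getattr=newest,dropcacheonclose=true,fsname=mergerfs 0 0
```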

I will also try cache.files=libfuse and fiddling with splice option, and keep you updated.

trapexit commented 4 years ago

gave more info in the debug trace

It should provide more info in the stack trace. A lot is optimized out with O2.

I will also try cache.files=libfuse and fiddling with splice option, and keep you updated.

The default for cache.files is libfuse.

Curious that cache.files=off fixed things. There is very very little code that changes depending on that value. It's mostly a kernel thing. It could be happenstance. I'll scan over that code just in case.

Does Deluge use mmap? I believe I've tried it before and it worked fine on my system with caching disabled. If it does use mmap it must have a fallback to using standard io.

nabnux commented 4 years ago

Quick update: I encountered the issue even with cache.files=off, but it only happened once during the recheck of ~11TB of data so I'm okay with that. Unfortunately I was not running mergerfs under gdb this time so no trace.

Enabling any other value for files cache makes it crash much faster, and splice options did not change that behavior. Neither did switching the IO driver of the VM from virtio to ide.

Does Deluge use mmap? I believe I've tried it before and it worked fine on my system with caching disabled. If it does use mmap it must have a fallback to using standard io.

I confirm that Deluge works well with cache disabled.

I'll paste more useful info here if I get any. Feel free to close this issue anytime.

trapexit commented 4 years ago

I'll leave it open till we figure something out. I realize it might be a PITA but perhaps creating another VM with the same OS install? Or maybe run some RAM tests on the host machine? Over the past few years I've seen some really weird stuff that somehow manifests through mergerfs. Had someone with bad RAM (confirmed with memtest86)... their system seemed absolutely fine otherwise even under heavy load. Had someone with a bad CPU that similarly would lead to crashes with mergerfs alone. Swapped out with an identical CPU and everything was fine. Not saying it's a hardware problem... but if we are running out of ideas it might be worth it.

It's not to say there isn't a mergerfs bug. It's totally possible that your setup is tickling something and most of us are luckily not. But if it is only your system able to trigger it then I'm probably not going to be able to help much. If you can recreate it in another VM then perhaps you could share it with me.

jeffgt14 commented 4 years ago

I have a similar configuration as you and was getting the same error constantly since I updated to mergerfs 2.29. I upgraded my kernel from 4.19 to 5.24 and it's been running great for a few days now. Not sure what got resolved if anything but that was my solution.

nabnux commented 4 years ago

@jeffgt14 thanks for the feedback, I should have thought about upgrading the kernel. I guess you meant kernel version 5.4? I did the upgrade, crossing my fingers now. Can I ask if you're also running mergerfs in a VM?

@trapexit if that does not solve my issue I'll do a memtest on the host server, and create another VM if needed. CPU swap won't be an option here unfortunately :)

trapexit commented 4 years ago

There is very little that changed between 2.28.3 and 2.29.0. Looking over the code I don't see anything that would lead to this. And I don't think the kernel should be impacting anything. If this is a bug it's probably something that's been there. I'm trying to reproduce this but no luck so far.

trapexit commented 4 years ago

I've a couple instances of find -type f -print -exec dd if={} of=/dev/null bs=1M status=progress \; running against a 2.29.0 instance since last night. Humming away just fine :-/
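The read stress above could be wrapped in a small function that stops at the first file it cannot read, which makes it easier to see when the mount drops. This is only a sketch: the function name `read_all_files` is made up for illustration, and file names containing newlines are not handled.

```shell
# Read every regular file under a directory and discard the data,
# stopping at the first file that cannot be read completely.
read_all_files() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type f -print | (
        while IFS= read -r f; do
            dd if="$f" of=/dev/null bs=1M 2>/dev/null || {
                echo "read failed: $f" >&2
                exit 1   # leaves only this subshell, with a failure status
            }
        done
    )
}

# Example (the mountpoint is the one from the original report):
#   read_all_files /home/deluge
```

Several copies can be run in parallel against the same mount to approximate the "couple instances" mentioned above.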

jeffgt14 commented 4 years ago

@nabnux yes, sorry, that should say kernel 5.4. I'm not running in a VM, so nothing in common there, but I do have several torrents running and a subsonic server constantly scanning over my filesystem. I wish I could help more with debugging; I just went straight to updating the kernel because I'd been meaning to do it anyway, and I haven't had any issues since.

mergerfs version: 2.29.0
FUSE library version: 2.9.7-mergerfs_2.29.0
fusermount version: 2.9.7-mergerfs_2.29.0
using FUSE kernel interface version 7.31

uname -a

Linux dingle-server 5.4.24-1-MANJARO #1 SMP PREEMPT Thu Mar 5 20:29:25 UTC 2020 x86_64 GNU/Linux

mergerfs settings:

/mnt/data/disk1:/mnt/data/disk2:/mnt/data/disk3 /mnt/storage fuse.mergerfs defaults,use_ino,allow_other,noforget,cache.files=auto-full,cache.open=1,dropcacheonclose=true,ignorepponrename=true,cache.readdir=true,cache.statfs=60,minfreespace=6G,moveonenospc=true,cache.symlinks=true,fsname=mergerfs,category.create=mfs,func.getattr=newest 0 0

Transmission:

transmission-daemon 2.94

Airsonic:

Version 10.4.0-RELEASE – July 13, 2019 Server Apache Tomcat/8.5.42, java 1.8.0_242, Linux (273.7 MB / 914.5 MB)

I am also running an NFS server, so I do get stale mounts every once in a while on the client side, but this issue was specifically about errors on the server.

nabnux commented 4 years ago

So with the latest 5.4 kernel available on Debian and cache.files=auto-full mergerfs was only killed once in two weeks, like with the older kernel and cache disabled. Not sure we can get to any conclusion with this.

However I've been bumping into another problem in parallel: sometimes the mergerfs mountpoint gets unresponsive, any access gets stuck forever (for example a simple ls). There are some kernel messages, not sure if related:

[Fri Apr  3 21:17:38 2020] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[Fri Apr  3 21:17:38 2020] BUG: unable to handle page fault for address: ffff9099b6cb4b00
[Fri Apr  3 21:17:38 2020] #PF: supervisor instruction fetch in kernel mode          
[Fri Apr  3 21:17:38 2020] #PF: error_code(0x0011) - permissions violation           
[Fri Apr  3 21:17:38 2020] PGD 12b801067 P4D 12b801067 PUD 13b356063 PMD 13675b063 PTE 8000000136cb4163
[Fri Apr  3 21:17:38 2020] Oops: 0011 [#6] SMP NOPTI
[Fri Apr  3 21:17:38 2020] CPU: 2 PID: 431 Comm: mergerfs Tainted: G      D W         5.4.0-4-amd64 #1 Debian 5.4.19-1
[Fri Apr  3 21:17:38 2020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[Fri Apr  3 21:17:38 2020] RIP: 0010:0xffff9099b6cb4b00                                                                                                                   
[Fri Apr  3 21:17:38 2020] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00
[Fri Apr  3 21:17:38 2020] RSP: 0018:ffffa7c540657c30 EFLAGS: 00010286                                                                                                    
[Fri Apr  3 21:17:38 2020] RAX: ffff9099b6cb4b00 RBX: ffff9099b63da370 RCX: 0000000000000000
[Fri Apr  3 21:17:38 2020] RDX: 0000000000000000 RSI: ffffa7c540b9bcd0 RDI: ffff9099b9cec600           
[Fri Apr  3 21:17:38 2020] RBP: ffff9099b63da360 R08: ffff9099b63da3c0 R09: ffffa7c540657bd0
[Fri Apr  3 21:17:38 2020] R10: 0000000000001000 R11: ffffa7c540b9bd18 R12: ffff9099b9cec600
[Fri Apr  3 21:17:38 2020] R13: ffff9099b6cb4b00 R14: ffff9099b9cbcec0 R15: ffff9099b63da360
[Fri Apr  3 21:17:38 2020] FS:  00007fbbaeb05700(0000) GS:ffff9099bba80000(0000) knlGS:0000000000000000
[Fri Apr  3 21:17:38 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033         
[Fri Apr  3 21:17:38 2020] CR2: ffff9099b6cb4b00 CR3: 0000000138cf2000 CR4: 00000000003406e0
[Fri Apr  3 21:17:38 2020] Call Trace:
[Fri Apr  3 21:17:38 2020]  ? fuse_request_end+0xbc/0x1f0 [fuse]                     
[Fri Apr  3 21:17:38 2020]  ? fuse_dev_do_write+0x25e/0xde0 [fuse]                   
[Fri Apr  3 21:17:38 2020]  ? ext4_da_write_end+0xbe/0x2d0 [ext4]                    
[Fri Apr  3 21:17:38 2020]  ? copyin+0x28/0x30                                       
[Fri Apr  3 21:17:38 2020]  ? iov_iter_copy_from_user_atomic+0xc3/0x370              
[Fri Apr  3 21:17:38 2020]  ? fuse_dev_write+0x53/0x90 [fuse]                        
[Fri Apr  3 21:17:38 2020]  ? do_iter_readv_writev+0x158/0x1d0                       
[Fri Apr  3 21:17:38 2020]  ? do_iter_write+0x7d/0x190                               
[Fri Apr  3 21:17:38 2020]  ? vfs_writev+0xa6/0xf0                                   
[Fri Apr  3 21:17:38 2020]  ? do_writev+0x6b/0x110                                   
[Fri Apr  3 21:17:38 2020]  ? do_syscall_64+0x52/0x160                               
[Fri Apr  3 21:17:38 2020]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9               
[Fri Apr  3 21:17:38 2020] Modules linked in: kvm_amd ccp rng_core kvm irqbypass crct10dif_pclmul crc32_pclmul nft_ct nf_conntrack ghash_clmulni_intel nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c fuse aesni_intel nft_counter crypto_simd joydev virtio_balloon evdev cryptd glue_helper serio_raw pcspkr button qemu_fw_cfg nf_tables nfnetlink sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod ata_generic virtio_blk e1000 psmouse ehci_pci uhci_hcd ehci_hcd ata_piix libata i2c_piix4 usbcore crc32c_intel scsi_mod virtio_pci virtio_ring usb_common virtio floppy
[Fri Apr  3 21:17:38 2020] CR2: ffff9099b6cb4b00
[Fri Apr  3 21:17:38 2020] ---[ end trace f5fa055ba08acd39 ]---
[Fri Apr  3 21:17:38 2020] RIP: 0010:0xffff9099b6cb1e00
[Fri Apr  3 21:17:38 2020] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00
[Fri Apr  3 21:17:38 2020] RSP: 0018:ffffa7c54065fc30 EFLAGS: 00010286
[Fri Apr  3 21:17:38 2020] RAX: ffff9099b6cb1e00 RBX: ffff9099b315e880 RCX: 0000000000000000
[Fri Apr  3 21:17:38 2020] RDX: 0000000000000000 RSI: ffffa7c540b83cd0 RDI: ffff9099b9cec600
[Fri Apr  3 21:17:38 2020] RBP: ffff9099b315e870 R08: ffff9099b315e8d0 R09: ffffa7c54065fbd0
[Fri Apr  3 21:17:38 2020] R10: 0000000000001000 R11: ffffa7c540b83d18 R12: ffff9099b9cec600
[Fri Apr  3 21:17:38 2020] R13: ffff9099b6cb1e00 R14: ffff9099b9cbcec0 R15: ffff9099b315e870
[Fri Apr  3 21:17:38 2020] FS:  00007fbbaeb05700(0000) GS:ffff9099bba80000(0000) knlGS:0000000000000000
[Fri Apr  3 21:17:38 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr  3 21:17:38 2020] CR2: ffff9099b6cb4b00 CR3: 0000000138cf2000 CR4: 00000000003406e0
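To pull just the headline lines out of a long oops like the one above, a filter along these lines may help. It is a sketch only; the function name `kernel_fault_summary` and the choice of patterns are assumptions. Note that the `[#6]` counter in the `Oops:` line and the `D` taint flag suggest this was not the first oops since boot, so earlier entries in the same log may be more informative.

```shell
# Print only the headline lines of kernel faults from a saved log.
# Save the log first, for example: dmesg -T > kernel.log
kernel_fault_summary() {
    grep -E 'BUG:|Oops:|NX-protected|Comm:' "$1"
}

# Example:
#   kernel_fault_summary kernel.log
```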

mergerfs process:

# ps auxww | grep merger
root       425  8.6  0.3 457092 15296 ?        S<s  Apr02 279:05 mergerfs /mnt/data/vd* /home/deluge -o rw,allow_other,use_ino,cache.files=auto-full,func.getattr=newest,dropcacheonclose=true,fsname=mergerfs,dev,suid

strace last line (don't have the rest):

# strace -p 425
strace: Process 425 attached
futex(0x7ffd12c88fb0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY

(I didn't do the memtest yet because I'm lazy 😑)
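Two different failure modes show up in this thread: the daemon being killed (accesses fail with "Transport endpoint is not connected") and the mountpoint hanging. A rough way to tell them apart from a script is sketched below. The function name `check_mount` and the 5-second limit are arbitrary assumptions, and a process stuck in uninterruptible sleep may not die even on SIGKILL, in which case this check itself would hang.

```shell
# Classify a mountpoint as ok, disconnected (daemon died) or hung.
check_mount() {
    dir=$1
    # Capture only stderr; force the C locale so the error text is predictable.
    if err=$(LC_ALL=C timeout -s KILL 5 stat "$dir" 2>&1 >/dev/null); then
        status=0
    else
        status=$?
    fi
    case $status in
        0) echo ok ;;
        124|137) echo hung ;;   # timeout expired (137 when SIGKILL was used)
        *) case $err in
               *"Transport endpoint is not connected"*) echo disconnected ;;
               *) echo "error: $err" ;;
           esac ;;
    esac
}

# Example (the mountpoint is the one from the original report):
#   check_mount /home/deluge
```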

trapexit commented 4 years ago

Sounds like a kernel bug. There really isn't much I can do but perhaps report it to the kernel maintainer.


trapexit commented 4 years ago

Please create a separate ticket if unrelated to this one.

Create a minimal reproducible situation? That's far too complex to make any other suggestions. Recent kernels have some known problems. Maybe start there.

taz-007 commented 4 years ago

However I've been bumping into another problem in parallel: sometimes the mergerfs mountpoint gets unresponsive, any access gets stuck forever (for example a simple ls). There are some kernel messages, not sure if related:

About the freeze, might want to take a look at https://github.com/trapexit/mergerfs/issues/708 or https://bugzilla.kernel.org/show_bug.cgi?id=206643 .

alpaca1thunder commented 4 years ago

Not sure how much help this is, but I had similar problems with openSUSE leap 15.1, which runs a similar kernel to Debian Buster. After trying a variety of solutions, I upgraded the kernel to 5.7.0-rc6 and haven't had any issues since, even after testing it under a variety of heavy loads. So I'm guessing it's a kernel issue as well.

alpaca1thunder commented 4 years ago

Not sure how much help this is, but I had similar problems with openSUSE leap 15.1, which runs a similar kernel to Debian Buster. After trying a variety of solutions, I upgraded the kernel to 5.7.0-rc6 and haven't had any issues since, even after testing it under a variety of heavy loads. So I'm guessing it's a kernel issue as well.

Sorry for multiple posts, but unfortunately I spoke too soon: it crashed right after, under a not particularly high load. I can't seem to reproduce it. The only thing I remember the crashes having in common is that I was fiddling with my machines in KVM at the time, but that shouldn't be related because they aren't even touching the disks. I haven't found anything relevant in journalctl or dmesg either. I use my mergerFS volume mainly with Docker containers, if that makes a difference.

Here's my current fstab entry:

/mnt/M01:/mnt/M02:/mnt/M03:/mnt/M04:/mnt/M05:/mnt/M06:/mnt/M07:/mnt/M08:/mnt/M09:/mnt/M10:/mnt/M11:/mnt/M12 /media/mergerfs/media  fuse.mergerfs defaults,func.getattr=newest,allow_other,use_ino,category.create=mfs,cache.files=partial,dropcacheonclose=true,moveonenospc=true,minfreespace=10G,fsname=mergerFS 0 0

I planned on doing a clean install today anyway. @trapexit, is there anything you suggest doing to get you some more detailed logs in the future? Sorry if it's already in your documentation somewhere. I'll compile from source from the latest git commit.

...Or maybe run some RAM tests on the host machine? Over the past few years I've seen some really weird stuff that somehow manifest through mergerfs. Had someone with bad RAM (confirmed with a memtest86)...

I'll try memtest86 as well, and maybe I can contribute something to the docs about checking hardware first, for people with similar problems. I'm pretty determined to find out what's causing this.

trapexit commented 4 years ago

Can try a few things. Depends on your skill level. Easiest is to try earlier versions to see if anything changes. Otherwise build with debugging symbols and run it in gdb to catch where it crashes.
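One way to capture a backtrace without sitting at an interactive gdb prompt is batch mode, which also avoids pager prompts in the saved output. The wrapper below is a sketch under assumptions: the function name `gdb_backtrace` and the `DRY_RUN` switch are hypothetical, and the mergerfs arguments in the example are simply the ones used elsewhere in this thread.

```shell
# Run a program under gdb non-interactively and, when it stops on a signal
# such as SIGABRT, dump full backtraces of all threads to a log file.
# Set DRY_RUN=1 to only print the command line instead of running it.
gdb_backtrace() {
    logfile=$1
    shift
    set -- gdb -batch -ex run -ex 'thread apply all bt full' --args "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@" > "$logfile" 2>&1
    fi
}

# Example:
#   gdb_backtrace /tmp/mergerfs-gdb.log mergerfs -f -o use_ino,direct_io /home/user /tmp/test
```

The `-f` flag keeps mergerfs in the foreground so that gdb stays attached to the process that actually serves requests.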

trapexit commented 4 years ago

If it can be replicated in a VM then I can do some testing myself. The problem has been that I haven't been able to reproduce this yet.

trapexit commented 4 years ago

I've got an Ubuntu 20.04 server VM up, using mergerfs 2.29.0 with debugging enabled (changed optimizations to -O0 and added -g in the makefile; I made this easier recently, but I want to test the latest release).

In gdb I ran run -f -o use_ino,direct_io /home/user /tmp/test

In another terminal I've got: while true; do dd if=/dev/urandom of=/tmp/test/blah bs=1M count=1024 status=progress; done

Been running for several minutes without issue. Will report back if that changes. Could someone try something similar on their machine and, if it triggers the crash, replicate the setup in a VM so I can try to reproduce it?
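A bounded variant of the write loop above that also compares what was written against a reference copy might look like the following. It is a sketch only: the function name `write_verify` and its parameters are hypothetical, and with page caching enabled the read-back may be served from cache rather than from the underlying disks.

```shell
# Write random data into a directory a fixed number of times and verify
# each copy against a reference file kept in the default temp directory.
write_verify() {
    target_dir=$1
    rounds=$2
    size_kb=$3
    ref=$(mktemp) || return 1
    i=0
    while [ "$i" -lt "$rounds" ]; do
        dd if=/dev/urandom of="$ref" bs=1024 count="$size_kb" 2>/dev/null ||
            { rm -f "$ref"; return 1; }
        cp "$ref" "$target_dir/blah" || { rm -f "$ref"; return 1; }
        if ! cmp -s "$ref" "$target_dir/blah"; then
            echo "mismatch in round $i" >&2
            rm -f "$ref"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$ref" "$target_dir/blah"
}

# Example: 100 rounds of 1 GiB each against the test mount used above
#   write_verify /tmp/test 100 1048576
```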

alpaca1thunder commented 4 years ago

Can try a few things. Depends on your skill level. Easiest is to try earlier versions to see if anything changes. Otherwise build with debugging symbols and run it in gdb to catch where it crashes.

Testing it on a clean install of Debian Buster with mergerfs version 2.24.2 (the one in the default repository) right now; it's been good for 12 hours or so. Rechecking some data from when it crashed last time seems to be okay so far. I'll rebuild and run it with gdb and post the logs if/when it crashes. Thanks for replying!

trapexit commented 4 years ago

Ran all night. No luck crashing on Ubuntu 20.04.

alpaca1thunder commented 4 years ago

Ran all night. No luck crashing on Ubuntu 20.04.

Been running for a week with the older debian version, no issues. Glad to have it working well, but sorry I couldn't be of more help. I'll try and fire up a VM with a similar setup at some point and try and reproduce it for you.

trapexit commented 4 years ago

Closing this for now.


mergerfs gets killed on heavy reads #727
