trapexit / mergerfs

a featureful union filesystem
http://spawn.link

Core Dump during normal operation #1266

Closed: AnyTimeTraveler closed this issue 8 months ago

AnyTimeTraveler commented 8 months ago

Describe the bug

My setup:

Fstab line:

/mnt/18tb* /mnt/megadrive fuse.mergerfs defaults,nonempty,allow_other,use_ino,cache.files=off,moveonenospc=true,category.create=mfs,dropcacheonclose=true,minfreespace=250G,fsname=mergerfs 0 2
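To make the mount configuration easier to read, here is a minimal sketch that splits the fstab line above into its six standard fields and breaks the option string into key/value pairs. The `FstabEntry` type and `parse_fstab_line` helper are hypothetical names made up for this illustration; the sketch only splits text and does not interpret what any mergerfs option does.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    """Hypothetical container for the six whitespace-separated fstab fields."""
    spec: str        # what to mount (here: a glob of branch directories)
    mountpoint: str  # where the merged view appears
    fstype: str
    options: dict    # flag -> True, key=value -> "value"
    dump: int
    passno: int


def parse_fstab_line(line: str) -> FstabEntry:
    # Assumes no escaped whitespace (\040) in any field, which holds for this line.
    spec, mountpoint, fstype, opts, dump, passno = line.split()
    options = {}
    for opt in opts.split(","):
        key, sep, value = opt.partition("=")
        options[key] = value if sep else True
    return FstabEntry(spec, mountpoint, fstype, options, int(dump), int(passno))


FSTAB_LINE = (
    "/mnt/18tb* /mnt/megadrive fuse.mergerfs "
    "defaults,nonempty,allow_other,use_ino,cache.files=off,moveonenospc=true,"
    "category.create=mfs,dropcacheonclose=true,minfreespace=250G,fsname=mergerfs 0 2"
)

entry = parse_fstab_line(FSTAB_LINE)
print(entry.spec, "->", entry.mountpoint)
for key, value in entry.options.items():
    print(f"  {key} = {value}")
```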

Until now, all services have been running smoothly. I was moving some files via NFS when I got an I/O error. When I SSH'ed into the server and tried to ls /mnt, I got this error:

Transport endpoint is not connected (os error 107)

I tried unmounting the merged directory, but it told me that the target is busy.

Unfortunately, I was under time pressure, so I rebooted in the hope that it would fix the problem, which it did.
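For anyone who wants to detect this state from a script: os error 107 is `ENOTCONN` on Linux, the same errno that nfsd complains about in the log below. A rough sketch of a check follows. The helper name `is_mount_disconnected` is made up for this example, and it assumes that a `stat()` on the mount point fails with `ENOTCONN` when the FUSE process behind it has died, which matches the error reported here.

```python
import errno
import os


def is_mount_disconnected(path: str) -> bool:
    """Return True if stat() on `path` fails with ENOTCONN
    ("Transport endpoint is not connected")."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


# Example: check the merged mount point from this report.
# Any other failure (for example a missing path) is reported as False.
print(is_mount_disconnected("/mnt/megadrive"))
```

When the check returns True and a normal unmount reports that the target is busy, a lazy unmount (`umount -l /mnt/megadrive`) followed by `mount /mnt/megadrive` is a commonly used way to get the mount back without a reboot. That was not tried in this case, so treat it as a suggestion rather than a confirmed fix.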

I went back later and looked at my systemd logs and found the core dump:

Systemd log

```
Oct 15 16:21:24 megadrive kernel: mount.fuse.merg[516]: segfault at 10 ip 0000000000465751 sp 00007f859bb0f560 error 4 in mergerfs[408000+75000] likely on CPU 2 (core 2, socket 0)
Oct 15 16:21:24 megadrive kernel: Code: fe ff ff 49 83 fe 01 74 2f 48 8d 5d 60 45 31 ff 48 89 df e8 d1 34 fa ff 4c 89 f6 48 89 ef e8 26 c9 ff ff 48 89 df 48 8b 40 20 <4c> 8b 70 10 e8 16 2f fa ff e9 08 fe ff ff be fe ff ff ff 4c 89 e7
Oct 15 16:21:24 megadrive systemd[1]: Created slice Slice /system/systemd-coredump.
Oct 15 16:21:24 megadrive systemd[1]: Started Process Core Dump (PID 429196/UID 0).
Oct 15 16:21:24 megadrive systemd-coredump[429197]: Process 514 (mount.fuse.merg) of user 0 dumped core.
    Module libgcc_s.so.1 without build-id.
    Module libstdc++.so.6 without build-id.
    Module mergerfs without build-id.
    Stack trace of thread 516:
    #0 0x0000000000465751 fuse_lib_lookup (mergerfs + 0x65751)
    #1 0x0000000000472f2d _ZL19process_msgbuf_syncP18fuse_worker_data_tP13fuse_msgbuf_t (mergerfs + 0x72f2d)
    #2 0x0000000000473096 _ZL12fuse_do_workPv (mergerfs + 0x73096)
    #3 0x00007f859c49fe24 start_thread (libc.so.6 + 0x85e24)
    #4 0x00007f859c5219b0 __clone3 (libc.so.6 + 0x1079b0)
    Stack trace of thread 515:
    #0 0x00007f859c4e7245 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xcd245)
    #1 0x00007f859c4ebd67 __nanosleep (libc.so.6 + 0xd1d67)
    #2 0x00007f859c4ebc9e sleep (libc.so.6 + 0xd1c9e)
    #3 0x0000000000466170 fuse_maintenance_loop (mergerfs + 0x66170)
    #4 0x00007f859c49fe24 start_thread (libc.so.6 + 0x85e24)
    #5 0x00007f859c5219b0 __clone3 (libc.so.6 + 0x1079b0)
    Stack trace of thread 519:
    #0 0x00007f859c51074c read (libc.so.6 + 0xf674c)
    #1 0x0000000000467992 fuse_ll_buf_receive_read (mergerfs + 0x67992)
    #2 0x0000000000472fd7 _ZL12fuse_do_workPv (mergerfs + 0x72fd7)
    #3 0x00007f859c49fe24 start_thread (libc.so.6 + 0x85e24)
    #4 0x00007f859c5219b0 __clone3 (libc.so.6 + 0x1079b0)
    Stack trace of thread 520:
    #0 0x00007f859c51074c read (libc.so.6 + 0xf674c)
    #1 0x0000000000467992 fuse_ll_buf_receive_read (mergerfs + 0x67992)
    #2 0x0000000000472fd7 _ZL12fuse_do_workPv (mergerfs + 0x72fd7)
    #3 0x00007f859c49fe24 start_thread (libc.so.6 + 0x85e24)
    #4 0x00007f859c5219b0 __clone3 (libc.so.6 + 0x1079b0)
    Stack trace of thread 514:
    #0 0x00007f859c49ca36 __futex_abstimed_wait_common (libc.so.6 + 0x82a36)
    #1 0x00007f859c4a7ac0 __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x8dac0)
    #2 0x0000000000473f62 fuse_session_loop_mt (mergerfs + 0x73f62)
    #3 0x00000000004794b3 fuse_loop_mt (mergerfs + 0x794b3)
    #4 0x000000000046b8ea fuse_main_real (mergerfs + 0x6b8ea)
    #5 0x000000000040e1ba _ZN1l4mainEiPPc (mergerfs + 0xe1ba)
    #6 0x00007f859c43dace __libc_start_call_main (libc.so.6 + 0x23ace)
    #7 0x00007f859c43db89 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
    #8 0x0000000000411165 _start (mergerfs + 0x11165)
    Stack trace of thread 517:
    #0 0x00007f859c51074c read (libc.so.6 + 0xf674c)
    #1 0x0000000000467992 fuse_ll_buf_receive_read (mergerfs + 0x67992)
    #2 0x0000000000472fd7 _ZL12fuse_do_workPv (mergerfs + 0x72fd7)
    #3 0x00007f859c49fe24 start_thread (libc.so.6 + 0x85e24)
    #4 0x00007f859c5219b0 __clone3 (libc.so.6 + 0x1079b0)
    ELF object binary architecture: AMD x86-64
Oct 15 16:21:24 megadrive systemd[1]: systemd-coredump@0-429196-0.service: Deactivated successfully.
Oct 15 16:21:24 megadrive kernel: ------------[ cut here ]------------
Oct 15 16:21:24 megadrive kernel: nfsd: non-standard errno: -107
Oct 15 16:21:24 megadrive kernel: WARNING: CPU: 1 PID: 1559 at fs/nfsd/nfsproc.c:909 nfserrno+0x52/0x60 [nfsd]
Oct 15 16:21:24 megadrive kernel: Modules linked in: xt_connmark xt_mark iptable_mangle xt_comment iptable_raw tls xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink overlay af_packet wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q msr nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_codec_realtek sch_fq_codel snd_hda_codec_generic intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp ledtrig_audio led_class coretemp crc32_pclmul polyval_clmulni polyval_generic atkbd gf128mul libps2 serio iTCO_wdt vivaldi_fmap ghash_clmulni_intel intel_pmc_bxt loop cpufreq_powersave i915 watchdog sha512_ssse3 drm_buddy sha512_generic snd_hda_intel ttm xfs ch341 mei_pxp ee1004 mei_hdcp mfd_core drm_display_helper zfs(PO) evdev usbserial mxm_wmi mac_hid aesni_intel snd_intel_dspcfg e1000e
Oct 15 16:21:24 megadrive kernel: cec libaes crypto_simd snd_intel_sdw_acpi cryptd snd_hda_codec rapl intel_cstate ptp drm_kms_helper mei_me pps_core intel_uncore i2c_i801 snd_hda_core mei intel_gtt snd_hwdep i2c_smbus agpgart intel_pch_thermal zunicode(PO) i2c_algo_bit zzstd(O) fb_sys_fops syscopyarea video sysfillrect tiny_power_button acpi_pad zlua(O) sysimgblt wmi edac_core intel_pmc_core zavl(PO) thermal button fan icp(PO) zcommon(PO) znvpair(PO) spl(O) xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth tun tap macvlan bridge stp llc snd_aloop snd_pcm snd_timer snd soundcore kvm_intel kvm nfsd drm auth_rpcgss nfs_acl lockd grace irqbypass fuse backlight sunrpc i2c_core deflate efi_pstore configfs efivarfs dmi_sysfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci xhci_pci libahci xhci_pci_renesas xhci_hcd libata usbcore scsi_mod usb_common crct10dif_pclmul
Oct 15 16:21:24 megadrive kernel: crct10dif_common crc32c_intel scsi_common rtc_cmos dm_mod dax
Oct 15 16:21:24 megadrive kernel: CPU: 1 PID: 1559 Comm: nfsd Tainted: P O 6.1.55 #1-NixOS
Oct 15 16:21:24 megadrive kernel: Hardware name: MSI MS-7979/B150M Night Elf (MS-7979), BIOS 1.D0 06/16/2018
Oct 15 16:21:24 megadrive kernel: RIP: 0010:nfserrno+0x52/0x60 [nfsd]
Oct 15 16:21:24 megadrive kernel: Code: cc cc 80 3d 47 14 07 00 00 74 0a b8 00 00 00 05 c3 cc cc cc cc 89 fe 48 c7 c7 4f e4 ba c0 c6 05 2b 14 07 00 01 e8 0e 25 52 d6 <0f> 0b eb dd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 b9
Oct 15 16:21:24 megadrive kernel: RSP: 0018:ffffb9ef83e03de0 EFLAGS: 00010282
Oct 15 16:21:24 megadrive kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Oct 15 16:21:24 megadrive kernel: RDX: 0000000000000002 RSI: ffffffff980c6c59 RDI: 00000000ffffffff
Oct 15 16:21:24 megadrive kernel: RBP: ffff9c9319000000 R08: 0000000000000000 R09: ffffb9ef83e03c68
Oct 15 16:21:24 megadrive kernel: R10: 0000000000000003 R11: ffffffff98739e08 R12: ffff9c931e878028
Oct 15 16:21:24 megadrive kernel: R13: ffff9c9454f080a8 R14: 000000000000000b R15: ffff9c931e878028
Oct 15 16:21:24 megadrive kernel: FS: 0000000000000000(0000) GS:ffff9c965fc80000(0000) knlGS:0000000000000000
Oct 15 16:21:24 megadrive kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 15 16:21:24 megadrive kernel: CR2: 00007fce90b1beb8 CR3: 00000001423f8001 CR4: 00000000003706e0
Oct 15 16:21:24 megadrive kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 15 16:21:24 megadrive kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 15 16:21:24 megadrive kernel: Call Trace:
Oct 15 16:21:24 megadrive kernel:
Oct 15 16:21:24 megadrive kernel: ? __warn+0x7d/0xc0
Oct 15 16:21:24 megadrive kernel: ? nfserrno+0x52/0x60 [nfsd]
Oct 15 16:21:24 megadrive kernel: ? report_bug+0xe6/0x170
Oct 15 16:21:24 megadrive kernel: ? console_unlock+0x17f/0x1d0
Oct 15 16:21:24 megadrive kernel: ? handle_bug+0x41/0x70
Oct 15 16:21:24 megadrive kernel: ? exc_invalid_op+0x13/0x60
Oct 15 16:21:24 megadrive kernel: ? asm_exc_invalid_op+0x16/0x20
Oct 15 16:21:24 megadrive kernel: ? nfserrno+0x52/0x60 [nfsd]
Oct 15 16:21:24 megadrive kernel: nfsd_lookup+0x9a/0x150 [nfsd]
Oct 15 16:21:24 megadrive kernel: nfsd4_proc_compound+0x352/0x660 [nfsd]
Oct 15 16:21:24 megadrive kernel: nfsd_dispatch+0x167/0x280 [nfsd]
Oct 15 16:21:24 megadrive kernel: svc_process_common+0x286/0x5e0 [sunrpc]
Oct 15 16:21:24 megadrive kernel: ? svc_recv+0x4e1/0x890 [sunrpc]
Oct 15 16:21:24 megadrive kernel: ? nfsd_svc+0x360/0x360 [nfsd]
Oct 15 16:21:24 megadrive kernel: ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
Oct 15 16:21:24 megadrive kernel: svc_process+0xad/0x100 [sunrpc]
Oct 15 16:21:24 megadrive kernel: nfsd+0xd5/0x190 [nfsd]
Oct 15 16:21:24 megadrive kernel: kthread+0xe6/0x110
Oct 15 16:21:24 megadrive kernel: ? kthread_complete_and_exit+0x20/0x20
Oct 15 16:21:24 megadrive kernel: ret_from_fork+0x1f/0x30
Oct 15 16:21:24 megadrive kernel:
Oct 15 16:21:24 megadrive kernel: ---[ end trace 0000000000000000 ]---
```
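The first kernel line in the log packs several hex fields together. As a reading aid, here is a small sketch that decodes them. The `decode_segfault` helper and the dictionary keys are invented for this example. The error-code bits follow the x86 page-fault convention (bit 0: protection violation, bit 1: write, bit 2: user mode), and the sketch assumes the kernel prints the error code in hex.

```python
import re

# Matches the x86 "segfault at ..." line the kernel logs for a crashing process.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # Distance of the faulting instruction from the start of the mapped segment.
        "offset_in_mapping": ip - base,
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
        "cause": "protection violation" if err & 0x1 else "page not present",
    }


LINE = (
    "Oct 15 16:21:24 megadrive kernel: mount.fuse.merg[516]: segfault at 10 "
    "ip 0000000000465751 sp 00007f859bb0f560 error 4 in mergerfs[408000+75000] "
    "likely on CPU 2 (core 2, socket 0)"
)

info = decode_segfault(LINE)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) else value)
```

For this log, the decoded result is a user-mode read of an unmapped page at address 0x10, with the instruction pointer 0x5d751 bytes into the mapped segment. A fault address that small is consistent with reading a field through a null pointer inside `fuse_lib_lookup`, the top frame of thread 516, though only the core dump or a build with debug symbols could confirm that.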

To Reproduce

It was just normal usage, nothing I haven't done a hundred times before: moving a few gigs of files between two PCManFM windows. I don't know the exact moment it crashed, though.

Sorry for the poor description. If it happens again, I will amend.

Expected behavior

mergerfs should not suddenly dump core :)

System information:

trapexit commented 8 months ago

> merged with mergerfs 2.35.1 to /mnt/megadrive

Please test with the latest release.

AnyTimeTraveler commented 8 months ago

Can't reproduce, though I also couldn't reproduce it with the old version. Will reopen if I can.