Closed scottlaird closed 1 year ago
Type | Version/Name |
---|---|
Distribution Name | openSUSE Leap |
Distribution Version | 15.3 |
Kernel Version | kernel-default-5.3.18-59.34.1 |
Architecture | x86_64 |
OpenZFS Version | zfs-2.1.2-1.4 + zfs-kmp-default-2.1.1_k5.3.18_59.34-lp153.1.25 |
I recently ran into what I believe is the same problem with a pool that was created with OpenSolaris in ~2012. Attempting to access any files in a specific directory would trigger a segmentation fault on the first attempt and an unkillable process on the second and subsequent attempts. This would log the backtrace listed below. A `zpool scrub` does not show any errors. A `zfs send | zfs recv` to a new filesystem does not fix this problem.
This problem seems very similar to #7910. I compiled a new version of `zfs.ko` with the patch for `zfs_acl.c` listed in #7910; however, it did not prevent this problem from occurring, or log the kernel message added by the patch. I did not spend a lot of time investigating why this patch wasn't effective.
Attempt to read, copy, or modify any files in the affected directory using `ls`, `cp`, `tar`, `rsync`, `mv`, `getfacl`, `setfacl`, etc.
Jan 01 13:55:27 workstation kernel: ------------[ cut here ]------------
Jan 01 13:55:28 workstation kernel: kernel BUG at ../lib/string.c:1090!
Jan 01 13:55:28 workstation kernel: invalid opcode: 0000 [#1] SMP PTI
Jan 01 13:55:28 workstation kernel: CPU: 1 PID: 2999 Comm: updatedb Tainted: P OE N 5.3.18-59.37-default #1 SLE15-SP3
Jan 01 13:55:28 workstation kernel: Hardware name: Supermicro C2SBX/C2SBX, BIOS 1.2a 12/19/2008
Jan 01 13:55:28 workstation kernel: RIP: 0010:fortify_panic+0xf/0x12
Jan 01 13:55:28 workstation kernel: Code: c5 48 89 c2 e8 b4 a2 00 00 c6 04 2b 00 4c 89 e8 5b 5d 41 5c 41 5d 41 5e c3 0f 0b 48 89 fe 48 c7 c7 a8 bb fe aa e8 e1 16 80 ff <0f> 0b 90 48 89 f8 48 89 f7 31 f6 48 3b 3f 53 48 89 c1 bb 01 00 00
Jan 01 13:55:28 workstation kernel: RSP: 0018:ffffb82d437cb950 EFLAGS: 00010282
Jan 01 13:55:28 workstation kernel: RAX: 0000000000000023 RBX: 0000000000000000 RCX: 0000000000000000
Jan 01 13:55:28 workstation kernel: RDX: 0000000000000000 RSI: ffff8f1779a99558 RDI: ffff8f1779a99558
Jan 01 13:55:28 workstation kernel: RBP: 0000000000000001 R08: 000000000000041b R09: 0000000000000032
Jan 01 13:55:28 workstation kernel: R10: 0000000000000000 R11: ffffb82d437cb7f8 R12: ffffb82d437cba48
Jan 01 13:55:28 workstation kernel: R13: ffff8f16fd4433c0 R14: 0000000000000068 R15: ffff8f16fc956600
Jan 01 13:55:28 workstation kernel: FS: 00007f17a645e580(0000) GS:ffff8f1779a80000(0000) knlGS:0000000000000000
Jan 01 13:55:28 workstation kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 01 13:55:28 workstation kernel: CR2: 00007f97969933c0 CR3: 00000001aded0000 CR4: 00000000000406e0
Jan 01 13:55:28 workstation kernel: Call Trace:
Jan 01 13:55:28 workstation kernel: zfs_acl_node_read+0x313/0x320 [zfs]
Jan 01 13:55:28 workstation kernel: zfs_zaccess_aces_check+0x98/0x370 [zfs]
Jan 01 13:55:28 workstation kernel: zfs_zaccess+0xd7/0x3f0 [zfs]
Jan 01 13:55:28 workstation kernel: zfs_lookup+0x1c7/0x3f0 [zfs]
Jan 01 13:55:28 workstation kernel: zpl_lookup+0xc6/0x1e0 [zfs]
Jan 01 13:55:28 workstation kernel: ? multilist_insert+0x83/0xc0 [zfs]
Jan 01 13:55:28 workstation kernel: __lookup_slow+0x97/0x150
Jan 01 13:55:28 workstation kernel: lookup_slow+0x35/0x50
Jan 01 13:55:28 workstation kernel: walk_component+0x1c4/0x300
Jan 01 13:55:28 workstation kernel: ? link_path_walk.part.33+0x68/0x510
Jan 01 13:55:28 workstation kernel: ? rrw_exit+0x61/0x150 [zfs]
Jan 01 13:55:28 workstation kernel: path_lookupat+0x6e/0x210
Jan 01 13:55:28 workstation kernel: filename_lookup+0xb6/0x190
Jan 01 13:55:28 workstation kernel: ? kmem_cache_alloc+0x18a/0x270
Jan 01 13:55:28 workstation kernel: ? getname_flags+0x66/0x1d0
Jan 01 13:55:28 workstation kernel: ? vfs_statx+0x73/0xe0
Jan 01 13:55:28 workstation kernel: vfs_statx+0x73/0xe0
Jan 01 13:55:28 workstation kernel: __do_sys_newlstat+0x39/0x70
Jan 01 13:55:28 workstation kernel: ? _cond_resched+0x15/0x40
Jan 01 13:55:28 workstation kernel: ? exit_to_usermode_loop+0xc5/0x120
Jan 01 13:55:28 workstation kernel: do_syscall_64+0x5b/0x1e0
Jan 01 13:55:28 workstation kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 01 13:55:28 workstation kernel: RIP: 0033:0x7f17a5f7c135
Jan 01 13:55:28 workstation kernel: Code: 61 dd 2d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 83 ff 01 48 89 f0 77 30 48 89 c7 48 89 d6 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 03 f3 c3 90 48 8b 15 29 dd 2d 00 f7 d8 64 89
Jan 01 13:55:28 workstation kernel: RSP: 002b:00007ffd99510688 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Jan 01 13:55:28 workstation kernel: RAX: ffffffffffffffda RBX: 00005587bb443659 RCX: 00007f17a5f7c135
Jan 01 13:55:28 workstation kernel: RDX: 00007ffd99510700 RSI: 00007ffd99510700 RDI: 00005587bb443659
Jan 01 13:55:28 workstation kernel: RBP: 00005587bb42d680 R08: 000000000000ffff R09: 0000000000000000
Jan 01 13:55:28 workstation kernel: R10: 00007f17a5fcaf20 R11: 0000000000000246 R12: 00007ffd995108e0
Jan 01 13:55:28 workstation kernel: R13: 0000000000000005 R14: 0000000000000004 R15: 0000000000000005
Jan 01 13:55:28 workstation kernel: Modules linked in: bnep ppdev parport_pc parport vmw_vsock_vmci_transport vsock vmw_vmci af_packet nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security iscsi_ibft iscsi_boot_sysfs ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter bluetooth ecdh_generic rfkill ecc vboxnetadp(OEN) vboxnetflt(OEN) vboxdrv(OEN) dmi_sysfs w83627ehf msr hwmon_vid jc42 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence soundwire_bus iTCO_wdt coretemp kvm_intel intel_pmc_bxt snd_hda_codec kvm snd_hda_core snd_hwdep
Jan 01 13:55:28 workstation kernel: gpio_ich iTCO_vendor_support snd_soc_core irqbypass snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd soundcore e1000e i2c_i801 lpc_ich x38_edac pcspkr acpi_cpufreq configfs fuse ext4 crc16 mbcache jbd2 zfs(POEN) zunicode(POEN) zzstd(OEN) zlua(OEN) zavl(POEN) icp(POEN) raid1 md_mod zcommon(POEN) znvpair(POEN) spl(OEN) sr_mod cdrom sd_mod t10_pi ata_generic hid_generic usbhid nouveau(N) mxm_wmi(N) wmi i2c_algo_bit ttm drm_kms_helper uhci_hcd pata_it8213 ahci syscopyarea sysfillrect libahci sysimgblt fb_sys_fops cec serio_raw rc_core ehci_pci ehci_hcd firewire_ohci drm libata usbcore firewire_core crc_itu_t video button dm_mirror dm_region_hash dm_log sg br_netfilter bridge stp llc dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod
Jan 01 13:55:28 workstation kernel: Supported: No, Proprietary and Unsupported modules are loaded
Jan 01 13:55:28 workstation kernel: ---[ end trace f20b1faf687fb7b5 ]---
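For anyone comparing a similar log against this one, here is a minimal Python sketch that pulls the reliable `[zfs]` frames out of a journal-formatted call trace. It is not part of the original report: `FRAME_RE` and `module_frames` are hypothetical names, and the regular expression assumes the `journalctl` line layout shown above (`... kernel: symbol+0xOFF/0xLEN [module]`).

```python
import re

# Matches a journal line such as
#   "Jan 01 13:55:28 workstation kernel: zfs_lookup+0x1c7/0x3f0 [zfs]"
# and captures the optional "?" marker (a frame the unwinder is unsure of),
# the symbol name, and the optional module name in brackets.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_frames(log_text, module="zfs"):
    """Return the reliable call-trace symbols belonging to `module`, in order."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if line.rstrip().endswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the kernel call trace
        if match["guess"] is None and match["module"] == module:
            frames.append(match["symbol"])
    return frames
```

Applied to the trace above, this yields `zfs_acl_node_read`, `zfs_zaccess_aces_check`, `zfs_zaccess`, `zfs_lookup`, and `zpl_lookup`, which makes it easy to check whether another report dies in the same ACL-read path.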
I was able to recover the data that I needed from this filesystem by booting FreeBSD 13.0-release from the installation media, importing the existing pool, and copying files with `tar --no-acls` to a new filesystem in the existing pool. FreeBSD logs a warning from `zfs_zaccess_aces_check` (or something similar; I do not have a copy of the log messages) about an invalid ACL and keeps going without any problems.
System information
zfs-2.1.1-0york0~20.04
+zfs-kmod-2.1.1-0york0~20.04
Describe the problem you're observing
I have one file in a ZFS filesystem that cannot be accessed without triggering a kernel BUG (dmesg below). As far as I'm aware, all other files on the FS (~35 TB total) are fine. I have tried multiple kernels and multiple ZFS revisions without success.
Once this bug is triggered, all accesses to the parent directory fail until reboot.
The filesystem in question was created in 2016, but was populated via zfs send/receive from a FS that dates back to OpenSolaris. The file in question is from roughly 2013 and probably hasn't been accessed since then. I suspect that there's a bug handling an archaic ACL form that's causing problems here.
Describe how to reproduce the problem
On a freshly booted system:
Any access that touches ACLs on that file causes a kernel BUG. This includes `ls -l`, `rm`, `chmod`, and `setfacl`. Running `find -ls` is fine. Once the bug is triggered, all accesses to this directory block, but the rest of the filesystem appears to be fine. This was completely breaking Samba until I disabled ACLs, but it's also breaking backups.
Doing a full restore/backup on this filesystem is tricky due to its size. I don't really care about the contents of the file; deleting it would solve my problem, except that `rm` (even as root) also triggers the bug.
Here are the ACL-related properties for this filesystem:
Changing `acltype` from `posix` to `off` makes no difference, as expected.
Include any warning/errors/backtraces from the system logs
Attempting to run `ls -l` gives this backtrace:
Attempting to remove the file using `unlink` results in a similar backtrace: