openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

NULL pointer dereference via zfs_setattr_dir #8072

Closed ghost closed 5 years ago

ghost commented 5 years ago

System information

Type                  Version/Name
Distribution Name     Ubuntu (server)
Distribution Version  Latest bionic
Linux Kernel          4.15.0-34-generic
Architecture          amd64
ZFS Version           0.8.0-rc1
SPL Version           0.8.0-rc1

Describe the problem you're observing

When recursively setting the permissions for a directory via chmod, the process locked up. After inspecting the kernel message buffer, a NULL pointer appears to have been accessed in one of the attr structures. Unfortunately I do not have debugging symbols loaded, but I will load them after a reboot.

The dataset involved is encrypted using native encryption.

redacted  aclinherit            passthrough                           local
redacted  xattr                 on                                    local
redacted  acltype               posixacl                              local
redacted  compression           lz4                                   inherited from ...
redacted  encryption            aes-256-gcm                                 -

Describe how to reproduce the problem

  1. Create a natively encrypted dataset with numerous files and directories, run chgrp to change group ownership, and then run chmod recursively to change permissions (a sketch of these steps follows).
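
A minimal shell sketch of those steps, assuming a pool named tank and placeholder names throughout (untested; offered only as a starting point):

  # create a natively encrypted dataset
  zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/enc
  # populate it with numerous files and directories
  mkdir -p /tank/enc/dir{1..100}
  touch /tank/enc/dir{1..100}/file{1..20}
  # recursively change group ownership, then permissions
  chgrp -R users /tank/enc
  chmod -R 0750 /tank/enc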

Include any warning/errors/backtraces from the system logs

[2039285.916070] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[2039285.916601] IP: 0x1
[2039285.917090] PGD 0 P4D 0
[2039285.917593] Oops: 0010 [#1] SMP PTI
[2039285.918112] Modules linked in: cpuid arc4 md4 nls_utf8 cifs ccm fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs uas usb_storage xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 devlink xt_tcpudp ip6t_rpfilter
 ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_na
t_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bonding iptable_filter binfmt_misc nls_iso8859_1 ipmi_ssif gpio_ich intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
[2039285.924509]  kvm irqbypass intel_cstate intel_rapl_perf input_leds joydev intel_pch_thermal lpc_ich mei_me mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler acpi_pad mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_tr
ansport_iscsi nfsd 8021q auth_rpcgss nfs_acl garp lockd mrp grace stp sunrpc llc ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) zcommon(POE) znvpair(POE) spl(OE) btrfs zstd_compress algif_skcipher af_alg dm_crypt raid10 raid456 async_raid6_r
ecov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea pcbc sysfillrect aesni_intel ixgbe igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd
[2039285.935119]  i2c_algo_bit glue_helper dca mpt3sas ahci hid_generic mxm_wmi cryptd drm ptp libahci raid_class pps_core scsi_transport_sas mdio usbhid hid wmi
[2039285.938872] CPU: 4 PID: 15320 Comm: chgrp Tainted: P           OE    4.15.0-34-generic #37-Ubuntu
[2039285.940852] Hardware name: REDACTED
[2039285.942885] RIP: 0010:0x1
[2039285.944923] RSP: 0018:ffffb6ffb27ab670 EFLAGS: 00010202
[2039285.947014] RAX: 0000000000000001 RBX: ffffa08920309608 RCX: 0000000000000001
[2039285.949158] RDX: 0000000000000010 RSI: ffffb6ffb27ab68c RDI: ffffb6ffb27ab690
[2039285.951328] RBP: ffffb6ffb27ab6c8 R08: 0000000000000008 R09: 0000000000000001
[2039285.953537] R10: ffffa08681c13c00 R11: 1f525e8c9b98c0a6 R12: 0000000000000000
[2039285.955772] R13: 0000000000000010 R14: ffffa09667ab8000 R15: 0000000000000000
[2039285.958035] FS:  00007f2dff06b540(0000) GS:ffffa09adf300000(0000) knlGS:0000000000000000
[2039285.960348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2039285.962688] CR2: 0000000000000001 CR3: 0000000e1a4b8005 CR4: 00000000003606e0
[2039285.965080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2039285.967489] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2039285.969905] Call Trace:
[2039285.972451]  ? sa_copy_data+0x5d/0xc0 [zfs]
[2039285.975013]  sa_build_layouts+0x268/0x910 [zfs]
[2039285.977599]  sa_modify_attrs+0x2a2/0x410 [zfs]
[2039285.980200]  sa_attr_op+0x9a/0x3f0 [zfs]
[2039285.982813]  sa_bulk_update_impl+0x69/0x110 [zfs]
[2039285.985455]  sa_bulk_update+0x55/0x90 [zfs]
[2039285.988117]  zfs_setattr_dir+0x2dd/0x4f0 [zfs]
[2039285.990768]  ? dmu_buf_will_dirty+0x16/0x20 [zfs]
[2039285.993452]  ? dnode_rele_and_unlock+0x5e/0xb0 [zfs]
[2039285.996102]  ? spl_kmem_free+0x33/0x40 [spl]
[2039285.998824]  ? dmu_tx_destroy+0xd8/0xf0 [zfs]
[2039286.001602]  zfs_setattr+0x1de2/0x21b0 [zfs]
[2039286.004397]  ? rrw_exit+0x5a/0x150 [zfs]
[2039286.007154]  ? rrm_exit+0x46/0x80 [zfs]
[2039286.009779]  ? _cond_resched+0x19/0x40
[2039286.012410]  ? __kmalloc_node+0x21b/0x2c0
[2039286.015051]  ? spl_kmem_zalloc+0xdc/0x150 [spl]
[2039286.017734]  zpl_setattr+0xf8/0x170 [zfs]
[2039286.020251]  ? evm_inode_setattr+0x1a/0x60
[2039286.025141]  notify_change+0x2eb/0x440
[2039286.027491]  chown_common+0x1c5/0x1e0
[2039286.029769]  SyS_fchownat+0xd7/0xf0
[2039286.031978]  do_syscall_64+0x73/0x130
[2039286.034118]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[2039286.036200] RIP: 0033:0x7f2dfeb7347a
[2039286.038210] RSP: 002b:00007ffff0419a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
[2039286.040216] RAX: ffffffffffffffda RBX: 00007ffff0419c50 RCX: 00007f2dfeb7347a
[2039286.042175] RDX: 00000000ffffffff RSI: 000055a4a8bd3a08 RDI: 0000000000000004
[2039286.044078] RBP: 000055a4a8bd3900 R08: 0000000000000100 R09: 00000000ffffffff
[2039286.045922] R10: 00000000000003f8 R11: 0000000000000246 R12: 000055a4a8bd3978
[2039286.047708] R13: 00000000ffffffff R14: 0000000000000004 R15: 00000000ffffff01
[2039286.049428] Code:  Bad RIP value.
[2039286.051115] RIP: 0x1 RSP: ffffb6ffb27ab670
[2039286.052756] CR2: 0000000000000001
[2039286.054364] ---[ end trace 193f863958bc7bd5 ]---
ghost commented 5 years ago

Another note: user-triggerable NULL pointer dereferences are, under the right circumstances, exploitable to execute arbitrary code in ring 0. The zero-page "protection" does not protect against arbitrarily large values of N in 0+N: if you can influence the offset, you can bypass it. It is more of a gimmick than a real mitigation, although it works for the simplest cases, i.e. those where the vulnerability does not involve a [user-influenced dereference of NULL]+[user-controlled offset] that can reach well past the measly 65k default setting.

Someone who does not understand this basic concept is too much of a moron to be allowed to touch anything near ring-0 code that can be reached by a userland process. Alas, the Linux Code of Conduct seems to have left out 'stupidity' as a disqualifying affliction.

dweeezil commented 5 years ago

This code path was introduced by the project quota work to propagate attribute changes into the contents of xattr directories. @vogelfreiheit I suspect this problem would not occur if you were using xattr=sa, which is highly recommended for workloads involving lots of xattrs. That said, this certainly looks like a bug introduced with the project quota work, so I'm going to tag @Nasf-Fan here to get their take on it.
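
For anyone wanting to try that, the property can be inspected and switched with something like the following (dataset name is a placeholder). Note that, as far as I know, changing it only affects xattrs written afterwards; existing directory-style xattrs stay on disk as they are:

  zfs get xattr tank/dataset       # "on" means directory-based xattrs
  zfs set xattr=sa tank/dataset    # store new xattrs as system attributes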

ghost commented 5 years ago

Nice. Note that my comment regarding the security impact of these (under the right circumstances) was not really aimed at the ZoL team, but at the Linux kernel and KSPP developers who get mitigations merged that only provide a false sense of security (sometimes ripping off third-party inventions and code).

I can retest with xattr=sa, since it is possible to change that option after creating the dataset (unfortunately this happened on a production system).

I have enabled debug symbols for that particular kernel, so if it happens again I hope to have a stack trace and maybe some useful data.

This also happened on a system where encrypted datasets are loaded and unloaded somewhat regularly.

Nasf-Fan commented 5 years ago

@vogelfreiheit Sorry for the delayed response; I was busy with other work. I tried to reproduce the problem using the method you described, but have not hit the kernel Oops yet; I may have missed some conditions. I am not sure whether it is sensitive to the kernel version; mine is 3.10.0-154.

On the other hand, I did some static code analysis based on the kernel stack trace above. I introduced zfs_setattr_dir() to resolve an existing ZFS bug (https://github.com/zfsonlinux/zfs/issues/6957). Its logic is relatively simple: it scans the directory EA objects and changes the UID/GID/ProjectID (as required) by calling the existing sa_bulk_update(), whose logic is not changed by the project quota patch. As for why zfs_setattr_dir() causes the subsequent NULL-pointer dereference during SA_COPY_DATA(), there are two possible cases:

1) The input parameters (@attrs or @count) of sa_bulk_update() are not properly set; or

2) There is some buggy logic inside sa_bulk_update() that was never triggered before, because we previously did not change the directory EA object's UID/GID/ProjectID.

For case 1), the related logic is quite independent and simple; I double-checked it and found nothing. Currently I suspect case 2) more, but I need more clues to locate the root cause. So if you could offer a simple test script with which you can reproduce the issue on your side, that would be quite helpful. Thanks!
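
In the spirit of that request, here is a hedged sketch of the kind of script that might exercise the path in question (all names are placeholders; it follows the reporter's description but has not been confirmed to trigger the Oops):

  # dataset with directory-based xattrs, as in the original report
  zfs create -o xattr=on -o acltype=posixacl tank/repro
  mkdir /tank/repro/d
  for i in $(seq 1 1000); do
      touch "/tank/repro/d/f$i"
      # attach an xattr so each file gets an xattr directory object
      setfattr -n user.test -v somevalue "/tank/repro/d/f$i"
  done
  chgrp -R users /tank/repro     # then change permissions recursively
  chmod -R 0700 /tank/repro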

ghost commented 5 years ago

I will try to see if I can reproduce the issue; I remember I was mass-chmod'ing and chown'ing a large tree. I'd rather not reproduce this on that system, so I will try to set up a test VM.

Kernel is the standard Ubuntu Server one on amd64 from Bionic.

ThatDevopsGuy commented 5 years ago

Related issue: if the chgrp fails, subsequent mounts also fail. I'm not sure how many files exist in the directory where I experienced the issue, but not many (i.e. fewer than 100). A few chmod, chgrp, or chown commands executed successfully (on other, larger directories), but there was just one directory that caused this issue.

Any ideas on how to salvage the dataset? I'd rather not lose ~7 TB if I can avoid it. The scrub came back clean, so the actual data seems intact.
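
One avenue that might allow copying the data off, assuming the mount-time crash comes from ZIL replay of the interrupted chgrp (the trace below does go through zil_replay and zfs_replay_setattr): a read-only import, which as far as I know skips log replay (pool name is a placeholder):

  zpool export tank                   # if the pool is currently imported
  zpool import -o readonly=on tank    # read-only import, no ZIL replay
  # datasets mount read-only; copy the data elsewhere, e.g. with rsync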

FWIW: I'm on Arch, 4.19.12-arch1-1-ARCH, ZFS v0.8.0-rc2_40_g00f198de6.

[   71.319674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[   71.319679] PGD 0 P4D 0 
[   71.319684] Oops: 0010 [#1] PREEMPT SMP PTI
[   71.319688] CPU: 7 PID: 1938 Comm: mount.zfs Tainted: P           OE     4.19.12-arch1-1-ARCH #1
[   71.319690] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Extreme9/ac, BIOS P2.30 07/17/2014
[   71.319693] RIP: 0010:0x1
[   71.319698] Code: Bad RIP value.
[   71.319700] RSP: 0018:ffff9584a056f278 EFLAGS: 00010202
[   71.319702] RAX: 0000000000000000 RBX: ffff8c1d88548008 RCX: 0000000000000001
[   71.319704] RDX: 0000000000000010 RSI: ffff9584a056f28c RDI: ffff9584a056f290
[   71.319706] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
[   71.319708] R10: ffffb8186018e100 R11: 0000000000000001 R12: ffff8c1d99d46920
[   71.319710] R13: ffff8c1d89070800 R14: 0000000000000001 R15: 0000000000000010
[   71.319712] FS:  00007f8c7a13b7c0(0000) GS:ffff8c1ddebc0000(0000) knlGS:0000000000000000
[   71.319714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.319716] CR2: ffffffffffffffd7 CR3: 0000000813bce005 CR4: 00000000001606e0
[   71.319718] Call Trace:
[   71.319790]  ? sa_copy_data+0x57/0xd0 [zfs]
[   71.319841]  ? sa_build_layouts+0x25a/0x8f0 [zfs]
[   71.319890]  ? sa_modify_attrs+0x299/0x3f0 [zfs]
[   71.319939]  ? sa_attr_op+0x94/0x3d0 [zfs]
[   71.319988]  ? sa_bulk_update_impl+0x62/0x100 [zfs]
[   71.320036]  ? sa_bulk_update+0x50/0x80 [zfs]
[   71.320088]  ? zfs_setattr_dir+0x2b5/0x490 [zfs]
[   71.320096]  ? preempt_count_add+0x79/0xb0
[   71.320099]  ? preempt_count_add+0x79/0xb0
[   71.320103]  ? _raw_spin_lock+0x13/0x40
[   71.320105]  ? _raw_spin_unlock+0x16/0x30
[   71.320145]  ? dnode_rele_and_unlock+0x5b/0xb0 [zfs]
[   71.320197]  ? zfs_setattr+0x1d8c/0x2280 [zfs]
[   71.320203]  ? preempt_count_add+0x79/0xb0
[   71.320212]  ? spl_kmem_cache_free+0x28/0x1a0 [spl]
[   71.320216]  ? __kmalloc_node+0x1fd/0x2b0
[   71.320222]  ? spl_kmem_zalloc+0xd6/0x100 [spl]
[   71.320273]  ? zil_replay_error.isra.11+0xa0/0xa0 [zfs]
[   71.320323]  ? zfs_replay_setattr+0x173/0x210 [zfs]
[   71.320355]  ? aggsum_add+0x180/0x1a0 [zfs]
[   71.320409]  ? zil_replay_log_record+0xab/0x1a0 [zfs]
[   71.320461]  ? zil_parse+0x64b/0x940 [zfs]
[   71.320513]  ? zil_set_logbias+0x10/0x10 [zfs]
[   71.320564]  ? zil_replay_error.isra.11+0xa0/0xa0 [zfs]
[   71.320619]  ? zil_replay+0xb7/0x120 [zfs]
[   71.320672]  ? zfsvfs_setup+0x184/0x1a0 [zfs]
[   71.320724]  ? zfs_domount+0x382/0x3f0 [zfs]
[   71.320776]  ? zpl_mount+0x13d/0x180 [zfs]
[   71.320782]  ? mount_fs+0x3b/0x167
[   71.320786]  ? vfs_kern_mount.part.11+0x54/0x110
[   71.320790]  ? do_mount+0x1fd/0xc80
[   71.320795]  ? _copy_from_user+0x37/0x60
[   71.320798]  ? kmem_cache_alloc_trace+0x176/0x1d0
[   71.320801]  ? copy_mount_options+0x28/0x210
[   71.320804]  ? ksys_mount+0xba/0xd0
[   71.320808]  ? __x64_sys_mount+0x21/0x30
[   71.320812]  ? do_syscall_64+0x5b/0x170
[   71.320816]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   71.320819] Modules linked in: fuse intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel zfs(POE) kvm nvidia_drm(POE) nvidia_modeset(POE) zunicode(POE) irqbypass zlua(POE) intel_cstate snd_hda_codec_hdmi ofpart cmdlinepart intel_spi_platform intel_spi spi_nor mtd iTCO_wdt iTCO_vendor_support nvidia(POE) mxm_wmi intel_uncore intel_rapl_perf snd_hda_codec_realtek pcspkr i2c_i801 snd_hda_codec_generic joydev drm_kms_helper mousedev snd_hda_intel input_leds snd_hda_codec drm igb snd_hda_core snd_hwdep snd_pcm i2c_algo_bit agpgart dca ipmi_devintf ipmi_msghandler syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops e1000e mei_me snd mei soundcore lpc_ich evdev wmi pcc_cpufreq mac_hid zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) sg crypto_user ip_tables x_tables ext4 crc32c_generic
[   71.320869]  crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod uas usb_storage hid_generic sd_mod usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc ahci libahci xhci_pci aesni_intel libata aes_x86_64 crypto_simd xhci_hcd cryptd glue_helper scsi_mod ehci_pci ehci_hcd
[   71.320892] CR2: 0000000000000001
[   71.320895] ---[ end trace 4747c1cd1536c1a3 ]---
[   71.320898] RIP: 0010:0x1
[   71.320902] Code: Bad RIP value.
[   71.320904] RSP: 0018:ffff9584a056f278 EFLAGS: 00010202
[   71.320906] RAX: 0000000000000000 RBX: ffff8c1d88548008 RCX: 0000000000000001
[   71.320908] RDX: 0000000000000010 RSI: ffff9584a056f28c RDI: ffff9584a056f290
[   71.320910] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
[   71.320911] R10: ffffb8186018e100 R11: 0000000000000001 R12: ffff8c1d99d46920
[   71.320913] R13: ffff8c1d89070800 R14: 0000000000000001 R15: 0000000000000010
[   71.320916] FS:  00007f8c7a13b7c0(0000) GS:ffff8c1ddebc0000(0000) knlGS:0000000000000000
[   71.320917] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.320919] CR2: ffffffffffffffd7 CR3: 0000000813bce005 CR4: 00000000001606e0
dewi-ny-je commented 5 years ago

I think I experienced the same issue. I reported it on zfs-discuss, but I was not aware of this bug report, so I'll report it here again:

Ubuntu 18.04.2 kernel 4.15.0-46-generic, ZoL 0.8.0-762f9ef3d9d897b5baf7c91d6e8a7bf371a9b02f (3rd of March 2019), 32 GB ECC RAM.

The ZFS pool was originally created under illumos and still retains all its settings from illumos; see the dataset properties: https://pastebin.com/VmPGy26Q (notably xattr=on and acltype=off).
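
For reference, those settings (and the encryption status) can be confirmed directly on the dataset (name is a placeholder):

  zfs get xattr,acltype,encryption tank/dataset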

I imported and mounted the pool, then issued a "chown -R" on a subdirectory of a dataset, and the pool (more or less) locked up. I cannot export it successfully, and I cannot mount that dataset anymore. The other datasets are fine, however.

See the kernel messages: https://pastebin.com/ufNX78FR

1) What can I do to avoid the issue again? I'm now worried about it.
2) Can I recover that dataset? Honestly, I forgot to try to mount/clone a previous snapshot; I'll do that ASAP.
3) Is the bug under investigation? I see no news since November 2018.

dewi-ny-je commented 5 years ago

I'll try with the OmniOSce install CD, I have a spare SSD in the server.

In the meantime, I tried rolling back to the latest snapshot, and it froze again.

I can sacrifice another dataset from that pool after copying the data somewhere safe. Are there zdb or other commands you would like me to execute before I try to reproduce the issue on my existing pool? After that, I'll try with a new pool, as you requested.

dewi-ny-je commented 5 years ago

I don't know why the pool is not imported automatically at boot, but anyway, I just noticed that when I issued the zpool import today I got another call trace. I'm on mobile now, so I cannot easily post it, but I'll do so tomorrow.

lundman commented 5 years ago

We've had this issue from time to time: https://github.com/openzfsonosx/zfs/blob/master/module/zfs/sa.c#L771 but unfortunately have not found out what triggers it.

dewi-ny-je commented 5 years ago

Besides the stack trace I already posted, I found another one before it and another one after it.

All three are here (the second one is the same as the one above): https://pastebin.com/5w8dzCQm

They are each of a different length.

I'll get ready for the next steps.

jpfingstmann commented 5 years ago

I closed the issue I had posted. I imported again after a reboot; the import took longer than normal but completed. I then tried mounting the encrypted dataset and got the I/O error again.

Also, it might be worth noting that the pool was originally created on macOS (as was the encrypted dataset). My MacBook is out of commission, so I moved the drives to my Linux system (which required that I use the 0.8 branch due to the features enabled, including dataset encryption).

jpfingstmann commented 5 years ago

I patched it (I'm not a coder, but with some help from someone in #zfsonlinux I was able to replace printk with dprintk to get it to compile). Upon first mount it hung with lots of disk activity, then produced I/O errors (disk read errors) before suspending I/O. That was a separate problem: I was trying over USB 3, which has not always been the most stable on this system, so that was my bad. I rebooted and tried from USB 2, and started a scrub to make sure the checksum errors I saw before were just read errors from the USB issue; after letting it run a while and getting 0 errors back, I am convinced the data itself is fine. I then loaded the key, tried mounting, and immediately got "Cannot mount 'tank0/crypt': Input/Output error" back.

dmesg error:

[  374.551204] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[  374.551207] PGD 0 P4D 0 
[  374.551209] Oops: 0010 [#1] SMP PTI
[  374.551211] CPU: 2 PID: 35021 Comm: mount.zfs Tainted: P           OE     4.18.0-16-generic #17-Ubuntu
[  374.551212] Hardware name: Gigabyte Technology Co., Ltd. Z97X-SLI/Z97X-SLI-CF, BIOS F9 09/18/2015
[  374.551214] RIP: 0010:0x1
[  374.551215] Code: Bad RIP value.
[  374.551219] RSP: 0018:ffffa32f99fd7000 EFLAGS: 00010202
[  374.551220] RAX: 0000000000000001 RBX: ffff96a476aaceb8 RCX: 0000000000000001
[  374.551221] RDX: 0000000000000008 RSI: ffffa32f99fd701c RDI: ffffa32f99fd7020
[  374.551222] RBP: ffffa32f99fd7058 R08: 0000000000000008 R09: 0000000000000001
[  374.551223] R10: ffff96a52f861900 R11: ffff96a4fe888000 R12: 0000000000000000
[  374.551224] R13: ffff96a4e04a4980 R14: ffff96a4f45e5800 R15: 0000000000000008
[  374.551225] FS:  00007ff7c48c27c0(0000) GS:ffff96a5bec80000(0000) knlGS:0000000000000000
[  374.551226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  374.551227] CR2: ffffffffffffffd7 CR3: 000000076ca1c002 CR4: 00000000001606e0
[  374.551228] Call Trace:
[  374.551272]  ? sa_copy_data+0x5d/0xc0 [zfs]
[  374.551277]  ? spl_kmem_alloc+0xe9/0x130 [spl]
[  374.551304]  sa_build_layouts+0x262/0x910 [zfs]
[  374.551308]  ? spl_kmem_zalloc+0xdf/0x130 [spl]
[  374.551333]  sa_modify_attrs+0x29f/0x410 [zfs]
[  374.551358]  sa_attr_op+0x289/0x3f0 [zfs]
[  374.551383]  sa_bulk_update_impl+0x65/0x100 [zfs]
[  374.551408]  sa_bulk_update+0x55/0x90 [zfs]
[  374.551434]  zfs_setattr_dir+0x2da/0x500 [zfs]
[  374.551458]  ? dmu_buf_will_dirty+0x16/0x20 [zfs]
[  374.551462]  ? spl_kmem_free+0x33/0x40 [spl]
[  374.551481]  ? dmu_tx_destroy+0xd8/0xf0 [zfs]
[  374.551507]  zfs_setattr+0x1c9d/0x21d0 [zfs]
[  374.551511]  ? _cond_resched+0x19/0x30
[  374.551514]  ? __kmalloc_node+0x204/0x2c0
[  374.551517]  ? spl_kmem_zalloc+0xdf/0x130 [spl]
[  374.551520]  ? spl_kmem_zalloc+0xdf/0x130 [spl]
[  374.551545]  ? zfs_fuid_info_alloc+0x21/0x60 [zfs]
[  374.551571]  zfs_replay_setattr+0x16f/0x210 [zfs]
[  374.551597]  zil_replay_log_record+0xb2/0x1a0 [zfs]
[  374.551622]  ? zil_replay_error.isra.21+0xa0/0xa0 [zfs]
[  374.551647]  zil_parse+0x65b/0x8f0 [zfs]
[  374.551672]  ? zil_replay_error.isra.21+0xa0/0xa0 [zfs]
[  374.551697]  ? zil_set_logbias+0x20/0x20 [zfs]
[  374.551723]  zil_replay+0xbc/0x120 [zfs]
[  374.551749]  zfsvfs_setup+0x165/0x200 [zfs]
[  374.551776]  zfs_domount+0x379/0x3e0 [zfs]
[  374.551801]  zpl_mount+0x142/0x180 [zfs]
[  374.551804]  mount_fs+0x3d/0x155
[  374.551806]  vfs_kern_mount.part.36+0x5d/0x110
[  374.551808]  do_mount+0x213/0xcf0
[  374.551809]  ? copy_mount_options+0x2c/0x220
[  374.551811]  ksys_mount+0xbf/0xe0
[  374.551813]  __x64_sys_mount+0x25/0x30
[  374.551816]  do_syscall_64+0x5a/0x110
[  374.551818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  374.551819] RIP: 0033:0x7ff7c50fe90a
[  374.551820] Code: 48 8b 0d 89 85 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 85 0c 00 f7 d8 64 89 01 48 
[  374.551838] RSP: 002b:00007ffecbc61188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  374.551839] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff7c50fe90a
[  374.551840] RDX: 000055975fe5311b RSI: 00007ffecbc652d0 RDI: 00007ffecbc66867
[  374.551841] RBP: 0000000000000005 R08: 00007ffecbc622d0 R09: 0000000000000000
[  374.551842] R10: 0000000001000000 R11: 0000000000000246 R12: 000055976063a5d0
[  374.551843] R13: 00007ffecbc612d0 R14: 0000000000000000 R15: 00007ffecbc622d0
[  374.551844] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfsv3 nfs_acl nfs lockd grace fscache xt_nat xt_tcpudp veth vxlan ip6_udp_tunnel udp_tunnel iptable_mangle xt_mark xt_conntrack ipt_MASQUERADE nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_addrtype bpfilter overlay aufs zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) binfmt_misc snd_hda_codec_hdmi zcommon(POE) znvpair(POE) spl(OE) mxm_wmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate snd_soc_rt5640 snd_soc_rl6231 snd_hda_intel snd_soc_core snd_hda_codec snd_compress snd_hda_core ac97_bus snd_pcm_dmaengine snd_hwdep input_leds intel_rapl_perf snd_seq_midi snd_pcm snd_seq_midi_event mei_me snd_rawmidi
[  374.551871]  mei ie31200_edac snd_seq snd_seq_device snd_timer snd soundcore wmi nvidia_uvm(POE) mac_hid acpi_pad sch_fq_codel iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c algif_skcipher af_alg dm_crypt hid_generic usbhid uas hid usb_storage nvidia_drm(POE) nvidia_modeset(POE) crct10dif_pclmul gpio_ich crc32_pclmul ghash_clmulni_intel nvidia(POE) pcbc drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops drm aes_x86_64 crypto_simd cryptd glue_helper e1000e ahci ipmi_devintf libahci lpc_ich ipmi_msghandler video
[  374.551898] CR2: 0000000000000001
[  374.551899] ---[ end trace 04c2390c6624acf4 ]---
[  374.551900] RIP: 0010:0x1
[  374.551901] Code: Bad RIP value.
[  374.551903] RSP: 0018:ffffa32f99fd7000 EFLAGS: 00010202
[  374.551905] RAX: 0000000000000001 RBX: ffff96a476aaceb8 RCX: 0000000000000001
[  374.551906] RDX: 0000000000000008 RSI: ffffa32f99fd701c RDI: ffffa32f99fd7020
[  374.551907] RBP: ffffa32f99fd7058 R08: 0000000000000008 R09: 0000000000000001
[  374.551908] R10: ffff96a52f861900 R11: ffff96a4fe888000 R12: 0000000000000000
[  374.551908] R13: ffff96a4e04a4980 R14: ffff96a4f45e5800 R15: 0000000000000008
[  374.551910] FS:  00007ff7c48c27c0(0000) GS:ffff96a5bec80000(0000) knlGS:0000000000000000
[  374.551911] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  374.551912] CR2: ffffffffffffffd7 CR3: 000000076ca1c002 CR4: 00000000001606e0

My patched sa.c file: https://paste.ubuntu.com/p/DWPssVzbJq/

Hopefully this helps someone figure this out. For my next test, I might try it on ZFS on OSX (if I can get my MacBook resurrected). The latest ZFS on OSX release was lacking some features the zpool upgrade enabled, but I was able to mount read-only; I loaded the key, tried mounting the encrypted dataset, and got the same cannot-mount I/O error. I know how to compile things on Linux just fine, but on OSX I'm much more lost, and that's not something I care to tackle right now. If there's anything else on the Linux side I can do to help troubleshoot this, I'm completely open to doing so. The data is NOT production or that important; it'd be a shame to lose it, but not worth crying over.

dewi-ny-je commented 5 years ago

I can reliably reproduce the issue on my pool simply by booting the OmniOSce '028 install CD and rolling back to the latest snapshot, but trying to replicate it on a new pool was not successful.

These are the steps: https://pastebin.com/S7Qt9eaA

Let me know if I can help further. I'll keep the old pool for a while longer while transferring everything to a new, ZoL-created pool.

jpfingstmann commented 5 years ago

I was able to do a zfs send / recv (in my case to a file, then import into my FreeBSD zpool) and had no issues accessing the data that way. It looks like the cause is the fact that I created the pool on OSX originally and moved it to Linux (based on what everyone is saying). I'll move the rest of my data off and recreate the pool under Linux. That's going to take some time (USB 2, although I might look at temporarily hooking it up to my desktop via MLSATA & dock, as I have one lying around). I'm happy to test with my setup while it's still around, which should be for a few days at least.

dewi-ny-je commented 5 years ago

I think I used OmniOS r151002 dated 2012, according to the date I get from "zpool history". Or maybe OpenIndiana.

dewi-ny-je commented 5 years ago

If it can be useful, I could try a more radical approach: I can leave only a very limited number of files in the problematic dataset(s), then add a USB stick as a stripe and use device_removal to remove the original HDDs, leaving everything on a much smaller USB stick. If the issue is still reproducible, I can physically mail the USB stick (or electronically send its dd image) to the developer(s).

Since verifying the reproducibility of the bug requires quite some work and several reboots, I'd like to know in advance whether you can make use of the resulting image/dd stick.
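
For what it's worth, a sketch of that shrink procedure (device and pool names are placeholders; device_removal only works on pools without raidz top-level vdevs, and the remaining devices must have room for the evacuated data):

  zpool add tank /dev/sdX     # add the USB stick as a new top-level vdev
  zpool remove tank sda       # evacuate one original HDD vdev at a time
  zpool status -v tank        # wait for the removal to finish, then repeat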

dewi-ny-je commented 5 years ago

@kpande @ahrens I can do that. It will be two files, since I have a stripe, but that's no big change. I'll wait for the invitation/private message, and in the meantime I will prepare the files as requested.

I completed the data transfer to the new ZoL-created zpool, so the old problematic one can be stripped to remove sensitive data and reduce it to a minimal size. I'll perform the stripping under OmniOSce.

dewi-ny-je commented 5 years ago

I actually did more than asked: I isolated the file that causes the bug. It was simple: for i in *; do echo $i; chown olaf:olaf $i; done

I'm attaching the compressed archive with the two files that represent my pool (since it was a striped pool). The files were obtained as instructed by @kpande.

The archive was created using tar with the -S option to support sparse files; it was then compressed with gzip and split into 10 MiB chunks under Windows (using 7-Zip). I manually added a second ".gz" extension so that GitHub would accept the files.

zfs_bug.tar.gz.001.gz zfs_bug.tar.gz.002.gz zfs_bug.tar.gz.003.gz
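
To reassemble and extract the attachments, something like this should work, assuming the chunks are simply a split gzip stream as described above:

  cat zfs_bug.tar.gz.001.gz zfs_bug.tar.gz.002.gz zfs_bug.tar.gz.003.gz > zfs_bug.tar.gz
  tar -xzf zfs_bug.tar.gz    # sparse members are restored automatically
  zpool import -d . tank     # import using the extracted files as vdevs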

The pool has been purged of all sensitive data and now contains only one dataset (tank/home/olaf), with 4 files in the folder /tank/home/olaf/Movies/Line rider.

The file that causes the issue is "Line Rider Extreme.flv".

I will wait about a week for feedback, in case you need more information; then I'll destroy the pool.

Enjoy the bug hunting!

tcaputi commented 5 years ago

@dewi-ny-je I was not able to reproduce the issue by importing your pool and doing a chown:

root@ubuntu-1604:/tank/home/olaf/Movies/Line rider# ll
total 13763
drwx------ 2 501 501       6 Jul 20  2011 ./
drwx------ 3 501 501       4 Mar 16 17:06 ../
-rwx------ 1 501 501 3775372 Oct 17  2006 Line rider 2.flv*
-rwx------ 1 501 501 2191705 Oct 17  2006 Line rider 3.flv*
-rwx------ 1 501 501 6498353 Mar 20  2008 Line Rider Extreme.flv*
-rwx------ 1 501 501 1383127 Oct 17  2006 Line rider.flv*
root@ubuntu-1604:/tank/home/olaf/Movies/Line rider# chown tom:tom Line\ Rider\ Extreme.flv 
root@ubuntu-1604:/tank/home/olaf/Movies/Line rider# 

Did you do anything different?

dewi-ny-je commented 5 years ago

@tcaputi I used the command I pasted, therefore I would say you did it right.

You can see my configuration in my first post: https://github.com/zfsonlinux/zfs/issues/8072#issuecomment-470035846

It's strange, because I could reproduce it reliably. What further information can I provide?

dewi-ny-je commented 5 years ago

As I described above in https://github.com/zfsonlinux/zfs/issues/8072#issuecomment-470035846, once the dataset experiences said error, it cannot be mounted anymore.

Later I will "freeze" my pool again and upload the "frozen" pool here; maybe it will be possible to investigate further.

After doing that, I will update ZoL to the latest revision and I will try again.

tcaputi commented 5 years ago

@dewi-ny-je Do you know if the issue happens consistently? And is it always on the same file every time? This might be a race condition that requires repeated chowning / metadata changes to trigger.

dewi-ny-je commented 5 years ago

I'm not sure...

The issue happened every time I tried it on the whole "Movies" folder (I think I tried two or three times), but I cannot say which file was causing it each time, since I ran for i in *; do echo $i; chown olaf:olaf $i; done only once (and, as you can see, the script has to be run folder by folder). When I found the problematic file, I immediately purged the remaining sub-folders from the Movies folder to slim down the pool for you. For info: when I tried chown -R, or the script folder by folder, nothing else was accessing the pool.

The issue happened again in the "Line rider" folder when I tried the same script there after I purged the rest of the dataset.

The issue did NOT happen when I chown'ed -R two other whole datasets, both with about as many files as the whole "Movies" directory (before the purge). I tried each of those datasets only once.

So the procedure I'll try later will be

I see you used 16.04, but unfortunately I cannot perform the test on a different Ubuntu release; that would take too much time, since it involves using the live CD and compiling ZoL 0.8.0.

dewi-ny-je commented 5 years ago

Well, it was a simple test. It worked immediately, no more locks.

I have no more ideas :(

tcaputi commented 5 years ago

Maybe it's a race condition. Perhaps your script triggered the issue because it was doing a lot of chowns at once. Can you try running it in a loop again?

dewi-ny-je commented 5 years ago

I haven't deleted any of the files from the "Line rider" folder that caused the lock before uploading the pool here: if you run for i in *; do echo $i; chown olaf:olaf "$i"; done you will in fact reproduce my test conditions, also because I was running the script folder by folder, without sub-folders (as you can tell from the syntax I used).

For info: after copying the contents of the problematic dataset (tank/home/olaf) to my new, ZoL-created pool, I ran chown olaf:olaf -R /tank/home/olaf on the whole dataset, but the issue did not show up, even though it covered thousands of files rather than the fewer than 10 in the "Line rider" folder.

However, if you think it can be of help, I will try for i in *; do echo $i; chown olaf:olaf "$i"; done continuously on that small folder, changing the UID/GID each time to ensure the changes hit disk, until I get the issue again.

behlendorf commented 5 years ago

Unfortunately, it sounds like we're having a really difficult time reproducing this issue. I've removed this issue from holding up the 0.8 tag, but we of course still need to identify the root cause here.

dewi-ny-je commented 5 years ago

I just tried:

for perm in {500..2000}; do echo $perm; for i in *; do echo $i; chown $perm:$perm "$i"; done; done

I ran it multiple times; no luck.

Further ideas? As in the past, I'll give you a week for further suggestions, then I'll get rid of the old pool.

tcaputi commented 5 years ago

Is this on the new software? Maybe try going back to the old one?

dewi-ny-je commented 5 years ago

I'm still on ZoL 0.8.0-762f9ef3d9d897b5baf7c91d6e8a7bf371a9b02f (3rd of March 2019).

aarononeal commented 5 years ago

I've encountered the same problem (see #8597). It does not appear to be a race condition, as I can reproduce it consistently on the same files every time. I can rsync -aAX a bad file and delete the original after the copy to get a little further, until I hit the next problematic file. I guess I'm going to have to duplicate the dataset and lose my snapshots.
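
For the record, that workaround looks roughly like this (paths are placeholders):

  # copy one problematic file with ACLs and xattrs preserved (-A, -X)
  rsync -aAX '/pool/data/bad file.rar' /pool/data-copy/
  # delete the original so the recursive chown can get past it
  rm '/pool/data/bad file.rar'
  # repeat for the next file the chown trips over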

tcaputi commented 5 years ago

@aarononeal Is there a way you could get us a reproducer? The big reason this issue hasn't been fixed yet is that none of us have been able to determine what causes it.

aarononeal commented 5 years ago

@tcaputi I will try to export a truncated dataset tonight and see if I can get it reproducing under that.

If these details help: it was originally a FreeBSD (FreeNAS) dataset on a 2-drive mirror pool, xattrs were added by macOS (over AFP or SMB), and then I did a send/recv of the earliest snapshot to a new Linux encrypted dataset in a new 2-drive mirror pool, followed by incrementals up to a new latest snap. I'm now doing that again by cloning that Linux dataset to a 6-drive raidz2. I decided to update permissions and ownership in the process and hit the issue.

dewi-ny-je commented 5 years ago

Except for the send/recv, quite similar to my case.

aarononeal commented 5 years ago

@tcaputi Here's a dataset that reproduces the issue.

Repro steps:

  1. Set up an encrypted dataset under pool/encrypt
  2. Recv the repro dataset under the encrypted one: gunzip -c -d repro.gz | zfs recv pool/encrypt/repro
  3. cd /mnt/pool/encrypt/repro
  4. Try to change the owner: find . \! -user nobody -printf '%p\n' -exec chown nobody {} \;

You should see:

./reference-materials/unity-packages/Terrain/Free Terrain 3.rar
find: ‘chown’ terminated by signal 9
./reference-materials/unity-packages/Terrain/RiverWater.unitypackage
find: ‘chown’ terminated by signal 9

In my other dataset, ./reference-materials/unity-packages/Models/Dungeon_Pack/.DS_Store caused a hang, but it didn't on my last run of this.

And dmesg shows:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 2249.167149] PGD 0 P4D 0 
[ 2249.167151] Oops: 0010 [#2] SMP PTI
[ 2249.167153] CPU: 7 PID: 22308 Comm: chown Tainted: P      D    OE     4.18.0-17-generic #18~18.04.1-Ubuntu
[ 2249.167154] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C226M WS, BIOS P2.20 08/17/2015
[ 2249.167155] RIP: 0010:0x1
[ 2249.167156] Code: Bad RIP value.
[ 2249.167159] RSP: 0018:ffffbc90556db660 EFLAGS: 00010202
[ 2249.167160] RAX: 0000000000000001 RBX: ffffbc905d764008 RCX: 0000000000000001
[ 2249.167161] RDX: 0000000000000010 RSI: ffffbc90556db67c RDI: ffffbc90556db680
[ 2249.167162] RBP: ffffbc90556db6b8 R08: 0000000000000008 R09: ffffbc90556db758
[ 2249.167163] R10: ffff99c166ea6e00 R11: ac7c1232983f9fc9 R12: 0000000000000000
[ 2249.167164] R13: 0000000000000010 R14: ffff99c1e33df800 R15: 0000000000000000
[ 2249.167165] FS:  00007f2ddbb5e540(0000) GS:ffff99c1eebc0000(0000) knlGS:0000000000000000
[ 2249.167166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2249.167167] CR2: ffffffffffffffd7 CR3: 00000007a4a40002 CR4: 00000000001606e0
[ 2249.167168] Call Trace:
[ 2249.167194]  ? sa_copy_data+0x5d/0xc0 [zfs]
[ 2249.167217]  sa_build_layouts+0x268/0x910 [zfs]
[ 2249.167240]  sa_modify_attrs+0x2a2/0x410 [zfs]
[ 2249.167260]  sa_attr_op+0x9a/0x3f0 [zfs]
[ 2249.167280]  sa_bulk_update_impl+0x69/0x110 [zfs]
[ 2249.167299]  sa_bulk_update+0x55/0x90 [zfs]
[ 2249.167324]  zfs_setattr_dir+0x2dd/0x4f0 [zfs]
...
dweeezil commented 5 years ago

@aarononeal I've taken a quick peek at your send stream, and here are a couple of notes. First, there's no need to use encryption at all: your steps reproduce the problem simply by receiving the stream into a normal dataset. Second, the xattrs with names of the form "org.*" are "invisible" to Linux because they don't have a recognized prefix (such as "user."). This shouldn't, of course, be any problem insofar as ZFS is concerned, but it is worth mentioning. I suspect either I (when I get a bit more time to look at it) or another developer will be able to figure out why this is happening pretty quickly.
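
A quick illustration of that visibility point (file name is hypothetical; on Linux, only recognized namespaces such as user., trusted., security., and system. are reachable through the xattr syscalls, so anything else is rejected or hidden):

  $ setfattr -n user.demo -v 1 somefile    # recognized prefix: works
  $ setfattr -n org.demo -v 1 somefile     # unrecognized prefix
  setfattr: somefile: Operation not supported
  $ getfattr -d somefile                   # lists only user.* attributes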

aarononeal commented 5 years ago

@dweeezil Thanks! I really appreciate the findings and pointers so far, and I hope this turns out to be easy to troubleshoot with the stream. I'm so curious to learn what I did to get my dataset into this situation. :-)

tcaputi commented 5 years ago

Can confirm this reproduces the issue perfectly. Let me see what I can figure out.

dweeezil commented 5 years ago

See this note for the likely bug description. I'm testing the obvious fix now.

mikoim commented 4 years ago

I encountered this issue at least as recently as commit 82e996c2616af6254002065398de7cfb6929c4fd on Fedora 31 (Linux kernel 5.4.5-300.fc31.x86_64 and vanilla 5.4.7). No native-encryption datasets are involved.

[44972.459387] BUG: kernel NULL pointer dereference, address: 000000000000029d
[44972.459390] #PF: supervisor read access in kernel mode
[44972.459391] #PF: error_code(0x0000) - not-present page
[44972.459391] PGD 0 P4D 0 
[44972.459394] Oops: 0000 [#1] SMP NOPTI
[44972.459396] CPU: 5 PID: 526231 Comm: libvirtd Tainted: P           OE     5.4.5-300.fc31.x86_64 #1
[44972.459397] Hardware name: TRIMMED
[44972.459402] RIP: 0010:iput+0x17/0x200
[44972.459403] Code: 24 10 e8 fc fd ff ff e9 a9 fe ff ff e8 b2 70 db ff 66 90 0f 1f 44 00 00 48 85 ff 0f 84 cf 01 00 00 41 55 41 54 55 48 89 fd 53 <f6> 87 98 00 00 00 40 0f 85 ba 01 00 00 4c 8d a7 88 00 00 00 4c 8d
[44972.459404] RSP: 0018:ffffaba605e3f938 EFLAGS: 00010206
[44972.459405] RAX: 0000000000000002 RBX: ffff924ad62cc480 RCX: 0000000000000002
[44972.459406] RDX: 000000000000064b RSI: ffffffffc2748a60 RDI: 0000000000000205
[44972.459407] RBP: 0000000000000205 R08: 0000000000000000 R09: 0000000000000000
[44972.459407] R10: ffffaba605e3f998 R11: 0000008000000000 R12: ffff924ab4c6a580
[44972.459408] R13: ffff924b18672000 R14: 0000000000000002 R15: 000000000000000d
[44972.459410] FS:  00007fd96dff7700(0000) GS:ffff924b1fb40000(0000) knlGS:0000000000000000
[44972.459410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44972.459411] CR2: 000000000000029d CR3: 0000001e343a8005 CR4: 00000000003626e0
[44972.459412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44972.459413] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[44972.459413] Call Trace:
[44972.459482]  zfs_setattr_dir+0x3fb/0x470 [zfs]
[44972.459507]  ? dbuf_rele_and_unlock+0x134/0x640 [zfs]
[44972.459527]  ? dbuf_read+0x36d/0x510 [zfs]
[44972.459546]  ? dbuf_read+0x36d/0x510 [zfs]
[44972.459550]  ? _cond_resched+0x15/0x30
[44972.459554]  ? __kmalloc_node+0x1ff/0x310
[44972.459556]  ? _cond_resched+0x15/0x30
[44972.459557]  ? kmem_cache_alloc+0x165/0x220
[44972.459559]  ? _cond_resched+0x15/0x30
[44972.459560]  ? mutex_lock+0xe/0x30
[44972.459580]  ? dmu_buf_will_dirty_impl+0x87/0x120 [zfs]
[44972.459612]  ? sa_attr_op+0x29c/0x3c0 [zfs]
[44972.459613]  ? _cond_resched+0x15/0x30
[44972.459637]  ? dnode_rele_and_unlock+0x53/0xb0 [zfs]
[44972.459673]  zfs_setattr+0x1cd7/0x2240 [zfs]
[44972.459709]  ? zpl_xattr_get_dir+0x97/0x140 [zfs]
[44972.459710]  ? _cond_resched+0x15/0x30
[44972.459711]  ? mutex_lock+0xe/0x30
[44972.459742]  ? rrw_exit+0x65/0x150 [zfs]
[44972.459745]  ? security_capable+0x40/0x60
[44972.459746]  ? _cond_resched+0x15/0x30
[44972.459748]  ? __kmalloc_node+0x1ff/0x310
[44972.459782]  zpl_setattr+0xff/0x170 [zfs]
[44972.459784]  notify_change+0x2d9/0x450
[44972.459787]  chown_common.isra.0+0xec/0x1a0
[44972.459789]  do_fchownat+0x8f/0xf0
[44972.459791]  __x64_sys_chown+0x1e/0x30
[44972.459793]  do_syscall_64+0x5b/0x1a0
[44972.459795]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[44972.459796] RIP: 0033:0x7fd9af11979b
[44972.459798] Code: 00 00 00 75 a5 48 89 ef e8 e2 c7 f9 ff eb a4 e8 ab dd 01 00 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 5c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd e6 0c 00 f7 d8 64 89 01 48
[44972.459799] RSP: 002b:00007fd96dff6458 EFLAGS: 00000202 ORIG_RAX: 000000000000005c
[44972.459800] RAX: ffffffffffffffda RBX: 000056381a147f70 RCX: 00007fd9af11979b
[44972.459801] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fd95c118920
[44972.459801] RBP: 00007fd95c2db030 R08: 0000000000000001 R09: 0000000000000000
[44972.459802] R10: 00007fd9af665678 R11: 0000000000000202 R12: 0000000000000001
[44972.459802] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[44972.459804] Modules linked in: vhost_net vhost tap nf_log_ipv4 nf_log_common xt_mac xt_LOG xt_comment xt_CHECKSUM xt_MASQUERADE ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat ebtable_filter ebtables iptable_filter tun ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables sunrpc vfat fat zfs(POE) zunicode(POE) zlua(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi isst_if_common snd_hda_codec_realtek nfit snd_hda_codec_generic libnvdimm ledtrig_audio nvidia(POE) snd_hda_intel snd_intel_dspcfg x86_pkg_temp_thermal intel_powerclamp iwlmvm snd_hda_codec coretemp drm_kms_helper kvm_intel mac80211 snd_usb_audio snd_hda_core drm kvm snd_usbmidi_lib snd_rawmidi snd_hwdep irqbypass snd_seq libarc4 snd_seq_device intel_cstate ipmi_devintf intel_uncore iwlwifi intel_rapl_perf snd_pcm
[44972.459831]  ipmi_msghandler intel_wmi_thunderbolt wdat_wdt snd_timer i2c_i801 snd mei_me soundcore cfg80211 joydev mc mei ioatdma ip_tables xfs libcrc32c dm_crypt rfkill crct10dif_pclmul crc32_pclmul mxm_wmi crc32c_intel igb e1000e ghash_clmulni_intel nvme nvme_core dca i2c_algo_bit wmi uas usb_storage br_netfilter bridge stp llc fuse
[44972.459846] CR2: 000000000000029d
[44972.459848] ---[ end trace 7954ac2a13293527 ]---
[44972.459849] RIP: 0010:iput+0x17/0x200
[44972.459850] Code: 24 10 e8 fc fd ff ff e9 a9 fe ff ff e8 b2 70 db ff 66 90 0f 1f 44 00 00 48 85 ff 0f 84 cf 01 00 00 41 55 41 54 55 48 89 fd 53 <f6> 87 98 00 00 00 40 0f 85 ba 01 00 00 4c 8d a7 88 00 00 00 4c 8d
[44972.459851] RSP: 0018:ffffaba605e3f938 EFLAGS: 00010206
[44972.459852] RAX: 0000000000000002 RBX: ffff924ad62cc480 RCX: 0000000000000002
[44972.459853] RDX: 000000000000064b RSI: ffffffffc2748a60 RDI: 0000000000000205
[44972.459853] RBP: 0000000000000205 R08: 0000000000000000 R09: 0000000000000000
[44972.459854] R10: ffffaba605e3f998 R11: 0000008000000000 R12: ffff924ab4c6a580
[44972.459855] R13: ffff924b18672000 R14: 0000000000000002 R15: 000000000000000d
[44972.459856] FS:  00007fd96dff7700(0000) GS:ffff924b1fb40000(0000) knlGS:0000000000000000
[44972.459856] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44972.459857] CR2: 000000000000029d CR3: 0000001e343a8005 CR4: 00000000003626e0
[44972.459858] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44972.459858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400