Closed manfromafar closed 1 month ago
Can you try this patch to see if it resolves the issue? https://github.com/klarasystems/zfs/commit/dac46833bc25fe05808d1b118e33d26f8b239d92
I just tried compiling 2.2.4 with the suggested patch and it still kdumps. I only tried with the kmod, though; I didn't try DKMS.
The system tested is Rocky 9.4. Full log:
[Jun 5 13:24] spl: loading out-of-tree module taints kernel.
[ +0.000108] spl: module verification failed: signature and/or required key missing - tainting kernel
[ +0.022153] zfs: module license 'CDDL' taints kernel.
[ +0.000004] Disabling lock debugging due to kernel taint
[ +1.787175] ZFS: Loaded module v2.2.4-1, ZFS pool version 5000, ZFS filesystem version 5
[ +6.947544] ------------[ cut here ]------------
[ +0.000002] kobject: '(null)' (0000000056fd178d): is not initialized, yet kobject_put() is being called.
[ +0.000010] WARNING: CPU: 1 PID: 1701 at lib/kobject.c:758 kobject_put+0x3e/0x60
[ +0.000069] Modules linked in: zfs(POE-) spl(OE) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat vmwgfx intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common nfit libnvdimm drm_ttm_helper rapl ttm vmw_balloon drm_kms_helper pcspkr vmw_vmci syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul sd_mod t10_pi ahci crc32_pclmul sg crc32c_intel libahci vmxnet3 ghash_clmulni_intel ata_piix libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[ +0.000178] CPU: 1 PID: 1701 Comm: rmmod Kdump: loaded Tainted: P OE ------- --- 5.14.0-427.18.1.el9_4.x86_64 #1
[ +0.000047] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[ +0.000027] RIP: 0010:kobject_put+0x3e/0x60
[ +0.000013] Code: ff ff ff ff f0 0f c1 45 38 83 f8 01 74 20 85 c0 7e 25 5d c3 cc cc cc cc 48 8b 37 48 89 fa 48 c7 c7 00 6d 18 a1 e8 22 e3 aa ff <0f> 0b eb cd 48 89 ef 5d e9 35 01 00 00 be 03 00 00 00 5d e9 8a 5e
[ +0.000041] RSP: 0018:ffffae9580a27e58 EFLAGS: 00010286
[ +0.000015] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[ +0.000018] RDX: 0000000000000027 RSI: ffffffffa18679c0 RDI: ffff9e6637ca0848
[ +0.000017] RBP: ffff9e6506c36840 R08: 80000000ffff85c0 R09: 0000000000ffff10
[ +0.000020] R10: 0000000000000007 R11: 000000000000000f R12: ffffae9580a27f58
[ +0.000019] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ +0.000017] FS: 00007f55ffbf3740(0000) GS:ffff9e6637c80000(0000) knlGS:0000000000000000
[ +0.000032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000015] CR2: 000055f2d07c41c8 CR3: 00000001071b4003 CR4: 0000000000770ee0
[ +0.000041] PKRU: 55555554
[ +0.000009] Call Trace:
[ +0.000010] <TASK>
[ +0.000008] ? show_trace_log_lvl+0x1c4/0x2df
[ +0.000016] ? show_trace_log_lvl+0x1c4/0x2df
[ +0.000014] ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[ +0.000238] ? kobject_put+0x3e/0x60
[ +0.000012] ? __warn+0x81/0x110
[ +0.000012] ? kobject_put+0x3e/0x60
[ +0.000010] ? report_bug+0x10a/0x140
[ +0.000012] ? handle_bug+0x3c/0x70
[ +0.000014] ? exc_invalid_op+0x14/0x70
[ +0.000012] ? asm_exc_invalid_op+0x16/0x20
[ +0.000014] ? kobject_put+0x3e/0x60
[ +0.000010] ? kobject_put+0x3e/0x60
[ +0.000011] zfs_sysfs_fini+0x122/0x1a0 [zfs]
[ +0.000149] openzfs_fini+0x5/0x253 [zfs]
[ +0.000162] __do_sys_delete_module.constprop.0+0x175/0x280
[ +0.000018] ? syscall_trace_enter.constprop.0+0x126/0x1a0
[ +0.000795] do_syscall_64+0x59/0x90
[ +0.000761] ? exit_to_user_mode_prepare+0xb6/0x100
[ +0.000697] ? syscall_exit_to_user_mode+0x22/0x40
[ +0.000730] ? do_syscall_64+0x69/0x90
[ +0.000713] ? syscall_exit_to_user_mode+0x22/0x40
[ +0.000660] ? do_syscall_64+0x69/0x90
[ +0.000700] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ +0.000694] RIP: 0033:0x7f55ff30ee2b
[ +0.000662] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[ +0.001409] RSP: 002b:00007fff25e7b258 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ +0.000703] RAX: ffffffffffffffda RBX: 000055f2d07b97c0 RCX: 00007f55ff30ee2b
[ +0.000646] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055f2d07b9828
[ +0.000666] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ +0.000628] R10: 00007f55ff39eac0 R11: 0000000000000206 R12: 00007fff25e7b4b0
[ +0.000623] R13: 00007fff25e7c60f R14: 000055f2d07b92a0 R15: 000055f2d07b97c0
[ +0.000608] </TASK>
[ +0.000540] ---[ end trace ae729120c7e9a3b9 ]---
[ +0.000557] ------------[ cut here ]------------
[ +0.000000] refcount_t: underflow; use-after-free.
[ +0.000010] WARNING: CPU: 1 PID: 1701 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ +0.001544] Modules linked in: zfs(POE-) spl(OE) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat vmwgfx intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common nfit libnvdimm drm_ttm_helper rapl ttm vmw_balloon drm_kms_helper pcspkr vmw_vmci syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul sd_mod t10_pi ahci crc32_pclmul sg crc32c_intel libahci vmxnet3 ghash_clmulni_intel ata_piix libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[ +0.002926] CPU: 1 PID: 1701 Comm: rmmod Kdump: loaded Tainted: P W OE ------- --- 5.14.0-427.18.1.el9_4.x86_64 #1
[ +0.000644] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[ +0.000673] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ +0.000654] Code: 01 01 e8 e9 83 af ff 0f 0b c3 cc cc cc cc 80 3d 83 f2 b1 01 00 75 85 48 c7 c7 98 21 18 a1 c6 05 73 f2 b1 01 01 e8 c6 83 af ff <0f> 0b c3 cc cc cc cc 80 3d 5e f2 b1 01 00 0f 85 5e ff ff ff 48 c7
[ +0.001351] RSP: 0018:ffffae9580a27e60 EFLAGS: 00010282
[ +0.000685] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[ +0.000707] RDX: 0000000000000027 RSI: ffffffffa18679c0 RDI: ffff9e6637ca0848
[ +0.000720] RBP: ffffffffc125a380 R08: 80000000ffff85f5 R09: ffffae9580a27de8
[ +0.000686] R10: 0000000000000001 R11: 0000000000000028 R12: ffffae9580a27f58
[ +0.000728] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ +0.000761] FS: 00007f55ffbf3740(0000) GS:ffff9e6637c80000(0000) knlGS:0000000000000000
[ +0.000738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000738] CR2: 000055f2d07c41c8 CR3: 00000001071b4003 CR4: 0000000000770ee0
[ +0.000771] PKRU: 55555554
[ +0.000766] Call Trace:
[ +0.000735] <TASK>
[ +0.000705] ? show_trace_log_lvl+0x1c4/0x2df
[ +0.000762] ? show_trace_log_lvl+0x1c4/0x2df
[ +0.000740] ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[ +0.000939] ? refcount_warn_saturate+0xba/0x110
[ +0.000750] ? __warn+0x81/0x110
[ +0.000713] ? refcount_warn_saturate+0xba/0x110
[ +0.000753] ? report_bug+0x10a/0x140
[ +0.000739] ? handle_bug+0x3c/0x70
[ +0.000702] ? exc_invalid_op+0x14/0x70
[ +0.000739] ? asm_exc_invalid_op+0x16/0x20
[ +0.000747] ? refcount_warn_saturate+0xba/0x110
[ +0.000725] zfs_sysfs_fini+0x122/0x1a0 [zfs]
[ +0.000910] openzfs_fini+0x5/0x253 [zfs]
[ +0.000905] __do_sys_delete_module.constprop.0+0x175/0x280
[ +0.000746] ? syscall_trace_enter.constprop.0+0x126/0x1a0
[ +0.000737] do_syscall_64+0x59/0x90
[ +0.000687] ? exit_to_user_mode_prepare+0xb6/0x100
[ +0.000686] ? syscall_exit_to_user_mode+0x22/0x40
[ +0.000684] ? do_syscall_64+0x69/0x90
[ +0.000696] ? syscall_exit_to_user_mode+0x22/0x40
[ +0.000650] ? do_syscall_64+0x69/0x90
[ +0.000693] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ +0.000706] RIP: 0033:0x7f55ff30ee2b
[ +0.000660] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[ +0.001422] RSP: 002b:00007fff25e7b258 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ +0.000744] RAX: ffffffffffffffda RBX: 000055f2d07b97c0 RCX: 00007f55ff30ee2b
[ +0.000693] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055f2d07b9828
[ +0.000748] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ +0.000703] R10: 00007f55ff39eac0 R11: 0000000000000206 R12: 00007fff25e7b4b0
[ +0.000628] R13: 00007fff25e7c60f R14: 000055f2d07b92a0 R15: 000055f2d07b97c0
[ +0.000643] </TASK>
[ +0.000619] ---[ end trace ae729120c7e9a3ba ]---
[ +0.012487] ZFS: Unloaded module v2.2.4-1
When building from source I followed the build guide: https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html I used the RPMs instead of doing a make install.
After some trial and error, I was able to get a version of zfs that didn't have issues unloading the module on openSUSE 15.5. Using kernel kernel-default-5.14.21-150500.55.44.1.x86_64.rpm I was able to get 2.1.13 installed, and there were no issues with loading, unloading, or deleting large numbers of datasets. I haven't tested my actual backups yet, but now that I have a "working" system, that will be the next step.
2.2.4 still has issues on unload, though, using the same kernel as the 2.1.13 test.
The issue is also reproducible in my environments when unloading the ZFS module on ZFS 2.2.4, regardless of the kernel version, so it seems like a bug in ZFS 2.2.4. The issue does not occur on ZFS 2.2.3.
The commit https://github.com/openzfs/zfs/commit/db65272aef3d380d2bd1c94907826f2b9ec9205e seems to cause this issue in the 2.2.4 release. VDEV_PROP_RAIDZ_EXPANDING is defined but was never registered with sysfs, which causes an exception during kobj release. It is not a problem on the master branch, because raidz expansion is fully supported there. @tonyhutter, we may need to remove the property from the list in the zfs-2.2.5-staging branch.
@ixhamza thanks, I'll take a look.
This fixes it:
diff --git a/module/os/linux/zfs/zfs_sysfs.c b/module/os/linux/zfs/zfs_sysfs.c
index e2431fe8a..492ab8184 100644
--- a/module/os/linux/zfs/zfs_sysfs.c
+++ b/module/os/linux/zfs/zfs_sysfs.c
@@ -110,8 +110,10 @@ zfs_kobj_fini(zfs_mod_kobj_t *zkobj)
 	}
 
 	/* kobject_put() will call zfs_kobj_release() to release memory */
-	kobject_del(&zkobj->zko_kobj);
-	kobject_put(&zkobj->zko_kobj);
+	if (zkobj->zko_kobj.name != NULL) {
+		kobject_del(&zkobj->zko_kobj);
+		kobject_put(&zkobj->zko_kobj);
+	}
 }
 
 static void
Let me put together a PR against zfs-2.2.5-staging.
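For anyone following along, here is a minimal userspace sketch of why a name check like the one in the diff avoids the two warnings in the log. It is not kernel code: `struct mock_kobject` and every `mock_*` function are hypothetical stand-ins invented for illustration, and the only assumption carried over from the thread is that a property which was never registered with sysfs leaves its kobject with no name and no reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct kobject; not the real type. */
struct mock_kobject {
	const char *name;	/* NULL until the object has been registered */
	int refcount;		/* 0 means the object was never initialized */
};

/* Mimics registration: gives the object a name and an initial reference. */
static void
mock_kobject_register(struct mock_kobject *kobj, const char *name)
{
	kobj->name = name;
	kobj->refcount = 1;
}

/*
 * Mimics kobject_put(): returns false in the situation where the kernel
 * would print the "is not initialized" and refcount underflow warnings.
 */
static bool
mock_kobject_put(struct mock_kobject *kobj)
{
	if (kobj->refcount <= 0)
		return (false);
	kobj->refcount--;
	return (true);
}

/* Teardown without the guard: always drops a reference. */
static bool
mock_fini_unguarded(struct mock_kobject *kobj)
{
	return (mock_kobject_put(kobj));
}

/* Teardown with the guard: skip objects that were never registered. */
static bool
mock_fini_guarded(struct mock_kobject *kobj)
{
	if (kobj->name == NULL)
		return (true);	/* nothing was registered, nothing to release */
	return (mock_kobject_put(kobj));
}
```

Calling `mock_fini_unguarded()` on a zero-initialized object reports a failure, while `mock_fini_guarded()` skips it and still releases objects that went through `mock_kobject_register()`.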
@ixhamza would you mind taking a look at the fix: https://github.com/openzfs/zfs/pull/16406
@tonyhutter - Looks good to me. I understand that we don’t have the same issue in the master branch, but I think it could still be beneficial to add it there as well.
This is fixed in zfs-2.2.5. Closing issue.
Ugh. I'm used to upgrading zfs with: service stop / zpool export / modprobe -r / modprobe / import / start. When I tried to upgrade one instance to 2.2.5, it caused a kernel oops and required a local admin to forcibly power cycle the VM, which was stuck with the module in the "unloading" state and services offline. It seems that the only safe way to upgrade off of 2.2.4 now is to reboot?
I suggest this could use greater visibility in the release notes, which currently say: "[2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)".
I suggest: NOTE: module version 2.2.4 cannot be gracefully removed. To upgrade from 2.2.4, it's recommended to reboot rather than use modprobe -r/rmmod.
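An upgrade tool could apply the same rule before attempting an unload. The sketch below is only an illustration: `zfs_unload_needs_reboot()` is a hypothetical helper, and it assumes the caller has already obtained the loaded module's version string in the form shown in the log above ("2.2.4-1"), for example from /sys/module/zfs/version if that file is present on the system.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the version string (for example
 * "2.2.4-1") identifies a 2.2.4 build, which this thread reports cannot
 * be unloaded cleanly.
 */
static bool
zfs_unload_needs_reboot(const char *version)
{
	static const char affected[] = "2.2.4";
	size_t n = sizeof (affected) - 1;

	if (version == NULL || strncmp(version, affected, n) != 0)
		return (false);
	/* Reject versions such as "2.2.40" that only share the prefix. */
	return (version[n] == '\0' || version[n] == '-' ||
	    version[n] == '\n');
}
```

A caller would skip modprobe -r and schedule a reboot when this returns true. The check goes by version string alone, so a locally patched 2.2.4 build would still be flagged.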
System information
Describe the problem you're observing
When removing the kernel module, a kdump is generated. This appears to happen on both openSUSE 15.5 and Rocky 9.
Describe how to reproduce the problem
dnf install https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm
dnf config-manager --enable zfs-testing-kmod
dnf install zfs
reboot
modprobe zfs
rmmod zfs
Include any warning/errors/backtraces from the system logs