openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

kdump when removing zfs kmod kernel module 2.2.4 #16249

Closed manfromafar closed 1 month ago

manfromafar commented 4 months ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Rocky Linux |
| Distribution Version | 9.4 (Blue Onyx) |
| Kernel Version | 5.14.0-427.18.1.el9_4.x86_64 |
| Architecture | amd64 |
| OpenZFS Version | zfs-kmod-2.2.4-3 |

Describe the problem you're observing

When removing the kernel module, a kdump is generated. This appears to happen on both openSUSE 15.5 and Rocky 9.

Describe how to reproduce the problem

  1. Install the zfs repo: dnf install https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm
  2. Enable the zfs-testing-kmod repository: dnf config-manager --enable zfs-testing-kmod
  3. Install zfs: dnf install zfs
  4. Reboot the system after the install completes: reboot
  5. Load the kernel module: modprobe zfs
  6. Remove the module: rmmod zfs

Include any warning/errors/backtraces from the system logs

[Jun 5 10:25] spl: loading out-of-tree module taints kernel.
[  +0.000136] spl: module verification failed: signature and/or required key missing - tainting kernel
[  +0.026840] zfs: module license 'CDDL' taints kernel.
[  +0.000003] Disabling lock debugging due to kernel taint
[  +1.752854] ZFS: Loaded module v2.2.4-3, ZFS pool version 5000, ZFS filesystem version 5
[ +11.700660] ------------[ cut here ]------------
[  +0.000002] kobject: '(null)' (0000000062752f8e): is not initialized, yet kobject_put() is being called.
[  +0.000011] WARNING: CPU: 1 PID: 1553 at lib/kobject.c:758 kobject_put+0x3e/0x60
[  +0.000068] Modules linked in: zfs(POE-) spl(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency_common vfat fat isst_if_mbox_msr isst_if_common nfit libnvdimm vmwgfx vmw_balloon rapl drm_ttm_helper ttm pcspkr drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_piix4 vmw_vmci fb_sys_fops joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul crc32_pclmul sd_mod crc32c_intel t10_pi sg ahci libahci ata_piix ghash_clmulni_intel vmxnet3 libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[  +0.000206] CPU: 1 PID: 1553 Comm: rmmod Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-427.18.1.el9_4.x86_64 #1
[  +0.000028] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[  +0.000037] RIP: 0010:kobject_put+0x3e/0x60
[  +0.000014] Code: ff ff ff ff f0 0f c1 45 38 83 f8 01 74 20 85 c0 7e 25 5d c3 cc cc cc cc 48 8b 37 48 89 fa 48 c7 c7 00 6d d8 98 e8 22 e3 aa ff <0f> 0b eb cd 48 89 ef 5d e9 35 01 00 00 be 03 00 00 00 5d e9 8a 5e
[  +0.000041] RSP: 0018:ffffbf5700aa3e68 EFLAGS: 00010282
[  +0.000015] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[  +0.000021] RDX: 0000000000000027 RSI: ffffffff994679c0 RDI: ffff971eb7ca0848
[  +0.000018] RBP: ffff971d826be840 R08: 80000000ffff85cd R09: 0000000000ffff10
[  +0.000017] R10: 0000000000000007 R11: 000000000000000f R12: ffffbf5700aa3f58
[  +0.000020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000020] FS:  00007f95c771f740(0000) GS:ffff971eb7c80000(0000) knlGS:0000000000000000
[  +0.000034] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000017] CR2: 00005644c70231c8 CR3: 0000000104ce2005 CR4: 0000000000770ee0
[  +0.000034] PKRU: 55555554
[  +0.000009] Call Trace:
[  +0.000009]  <TASK>
[  +0.000009]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000016]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000016]  ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000200]  ? kobject_put+0x3e/0x60
[  +0.000012]  ? __warn+0x81/0x110
[  +0.000012]  ? kobject_put+0x3e/0x60
[  +0.000011]  ? report_bug+0x10a/0x140
[  +0.000012]  ? handle_bug+0x3c/0x70
[  +0.000013]  ? exc_invalid_op+0x14/0x70
[  +0.000011]  ? asm_exc_invalid_op+0x16/0x20
[  +0.000014]  ? kobject_put+0x3e/0x60
[  +0.000011]  ? kobject_put+0x3e/0x60
[  +0.000010]  zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000160]  openzfs_fini+0x5/0x293 [zfs]
[  +0.000167]  __do_sys_delete_module.constprop.0+0x175/0x280
[  +0.000019]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  +0.000767]  do_syscall_64+0x59/0x90
[  +0.000786]  ? syscall_exit_work+0x103/0x130
[  +0.000754]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000708]  ? do_syscall_64+0x69/0x90
[  +0.000701]  ? exc_page_fault+0x62/0x150
[  +0.000701]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  +0.000733] RIP: 0033:0x7f95c6f0ee2b
[  +0.000676] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[  +0.001444] RSP: 002b:00007ffc2a063b38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  +0.000737] RAX: ffffffffffffffda RBX: 00005644c70187c0 RCX: 00007f95c6f0ee2b
[  +0.000675] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005644c7018828
[  +0.000703] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  +0.000674] R10: 00007f95c6f9eac0 R11: 0000000000000206 R12: 00007ffc2a063d90
[  +0.000608] R13: 00007ffc2a06560f R14: 00005644c70182a0 R15: 00005644c70187c0
[  +0.000620]  </TASK>
[  +0.000584] ---[ end trace 9aa762227aa271f4 ]---
[  +0.000586] ------------[ cut here ]------------
[  +0.000001] refcount_t: underflow; use-after-free.
[  +0.000009] WARNING: CPU: 1 PID: 1553 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[  +0.001669] Modules linked in: zfs(POE-) spl(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency_common vfat fat isst_if_mbox_msr isst_if_common nfit libnvdimm vmwgfx vmw_balloon rapl drm_ttm_helper ttm pcspkr drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_piix4 vmw_vmci fb_sys_fops joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul crc32_pclmul sd_mod crc32c_intel t10_pi sg ahci libahci ata_piix ghash_clmulni_intel vmxnet3 libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[  +0.002925] CPU: 1 PID: 1553 Comm: rmmod Kdump: loaded Tainted: P        W  OE     -------  ---  5.14.0-427.18.1.el9_4.x86_64 #1
[  +0.000621] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[  +0.000652] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  +0.000661] Code: 01 01 e8 e9 83 af ff 0f 0b c3 cc cc cc cc 80 3d 83 f2 b1 01 00 75 85 48 c7 c7 98 21 d8 98 c6 05 73 f2 b1 01 01 e8 c6 83 af ff <0f> 0b c3 cc cc cc cc 80 3d 5e f2 b1 01 00 0f 85 5e ff ff ff 48 c7
[  +0.001370] RSP: 0018:ffffbf5700aa3e70 EFLAGS: 00010286
[  +0.000674] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[  +0.000679] RDX: 0000000000000027 RSI: ffffffff994679c0 RDI: ffff971eb7ca0848
[  +0.000708] RBP: ffffffffc1118380 R08: 80000000ffff8601 R09: ffffbf5700aa3df8
[  +0.000713] R10: 0000000000000001 R11: 0000000000000028 R12: ffffbf5700aa3f58
[  +0.000679] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000713] FS:  00007f95c771f740(0000) GS:ffff971eb7c80000(0000) knlGS:0000000000000000
[  +0.000738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000719] CR2: 00005644c70231c8 CR3: 0000000104ce2005 CR4: 0000000000770ee0
[  +0.000751] PKRU: 55555554
[  +0.000740] Call Trace:
[  +0.000744]  <TASK>
[  +0.001046]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000755]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000706]  ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000934]  ? refcount_warn_saturate+0xba/0x110
[  +0.000757]  ? __warn+0x81/0x110
[  +0.000737]  ? refcount_warn_saturate+0xba/0x110
[  +0.000734]  ? report_bug+0x10a/0x140
[  +0.000753]  ? handle_bug+0x3c/0x70
[  +0.000742]  ? exc_invalid_op+0x14/0x70
[  +0.000720]  ? asm_exc_invalid_op+0x16/0x20
[  +0.000726]  ? refcount_warn_saturate+0xba/0x110
[  +0.000743]  zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000913]  openzfs_fini+0x5/0x293 [zfs]
[  +0.000928]  __do_sys_delete_module.constprop.0+0x175/0x280
[  +0.000749]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  +0.000689]  do_syscall_64+0x59/0x90
[  +0.000698]  ? syscall_exit_work+0x103/0x130
[  +0.000696]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000647]  ? do_syscall_64+0x69/0x90
[  +0.000680]  ? exc_page_fault+0x62/0x150
[  +0.000702]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  +0.000677] RIP: 0033:0x7f95c6f0ee2b
[  +0.000687] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[  +0.001408] RSP: 002b:00007ffc2a063b38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  +0.000726] RAX: ffffffffffffffda RBX: 00005644c70187c0 RCX: 00007f95c6f0ee2b
[  +0.000732] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005644c7018828
[  +0.000735] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  +0.000669] R10: 00007f95c6f9eac0 R11: 0000000000000206 R12: 00007ffc2a063d90
[  +0.000677] R13: 00007ffc2a06560f R14: 00005644c70182a0 R15: 00005644c70187c0
[  +0.000654]  </TASK>
[  +0.000620] ---[ end trace 9aa762227aa271f5 ]---
[  +0.013874] ZFS: Unloaded module v2.2.4-3
allanjude commented 4 months ago

Can you try this patch to see if it resolves the issue? https://github.com/klarasystems/zfs/commit/dac46833bc25fe05808d1b118e33d26f8b239d92

manfromafar commented 4 months ago

> Can you try this patch to see if it resolves the issue? KlaraSystems@dac4683

I just tried compiling 2.2.4 with the suggested patch and it still kdumps. I only tried the kmod build, though; I didn't try DKMS.

The system tested is Rocky 9.4. Full log:

[Jun 5 13:24] spl: loading out-of-tree module taints kernel.
[  +0.000108] spl: module verification failed: signature and/or required key missing - tainting kernel
[  +0.022153] zfs: module license 'CDDL' taints kernel.
[  +0.000004] Disabling lock debugging due to kernel taint
[  +1.787175] ZFS: Loaded module v2.2.4-1, ZFS pool version 5000, ZFS filesystem version 5
[  +6.947544] ------------[ cut here ]------------
[  +0.000002] kobject: '(null)' (0000000056fd178d): is not initialized, yet kobject_put() is being called.
[  +0.000010] WARNING: CPU: 1 PID: 1701 at lib/kobject.c:758 kobject_put+0x3e/0x60
[  +0.000069] Modules linked in: zfs(POE-) spl(OE) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat vmwgfx intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common nfit libnvdimm drm_ttm_helper rapl ttm vmw_balloon drm_kms_helper pcspkr vmw_vmci syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul sd_mod t10_pi ahci crc32_pclmul sg crc32c_intel libahci vmxnet3 ghash_clmulni_intel ata_piix libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[  +0.000178] CPU: 1 PID: 1701 Comm: rmmod Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-427.18.1.el9_4.x86_64 #1
[  +0.000047] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[  +0.000027] RIP: 0010:kobject_put+0x3e/0x60
[  +0.000013] Code: ff ff ff ff f0 0f c1 45 38 83 f8 01 74 20 85 c0 7e 25 5d c3 cc cc cc cc 48 8b 37 48 89 fa 48 c7 c7 00 6d 18 a1 e8 22 e3 aa ff <0f> 0b eb cd 48 89 ef 5d e9 35 01 00 00 be 03 00 00 00 5d e9 8a 5e
[  +0.000041] RSP: 0018:ffffae9580a27e58 EFLAGS: 00010286
[  +0.000015] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[  +0.000018] RDX: 0000000000000027 RSI: ffffffffa18679c0 RDI: ffff9e6637ca0848
[  +0.000017] RBP: ffff9e6506c36840 R08: 80000000ffff85c0 R09: 0000000000ffff10
[  +0.000020] R10: 0000000000000007 R11: 000000000000000f R12: ffffae9580a27f58
[  +0.000019] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000017] FS:  00007f55ffbf3740(0000) GS:ffff9e6637c80000(0000) knlGS:0000000000000000
[  +0.000032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000015] CR2: 000055f2d07c41c8 CR3: 00000001071b4003 CR4: 0000000000770ee0
[  +0.000041] PKRU: 55555554
[  +0.000009] Call Trace:
[  +0.000010]  <TASK>
[  +0.000008]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000016]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000014]  ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000238]  ? kobject_put+0x3e/0x60
[  +0.000012]  ? __warn+0x81/0x110
[  +0.000012]  ? kobject_put+0x3e/0x60
[  +0.000010]  ? report_bug+0x10a/0x140
[  +0.000012]  ? handle_bug+0x3c/0x70
[  +0.000014]  ? exc_invalid_op+0x14/0x70
[  +0.000012]  ? asm_exc_invalid_op+0x16/0x20
[  +0.000014]  ? kobject_put+0x3e/0x60
[  +0.000010]  ? kobject_put+0x3e/0x60
[  +0.000011]  zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000149]  openzfs_fini+0x5/0x253 [zfs]
[  +0.000162]  __do_sys_delete_module.constprop.0+0x175/0x280
[  +0.000018]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  +0.000795]  do_syscall_64+0x59/0x90
[  +0.000761]  ? exit_to_user_mode_prepare+0xb6/0x100
[  +0.000697]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000730]  ? do_syscall_64+0x69/0x90
[  +0.000713]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000660]  ? do_syscall_64+0x69/0x90
[  +0.000700]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  +0.000694] RIP: 0033:0x7f55ff30ee2b
[  +0.000662] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[  +0.001409] RSP: 002b:00007fff25e7b258 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  +0.000703] RAX: ffffffffffffffda RBX: 000055f2d07b97c0 RCX: 00007f55ff30ee2b
[  +0.000646] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055f2d07b9828
[  +0.000666] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  +0.000628] R10: 00007f55ff39eac0 R11: 0000000000000206 R12: 00007fff25e7b4b0
[  +0.000623] R13: 00007fff25e7c60f R14: 000055f2d07b92a0 R15: 000055f2d07b97c0
[  +0.000608]  </TASK>
[  +0.000540] ---[ end trace ae729120c7e9a3b9 ]---
[  +0.000557] ------------[ cut here ]------------
[  +0.000000] refcount_t: underflow; use-after-free.
[  +0.000010] WARNING: CPU: 1 PID: 1701 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[  +0.001544] Modules linked in: zfs(POE-) spl(OE) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat vmwgfx intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common nfit libnvdimm drm_ttm_helper rapl ttm vmw_balloon drm_kms_helper pcspkr vmw_vmci syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_piix4 joydev drm fuse xfs libcrc32c sr_mod cdrom ata_generic crct10dif_pclmul sd_mod t10_pi ahci crc32_pclmul sg crc32c_intel libahci vmxnet3 ghash_clmulni_intel ata_piix libata vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod
[  +0.002926] CPU: 1 PID: 1701 Comm: rmmod Kdump: loaded Tainted: P        W  OE     -------  ---  5.14.0-427.18.1.el9_4.x86_64 #1
[  +0.000644] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[  +0.000673] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  +0.000654] Code: 01 01 e8 e9 83 af ff 0f 0b c3 cc cc cc cc 80 3d 83 f2 b1 01 00 75 85 48 c7 c7 98 21 18 a1 c6 05 73 f2 b1 01 01 e8 c6 83 af ff <0f> 0b c3 cc cc cc cc 80 3d 5e f2 b1 01 00 0f 85 5e ff ff ff 48 c7
[  +0.001351] RSP: 0018:ffffae9580a27e60 EFLAGS: 00010282
[  +0.000685] RAX: 0000000000000000 RBX: 000000000000002f RCX: 0000000000000027
[  +0.000707] RDX: 0000000000000027 RSI: ffffffffa18679c0 RDI: ffff9e6637ca0848
[  +0.000720] RBP: ffffffffc125a380 R08: 80000000ffff85f5 R09: ffffae9580a27de8
[  +0.000686] R10: 0000000000000001 R11: 0000000000000028 R12: ffffae9580a27f58
[  +0.000728] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000761] FS:  00007f55ffbf3740(0000) GS:ffff9e6637c80000(0000) knlGS:0000000000000000
[  +0.000738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000738] CR2: 000055f2d07c41c8 CR3: 00000001071b4003 CR4: 0000000000770ee0
[  +0.000771] PKRU: 55555554
[  +0.000766] Call Trace:
[  +0.000735]  <TASK>
[  +0.000705]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000762]  ? show_trace_log_lvl+0x1c4/0x2df
[  +0.000740]  ? zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000939]  ? refcount_warn_saturate+0xba/0x110
[  +0.000750]  ? __warn+0x81/0x110
[  +0.000713]  ? refcount_warn_saturate+0xba/0x110
[  +0.000753]  ? report_bug+0x10a/0x140
[  +0.000739]  ? handle_bug+0x3c/0x70
[  +0.000702]  ? exc_invalid_op+0x14/0x70
[  +0.000739]  ? asm_exc_invalid_op+0x16/0x20
[  +0.000747]  ? refcount_warn_saturate+0xba/0x110
[  +0.000725]  zfs_sysfs_fini+0x122/0x1a0 [zfs]
[  +0.000910]  openzfs_fini+0x5/0x253 [zfs]
[  +0.000905]  __do_sys_delete_module.constprop.0+0x175/0x280
[  +0.000746]  ? syscall_trace_enter.constprop.0+0x126/0x1a0
[  +0.000737]  do_syscall_64+0x59/0x90
[  +0.000687]  ? exit_to_user_mode_prepare+0xb6/0x100
[  +0.000686]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000684]  ? do_syscall_64+0x69/0x90
[  +0.000696]  ? syscall_exit_to_user_mode+0x22/0x40
[  +0.000650]  ? do_syscall_64+0x69/0x90
[  +0.000693]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  +0.000706] RIP: 0033:0x7f55ff30ee2b
[  +0.000660] Code: 73 01 c3 48 8b 0d f5 af 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 af 0e 00 f7 d8 64 89 01 48
[  +0.001422] RSP: 002b:00007fff25e7b258 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  +0.000744] RAX: ffffffffffffffda RBX: 000055f2d07b97c0 RCX: 00007f55ff30ee2b
[  +0.000693] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055f2d07b9828
[  +0.000748] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  +0.000703] R10: 00007f55ff39eac0 R11: 0000000000000206 R12: 00007fff25e7b4b0
[  +0.000628] R13: 00007fff25e7c60f R14: 000055f2d07b92a0 R15: 000055f2d07b97c0
[  +0.000643]  </TASK>
[  +0.000619] ---[ end trace ae729120c7e9a3ba ]---
[  +0.012487] ZFS: Unloaded module v2.2.4-1
manfromafar commented 4 months ago

When building from source, I followed the build guide: https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html. I used the RPMs instead of doing a make install.

manfromafar commented 3 months ago

After some trial and error, I was able to get a version of zfs that didn't have issues unloading the module on openSUSE 15.5. Using kernel kernel-default-5.14.21-150500.55.44.1.x86_64.rpm, I was able to get 2.1.13 installed, and there were no issues with loading, unloading, or deleting large numbers of datasets. I have not yet tested my actual backups, but now that I have a "working" system, that will be the next step.

2.2.4 still has issues on unload, though, using the same kernel as the 2.1.13 test.

arturpzol commented 2 months ago

The issue is also reproducible in my environments when unloading the ZFS module on ZFS 2.2.4, regardless of the kernel version, so it seems like a bug in ZFS 2.2.4. The issue does not occur on ZFS 2.2.3.

ixhamza commented 2 months ago

https://github.com/openzfs/zfs/commit/db65272aef3d380d2bd1c94907826f2b9ec9205e seems to cause this issue in the 2.2.4 release. VDEV_PROP_RAIDZ_EXPANDING is defined but was never registered with sysfs, which causes an exception during kobj release, though it is not a problem with the master branch due to complete raidz expand support there. @tonyhutter, we may need to remove the property from the list in the zfs-2.2.5-staging branch.

tonyhutter commented 2 months ago

@ixhamza thanks, I'll take a look.

tonyhutter commented 2 months ago

This fixes it:

diff --git a/module/os/linux/zfs/zfs_sysfs.c b/module/os/linux/zfs/zfs_sysfs.c
index e2431fe8a..492ab8184 100644
--- a/module/os/linux/zfs/zfs_sysfs.c
+++ b/module/os/linux/zfs/zfs_sysfs.c
@@ -110,8 +110,10 @@ zfs_kobj_fini(zfs_mod_kobj_t *zkobj)
        }

        /* kobject_put() will call zfs_kobj_release() to release memory */
-       kobject_del(&zkobj->zko_kobj);
-       kobject_put(&zkobj->zko_kobj);
+       if (zkobj->zko_kobj.name != NULL) {
+               kobject_del(&zkobj->zko_kobj);
+               kobject_put(&zkobj->zko_kobj);
+       }
 }

 static void

Let me put together a PR against zfs-2.2.5-staging.
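
For readers following along, here is a minimal userspace sketch of why the guard in the diff above helps. It is a model, not the kernel code: `model_kobj`, `model_register`, `model_put`, and `model_teardown` are hypothetical names, and the property names other than `raidz_expanding` are placeholders. The assumption, taken from the analysis earlier in the thread, is that one property slot exists in the table but was never registered, so its name is still NULL and its reference count is still zero at teardown.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a kobject; NOT the kernel's struct kobject. */
struct model_kobj {
	const char *name;	/* NULL until the object is registered */
	int refcount;		/* 1 after registration, 0 after release */
};

#define MODEL_NPROPS 3

static int model_warnings;	/* counts simulated kernel WARNINGs */

/* Register a slot: give it a name and an initial reference. */
static void model_register(struct model_kobj *k, const char *name)
{
	k->name = name;
	k->refcount = 1;
}

/* Drop a reference, counting the two complaints seen in the log. */
static void model_put(struct model_kobj *k)
{
	if (k->name == NULL)
		model_warnings++;	/* "is not initialized, yet kobject_put()" */
	if (--k->refcount < 0)
		model_warnings++;	/* "refcount_t: underflow" */
}

/*
 * Build a table with one defined-but-unregistered slot, tear it down,
 * and return how many warnings the teardown produced.
 * guarded == 1 applies the name != NULL check from the diff.
 */
static int model_teardown(int guarded)
{
	struct model_kobj props[MODEL_NPROPS] = { { NULL, 0 } };
	size_t i;

	assert(guarded == 0 || guarded == 1);

	model_register(&props[0], "placeholder_a");
	model_register(&props[1], "placeholder_b");
	/* props[2] models raidz_expanding: defined, never registered. */

	model_warnings = 0;
	for (i = 0; i < MODEL_NPROPS; i++) {
		if (guarded && props[i].name == NULL)
			continue;	/* skip slots that were never set up */
		model_put(&props[i]);
	}
	return model_warnings;
}
```

Called from a `main()`, `model_teardown(0)` returns 2, one warning for each of the two traces in the logs above, and `model_teardown(1)` returns 0.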

tonyhutter commented 2 months ago

https://github.com/openzfs/zfs/pull/16406

tonyhutter commented 2 months ago

@ixhamza would you mind taking a look at the fix: https://github.com/openzfs/zfs/pull/16406

ixhamza commented 2 months ago

@tonyhutter - Looks good to me. I understand that we don’t have the same issue in the master branch, but I think it could still be beneficial to add it there as well.

tonyhutter commented 1 month ago

This is fixed in zfs-2.2.5. Closing issue.

justinpryzby commented 1 month ago

Ugh. I'm used to upgrading zfs with: service stop / zpool export / modprobe -r / modprobe / import / start. When I tried to upgrade one instance to 2.2.5, it caused a kernel oops and required a local admin to forcibly power cycle the VM, which was stuck with the module in the "unloading" state and its services offline. It seems like the only safe way to upgrade off of 2.2.4 now is to reboot?

I suggest this could use greater visibility in the release notes, which currently say: "[2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)".

I suggest: NOTE: module version 2.2.4 cannot be gracefully removed. To upgrade from 2.2.4, it's recommended to reboot rather than use modprobe -r/rmmod.
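
An upgrade script could apply that suggestion by checking the loaded module's version before attempting an unload. The following is only a sketch under stated assumptions: the function names are hypothetical, and the caller is assumed to pass a path such as /sys/module/zfs/version, which may not be where your system exposes the version string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `version` names a 2.2.4 build (e.g. "2.2.4-3"), else 0. */
static int zfs_unload_needs_reboot(const char *version)
{
	static const char bad[] = "2.2.4";
	size_t n = sizeof(bad) - 1;

	if (version == NULL || strncmp(version, bad, n) != 0)
		return 0;
	/* A following digit means a different release, such as "2.2.40". */
	return !isdigit((unsigned char)version[n]);
}

/*
 * Read a version string from `path` and classify it.
 * Returns 1 (reboot instead of unloading), 0 (not 2.2.4), or -1 if
 * the file cannot be read, for example because no module is loaded.
 */
static int loaded_zfs_needs_reboot(const char *path)
{
	char buf[64];
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return zfs_unload_needs_reboot(buf);
}
```

A script would skip the modprobe -r step and schedule a reboot whenever `loaded_zfs_needs_reboot()` returns 1, and treat -1 as "nothing to unload" or as an error, depending on local policy.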