@gamanakis : Are you sure this patch is complete? It ends with an open "="
It's just a diff patch; it should apply correctly. It adds the line beginning with a plus (+).
I seem to have accidentally reproduced this while trying to reproduce #14330, so that's neat.
sudo zpool create banoom /dev/nvme1n1p1 -f
sudo zfs create banoom/encryptme -o encryption=on -o keyformat=raw -o keylocation=file:///home/rich/badkey
sudo cp -a . /banoom/encryptme/zfs_me2/
sudo zfs snapshot banoom/encryptme@snap1
sudo cp -a . /banoom/encryptme/zfs_me3/
sudo zfs snapshot banoom/encryptme@snap2
sudo cp -a . /banoom/encryptme/zfs_me4/
sudo zfs snapshot banoom/encryptme@snap3
sudo ./cmd/zfs/zfs send -Lc banoom/encryptme@snap1 | sudo ./cmd/zfs/zfs recv -o encryption=on -o keyformat=raw -o keylocation=file:///home/rich/badkey2 banoom/encrypttwo
sudo ./cmd/zfs/zfs send -Lc banoom/encryptme@snap1 | sudo ./cmd/zfs/zfs recv banoom/unencryptme
sudo ./cmd/zfs/zfs send -Lc -i banoom/encryptme@{snap1,snap2} | sudo ./cmd/zfs/zfs recv banoom/unencryptme
sudo zfs rollback banoom/encrypttwo@snap1
sudo ./cmd/zfs/zfs send -Lc -i banoom/unencryptme@{snap1,snap2} | sudo ./cmd/zfs/zfs recv banoom/encrypttwo
sudo ./cmd/zfs/zfs send -Lc -i banoom/encryptme@{snap2,snap3} | sudo ./cmd/zfs/zfs recv banoom/unencryptme
sudo ./cmd/zfs/zfs send -Lc -i banoom/unencryptme@{snap2,snap3} | sudo ./cmd/zfs/zfs recv -F banoom/encrypttwo
(It's probably possible to simplify this, but it worked for me, and it has since worked for someone else as well.)
[3657208.606272] VERIFY3(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
[3657208.607657] PANIC at dsl_dataset.c:1116:dsl_dataset_deactivate_feature_impl()
[3657208.609042] Showing stack for process 499848
[3657208.610394] CPU: 1 PID: 499848 Comm: txg_sync Kdump: loaded Tainted: P OE 5.10.0-15-amd64 #1 Debian 5.10.120-1
[3657208.613072] Hardware name: Micro-Star International Co., Ltd. MS-7D50/MEG X570S ACE MAX (MS-7D50), BIOS 1.40 05/24/2022
[3657208.614406] Call Trace:
[3657208.615730] dump_stack+0x6b/0x83
[3657208.617033] spl_panic+0xd4/0xfc [spl]
[3657208.618330] ? dbuf_rele_and_unlock+0x132/0x660 [zfs]
[3657208.619598] ? spl_kmem_alloc_impl+0xae/0xf0 [spl]
[3657208.620834] ? avl_find+0x53/0x90 [zavl]
[3657208.622078] ? zap_remove_impl+0xb3/0x120 [zfs]
[3657208.623259] ? kfree+0xba/0x480
[3657208.624440] dsl_dataset_deactivate_feature_impl+0xf6/0x100 [zfs]
[3657208.625610] dsl_dataset_clone_swap_sync_impl+0x83e/0x880 [zfs]
[3657208.626755] ? dsl_dataset_hold_flags+0x99/0x240 [zfs]
[3657208.627877] dmu_recv_end_sync+0x17e/0x580 [zfs]
[3657208.628985] dsl_sync_task_sync+0xa6/0xf0 [zfs]
[3657208.630057] dsl_pool_sync+0x40d/0x520 [zfs]
[3657208.631108] spa_sync+0x542/0xfa0 [zfs]
[3657208.632108] ? mutex_lock+0xe/0x30
[3657208.633123] ? spa_txg_history_init_io+0x101/0x110 [zfs]
[3657208.634137] txg_sync_thread+0x287/0x410 [zfs]
[3657208.635149] ? txg_fini+0x250/0x250 [zfs]
[3657208.636097] thread_generic_wrapper+0x6f/0x80 [spl]
[3657208.637017] ? __thread_exit+0x20/0x20 [spl]
[3657208.637924] kthread+0x11b/0x140
[3657208.638802] ? __kthread_bind_mask+0x60/0x60
[3657208.639661] ret_from_fork+0x1f/0x30
edit 2, now with debug information:
[ 87.147240] NOTICE: feature activated: 16
[ 87.147249] NOTICE: feature activated: 18
[ 88.035999] VERIFY0(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
[ 88.036005] PANIC at dsl_dataset.c:1129:dsl_dataset_deactivate_feature_impl()
[ 88.036007] Showing stack for process 5181
[ 88.036010] CPU: 17 PID: 5181 Comm: txg_sync Kdump: loaded Tainted: P OE 5.10.0-15-amd64 #1 Debian 5.10.120-1
[ 88.036012] Hardware name: Micro-Star International Co., Ltd. MS-7D50/MEG X570S ACE MAX (MS-7D50), BIOS 1.40 05/24/2022
[ 88.036014] Call Trace:
[ 88.036021] dump_stack+0x6b/0x83
[ 88.036029] spl_panic+0xd4/0xfc [spl]
[ 88.036033] ? __kmalloc_node+0x141/0x2b0
[ 88.036083] ? dbuf_rele_and_unlock+0x132/0x660 [zfs]
[ 88.036133] ? zap_remove_impl+0xef/0x180 [zfs]
[ 88.036135] ? kfree+0xba/0x480
[ 88.036178] ? zap_remove_impl+0xef/0x180 [zfs]
[ 88.036225] dsl_dataset_deactivate_feature_impl+0xf6/0x100 [zfs]
[ 88.036268] dsl_dataset_clone_swap_sync_impl+0x83e/0x880 [zfs]
[ 88.036309] ? dsl_dataset_hold_flags+0x99/0x240 [zfs]
[ 88.036352] dmu_recv_end_sync+0x24e/0x5b0 [zfs]
[ 88.036398] dsl_sync_task_sync+0xa6/0xf0 [zfs]
[ 88.036441] dsl_pool_sync+0x40d/0x520 [zfs]
[ 88.036488] spa_sync+0x540/0xf80 [zfs]
[ 88.036492] ? mutex_lock+0xe/0x30
[ 88.036537] ? spa_txg_history_init_io+0x101/0x110 [zfs]
[ 88.036579] txg_sync_thread+0x22c/0x3b0 [zfs]
[ 88.036618] ? txg_quiesce_thread+0x330/0x330 [zfs]
[ 88.036623] thread_generic_wrapper+0x6f/0x80 [spl]
[ 88.036627] ? spl_taskq_fini+0x70/0x70 [spl]
[ 88.036630] kthread+0x11b/0x140
[ 88.036632] ? __kthread_bind_mask+0x60/0x60
[ 88.036634] ret_from_fork+0x1f/0x30
And 18 appears to be...
SPA_FEATURE_PROJECT_QUOTA
So once again, the quota feature and encryption burn the nest down.
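(For reference: the numbers in those NOTICE lines are spa_feature_t values, i.e. indices into spa_feature_table. Below is a trimmed sketch of the relevant stretch of the enum, which I believe lives in zfeature_common.h; 16 and 18 are confirmed in this thread, the entry in between is only my reading of the declaration order.)
typedef enum spa_feature {
	/* ... earlier features elided ... */
	SPA_FEATURE_USEROBJ_ACCOUNTING,	/* 16 - "feature activated: 16" */
	SPA_FEATURE_ENCRYPTION,		/* 17 - assumed from declaration order */
	SPA_FEATURE_PROJECT_QUOTA,	/* 18 - "feature activated: 18" */
	/* ... later features elided ... */
	SPA_FEATURES
} spa_feature_t;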
@gamanakis : I was able to reproduce it with the NOTICE messages. The result is similar to what @rincebrain found:
Dez 27 08:17:23 rakete kernel: NOTICE: feature activated: 16
Dez 27 08:17:23 rakete kernel: NOTICE: feature activated: 18
Dez 27 08:17:27 rakete kernel: VERIFY3(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
Dez 27 08:17:27 rakete kernel: PANIC at dsl_dataset.c:1116:dsl_dataset_deactivate_feature_impl()
Dez 27 08:17:27 rakete kernel: Showing stack for process 3194
Dez 27 08:17:27 rakete kernel: CPU: 7 PID: 3194 Comm: txg_sync Tainted: P OE 5.15.85-1-lts #1 8627258d3b982627cb02935c7bfd65137eb7e755
Dez 27 08:17:27 rakete kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F36d 07/20/2022
Dez 27 08:17:27 rakete kernel: Call Trace:
Dez 27 08:17:27 rakete kernel: <TASK>
Dez 27 08:17:27 rakete kernel: dump_stack_lvl+0x45/0x5b
Dez 27 08:17:27 rakete kernel: spl_panic+0xf0/0x108 [spl 77227ad6aa0fef1593e90f9cdd25fef94c2bbb59]
Dez 27 08:17:27 rakete kernel: dsl_dataset_deactivate_feature_impl+0xfb/0x100 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: dsl_dataset_clone_swap_sync_impl+0x8b9/0xaa0 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: dsl_dataset_rollback_sync+0x10f/0x1e0 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: dsl_sync_task_sync+0xa8/0xf0 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: dsl_pool_sync+0x3f7/0x510 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: spa_sync+0x567/0xf90 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: ? spa_txg_history_init_io+0x113/0x120 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: txg_sync_thread+0x226/0x3f0 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: ? txg_fini+0x260/0x260 [zfs 803b2177056c0cc018d9e6717e0d7333d8f74f3d]
Dez 27 08:17:27 rakete kernel: ? __thread_exit+0x20/0x20 [spl 77227ad6aa0fef1593e90f9cdd25fef94c2bbb59]
Dez 27 08:17:27 rakete kernel: thread_generic_wrapper+0x5a/0x70 [spl 77227ad6aa0fef1593e90f9cdd25fef94c2bbb59]
Dez 27 08:17:27 rakete kernel: kthread+0x118/0x140
Dez 27 08:17:27 rakete kernel: ? set_kthread_struct+0x50/0x50
Dez 27 08:17:27 rakete kernel: ret_from_fork+0x22/0x30
Dez 27 08:17:27 rakete kernel: </TASK>
Just to summarize my findings so far:
For the last couple of days I was running 2.1.7 with the PR from @gamanakis (#14304). At first it looked really promising, with no crashes. But then I had another freeze two days ago.
Then I suspected that zstd compression might be the cause. I have lz4 on the sender side and zstd on the receiver side, so I moved the whole receiver side to lz4. But then I got another freeze, so zstd is not the issue.
Maybe it is encryption-related. Sender and receiver are both encrypted with a keyfile, but I do not know.
Now I am back to square one and I have stock 2.1.7 running with the large-block commit reverted: c8d2ab0
So far I have no freezes.
I don’t think it’s encryption related. I got a crash when receiving into 2.1.7 from 0.8.x. Neither side has any encryption enabled.
Sender has lz4 whereas the receiver is zstd-19. I'll try reverting the receiver to lz4 when I get a chance. Of note, the first full send (using syncoid) worked without issue; it was the next, incremental send that crashed my system.
I could certainly believe it's not encryption related, I've just seen the quota feature and encryption interact poorly before.
My guess would be that it's a timing issue where something doesn't actually block on something else happening first, and if you use encryption or high compression with large blocks or the like, one task takes long enough that the other hits this.
Quickly, I added two log lines in the actual deactivate_impl, and got:
[ 466.783588] NOTICE: Feature 18 deactivation attempting
[ 466.784023] NOTICE: Feature 18 deactivation completed
[ 466.790930] NOTICE: Feature 16 deactivation attempting
[ 466.791353] NOTICE: Feature 16 deactivation completed
[ 466.791726] NOTICE: Feature 18 deactivation attempting
[ 466.792083] NOTICE: Feature 18 deactivation completed
[ 468.010089] NOTICE: Feature 16 deactivation attempting
[ 468.010451] NOTICE: Feature 16 deactivation completed
[ 468.010761] NOTICE: Feature 16 deactivation attempting
[ 468.011065] VERIFY0(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
So it seems it tries to deactivate, if I can count, SPA_FEATURE_USEROBJ_ACCOUNTING
twice, and boom goes the dynamite...
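For anyone who wants to reproduce that tracing: here is a minimal sketch of what such instrumentation could look like, assuming it wraps the body of dsl_dataset_deactivate_feature_impl() in module/zfs/dsl_dataset.c (these are not the exact lines used above, which aren't shown in the thread; I believe cmn_err(CE_NOTE, ...) is what produces the "NOTICE:" prefix in the kernel log).
	cmn_err(CE_NOTE, "Feature %d deactivation attempting", (int)f);
	/*
	 * Existing assertion, paraphrased from the panic message above: when
	 * the same feature is deactivated twice for the same dataset, the
	 * second zap_remove() finds no entry left in the MOS and returns
	 * ENOENT, which is the "failed (0 == 2)" seen in the VERIFY.
	 */
	VERIFY0(zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx));
	cmn_err(CE_NOTE, "Feature %d deactivation completed", (int)f);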
I think it has to do with the code here:
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index c17c829a0..0e8427f19 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -2408,13 +2408,6 @@ dmu_objset_id_quota_upgrade_cb(objset_t *os)
 	    dmu_objset_userobjspace_present(os))
 		return (SET_ERROR(ENOTSUP));
 
-	if (dmu_objset_userobjused_enabled(os))
-		dmu_objset_ds(os)->ds_feature_activation[
-		    SPA_FEATURE_USEROBJ_ACCOUNTING] = (void *)B_TRUE;
-	if (dmu_objset_projectquota_enabled(os))
-		dmu_objset_ds(os)->ds_feature_activation[
-		    SPA_FEATURE_PROJECT_QUOTA] = (void *)B_TRUE;
-
 	err = dmu_objset_space_upgrade(os);
 	if (err)
 		return (err);
If I remove that then it doesn't panic.
The above change to dmu_objset.c works. It fixes my syncoid backup, which sends from a zfs 2.0 pool to an encrypted zfs 2.1 pool. feature@userobj_accounting and feature@project_quota are activated in both pools.
@gamanakis : I have been testing the new PR https://github.com/openzfs/zfs/pull/14304 for a couple of days now (with the patch for dmu_objset.c). I want to confirm here that it solves my send/receive issues.
Is there a workaround that doesn't require a code change? Maybe a flag I can pass to the send or receive command to avoid the code path that causes the panic?
I’m in a situation where one server is offsite and unmanaged and if I happen to break it by injecting my own zfs modules then recovery will be tricky.
I just got this line (the only one I could read off the screen) and needed to reboot:
VERIFY0(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
with zfs 2.1.9:
$ zfs version
zfs-2.1.9-1
zfs-kmod-2.1.9-1
$ uname -r
5.15.0-58-generic
@phreaker0 can you provide additional information? were you sending/receiving when this happened?
@gamanakis I wasn't around when it happened; this is what I did before I left:
But I guess the crash happened pretty soon, because my influxdb logging stuff didn't record anything any more. The ssd pool is used as ROOT, so I couldn't log in any more after the panic to gather more information; I only had the screen output.
I'm trying to reproduce it now, but so far it works, I will keep an active ssh session with the kernel log in case of a crash and will report back.
FYI: I can't reproduce it anymore, it ran for several days.
I've just had this happen on RHEL 9 with 2.1.9, on a pool comprised of LUKS-encrypted devices (which I'm using because of the send/receive issues with native encryption, #11679 - aargh!). A receive from a Nexenta 4 host to this system (unencrypted; 1M record size/lz4 on the destination, 128K record size on the source) using syncoid+mbuffer on the destination reported:
VERIFY0(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
With /var/log/messages reporting:
Feb 17 08:51:32 fs6 kernel: VERIFY3(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
Feb 17 08:51:32 fs6 kernel: PANIC at dsl_dataset.c:1116:dsl_dataset_deactivate_feature_impl()
Feb 17 08:51:32 fs6 kernel: Showing stack for process 1123981
Feb 17 08:51:32 fs6 kernel: CPU: 4 PID: 1123981 Comm: txg_sync Kdump: loaded Tainted: P OE --------- --- 5.14.0-162.12.1.el9_1.x86_64 #1
Feb 17 08:51:32 fs6 kernel: Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.9.3 08/05/2022
Feb 17 08:51:32 fs6 kernel: Call Trace:
Feb 17 08:51:32 fs6 kernel: dump_stack_lvl+0x34/0x48
Feb 17 08:51:32 fs6 kernel: spl_panic+0xd1/0xe9 [spl]
Feb 17 08:51:32 fs6 kernel: ? dbuf_rele_and_unlock+0x387/0x6b0 [zfs]
Feb 17 08:51:32 fs6 kernel: ? zap_remove_impl+0xb3/0x120 [zfs]
Feb 17 08:51:32 fs6 kernel: ? kfree+0xac/0x3f0
Feb 17 08:51:32 fs6 kernel: ? spl_kmem_alloc+0xb2/0x100 [spl]
Feb 17 08:51:32 fs6 kernel: dsl_dataset_deactivate_feature_impl+0xfa/0x100 [zfs]
Feb 17 08:51:32 fs6 kernel: dsl_dataset_clone_swap_sync_impl+0x888/0xb50 [zfs]
Feb 17 08:51:32 fs6 kernel: dsl_dataset_rollback_sync+0xf0/0x1d0 [zfs]
Feb 17 08:51:32 fs6 kernel: ? dsl_dataset_hold_flags+0x9d/0x230 [zfs]
Feb 17 08:51:32 fs6 kernel: ? dsl_dataset_rollback_check+0x2f9/0x430 [zfs]
Feb 17 08:51:32 fs6 kernel: dsl_sync_task_sync+0xaa/0xf0 [zfs]
Feb 17 08:51:32 fs6 kernel: dsl_pool_sync+0x40c/0x520 [zfs]
Feb 17 08:51:32 fs6 kernel: spa_sync_iterate_to_convergence+0xf0/0x2f0 [zfs]
Feb 17 08:51:32 fs6 kernel: spa_sync+0x471/0x930 [zfs]
Feb 17 08:51:32 fs6 kernel: txg_sync_thread+0x27a/0x400 [zfs]
Feb 17 08:51:32 fs6 kernel: ? txg_fini+0x260/0x260 [zfs]
Feb 17 08:51:32 fs6 kernel: thread_generic_wrapper+0x59/0x70 [spl]
Feb 17 08:51:32 fs6 kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 17 08:51:32 fs6 kernel: kthread+0x149/0x170
Feb 17 08:51:32 fs6 kernel: ? set_kthread_struct+0x50/0x50
Feb 17 08:51:32 fs6 kernel: ret_from_fork+0x22/0x30
As this is part of a HA pair, the host was fenced shortly afterwards and service resumed on the peer host, where the pool imported with no errors. Pushing the service back to the original host and restarting the transfer has so far not resulted in a repetition.
Load on the system is minimal during the transfer, ca 1.3.
Same here on zfs 2.1.9 when doing a rollback on an unmounted dataset:
Linux duranux2 6.0.19 #1 SMP PREEMPT_DYNAMIC Thu Feb 9 01:12:28 CET 2023 x86_64 GNU/Linux
févr. 18 20:07:32 duranux2 kernel: VERIFY3(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
févr. 18 20:07:32 duranux2 kernel: PANIC at dsl_dataset.c:1116:dsl_dataset_deactivate_feature_impl()
févr. 18 20:07:32 duranux2 kernel: Showing stack for process 436
févr. 18 20:07:32 duranux2 kernel: CPU: 2 PID: 436 Comm: txg_sync Tainted: P O 6.0.19 #1
févr. 18 20:07:32 duranux2 kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
févr. 18 20:07:32 duranux2 kernel: Call Trace:
févr. 18 20:07:32 duranux2 kernel:
@scratchings @duramuss There is a more suitable fix, see https://github.com/openzfs/zfs/commit/34ce4c42ffdcd8933768533343d8b29f9612fbae and https://github.com/openzfs/zfs/pull/14502. Hopefully the upcoming 2.1.10 will include them.
Same here, after 30 days of uptime. No ZFS commands were issued during the last 2 weeks, just normal ops. The pool has 2 normal datasets, 1 zvol and 1 encrypted dataset. LZ4 compression; all other settings are at their defaults.
mar 04 21:15:12 sl kernel: VERIFY0(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
mar 04 21:15:12 sl kernel: PANIC at dsl_dataset.c:1129:dsl_dataset_deactivate_feature_impl()
mar 04 21:15:12 sl kernel: Showing stack for process 40697
mar 04 21:15:12 sl kernel: CPU: 2 PID: 40697 Comm: txg_sync Tainted: P OE 5.13.0-39-generic #44~20.04.1-Ubuntu
mar 04 21:15:12 sl kernel: Hardware name: System manufacturer System Product Name/PRIME Z270-P, BIOS 0610 05/11/2017
mar 04 21:15:12 sl kernel: Call Trace:
mar 04 21:15:12 sl kernel: <TASK>
mar 04 21:15:12 sl kernel: dump_stack+0x7d/0x9c
mar 04 21:15:12 sl kernel: spl_dumpstack+0x29/0x2b [spl]
mar 04 21:15:12 sl kernel: spl_panic+0xd4/0xfc [spl]
mar 04 21:15:12 sl kernel: ? spl_kmem_free+0x28/0x30 [spl]
mar 04 21:15:12 sl kernel: ? kfree+0xd8/0x2a0
mar 04 21:15:12 sl kernel: ? dbuf_rele+0x3d/0x50 [zfs]
mar 04 21:15:12 sl kernel: ? dmu_buf_rele+0xe/0x10 [zfs]
mar 04 21:15:12 sl kernel: ? zap_unlockdir+0x3f/0x60 [zfs]
mar 04 21:15:12 sl kernel: ? zap_remove_norm+0x76/0xa0 [zfs]
mar 04 21:15:12 sl kernel: dsl_dataset_deactivate_feature_impl+0x101/0x110 [zfs]
mar 04 21:15:12 sl kernel: dsl_dataset_clone_swap_sync_impl+0x874/0x9d0 [zfs]
mar 04 21:15:12 sl kernel: ? dbuf_rele+0x3d/0x50 [zfs]
mar 04 21:15:12 sl kernel: ? dmu_buf_rele+0xe/0x10 [zfs]
mar 04 21:15:12 sl kernel: ? dsl_dir_rele+0x30/0x40 [zfs]
mar 04 21:15:12 sl kernel: dmu_recv_end_sync+0x26f/0x5a0 [zfs]
mar 04 21:15:12 sl kernel: dsl_sync_task_sync+0xb6/0x100 [zfs]
mar 04 21:15:12 sl kernel: dsl_pool_sync+0x3d6/0x4f0 [zfs]
mar 04 21:15:12 sl kernel: spa_sync+0x55e/0xfd0 [zfs]
mar 04 21:15:12 sl kernel: ? spa_txg_history_init_io+0x106/0x110 [zfs]
mar 04 21:15:12 sl kernel: txg_sync_thread+0x229/0x3b0 [zfs]
mar 04 21:15:12 sl kernel: ? txg_quiesce_thread+0x340/0x340 [zfs]
mar 04 21:15:12 sl kernel: thread_generic_wrapper+0x79/0x90 [spl]
mar 04 21:15:12 sl kernel: ? spl_taskq_fini+0x80/0x80 [spl]
mar 04 21:15:12 sl kernel: kthread+0x12b/0x150
mar 04 21:15:12 sl kernel: ? set_kthread_struct+0x40/0x40
mar 04 21:15:12 sl kernel: ret_from_fork+0x22/0x30
mar 04 21:15:12 sl kernel: </TASK>
$ zfs --version
zfs-2.1.99-1641_gc935fe2e9
zfs-kmod-2.1.99-1641_gc935fe2e9
Will rebuild from master.
@slavanap this should be fixed in current master and in the upcoming 2.1.10 release.
I patched 2.1.9 using https://github.com/openzfs/zfs/commit/34ce4c42ffdcd8933768533343d8b29f9612fbae and https://github.com/openzfs/zfs/pull/14502 and deployed this yesterday. I'm sorry to say I'm still getting panics - two today:
Message from syslogd@fs7 at Mar 9 16:24:12 ... kernel:VERIFY3(0 == zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx)) failed (0 == 2)
Message from syslogd@fs7 at Mar 9 16:24:12 ... kernel:PANIC at dsl_dataset.c:1116:dsl_dataset_deactivate_featur
I'd probably try eee9362a72cfd615e40928e86d61747683dc9dc6 instead of 0f32b1f7289f224691e48d6998ad28d5b3a589c3 if that's still failing for you. If it still fails with 34ce4c4 and eee9362a72cfd615e40928e86d61747683dc9dc6, that's differently interesting. If not, then we know 0f32b1f7289f224691e48d6998ad28d5b3a589c3 either needs more work or there's something about master that makes it behave differently than 2.1 here, depending.
To everyone interested: give zfs-2.1.10-staging a try. It already contains all the fixes.
System information
Describe the problem you're observing
When using zfs send to make a backup on a remote machine, the receiver throws a PANIC in one of the zfs functions and the file system deadlocks.
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs