openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS Receive of encrypted incremental data stream causes a PANIC #13445

Open cyberpower678 opened 2 years ago

cyberpower678 commented 2 years ago

System information

Type Version/Name
Distribution Name Debian
Distribution Version Bullseye
Kernel Version 5.10.109+truenas
Architecture amd64
OpenZFS Version zfs-2.1.2-95_g1d2cdd23b zfs-kmod-2.1.2-95_g1d2cdd23b

Describe the problem you're observing

During an incremental receive, ZFS caused a panic and a system hangup.

Describe how to reproduce the problem

It happens randomly.

Include any warning/errors/backtraces from the system logs

[456678.240841] VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 95)
[456678.243815] PANIC at dmu_recv.c:1776:receive_object()
[456678.245141] Showing stack for process 2936808
[456678.246532] CPU: 10 PID: 2936808 Comm: receive_writer Tainted: P           OE     5.10.109+truenas #1
[456678.247840] Hardware name: Supermicro X9QR7-TF+/X9QRi-F+/X9QR7-TF+/X9QRi-F+, BIOS 3.0b 05/20/2015
[456678.249138] Call Trace:
[456678.250421]  dump_stack+0x6b/0x83
[456678.251676]  spl_panic+0xd4/0xfc [spl]
[456678.253038]  ? arc_buf_access+0x14c/0x250 [zfs]
[456678.254276]  ? dnode_hold_impl+0x4e9/0xef0 [zfs]
[456678.255493]  ? dnode_set_blksz+0x13b/0x300 [zfs]
[456678.256677]  ? dnode_rele_and_unlock+0x5c/0xc0 [zfs]
[456678.257846]  receive_object+0xc2c/0xca0 [zfs]
[456678.258984]  ? dmu_object_next+0xd6/0x120 [zfs]
[456678.260098]  ? receive_writer_thread+0xbd/0xad0 [zfs]
[456678.261160]  ? kfree+0x40c/0x480
[456678.262202]  ? _cond_resched+0x16/0x40
[456678.263244]  receive_writer_thread+0x1cc/0xad0 [zfs]
[456678.264280]  ? thread_generic_wrapper+0x62/0x80 [spl]
[456678.265252]  ? kfree+0x40c/0x480
[456678.266242]  ? receive_process_write_record+0x190/0x190 [zfs]
[456678.267177]  ? thread_generic_wrapper+0x6f/0x80 [spl]
[456678.268092]  thread_generic_wrapper+0x6f/0x80 [spl]
[456678.268988]  ? __thread_exit+0x20/0x20 [spl]
[456678.269864]  kthread+0x11b/0x140
[456678.270706]  ? __kthread_bind_mask+0x60/0x60
[456678.271538]  ret_from_fork+0x22/0x30
rincebrain commented 2 years ago

I would suggest trying a 2.1.4-based build, as there were fixes for encrypted send/recv and blocksize changing in the interim.

Nemezor commented 2 years ago

I am experiencing the same issue. I copied the bug issue template to organize my response better, but since it is a duplicate I don't want to open a new issue.

System information

Type Version/Name
Distribution Name Debian
Distribution Version Bullseye
Kernel Version 5.10.0-14-amd64
Architecture amd64
OpenZFS Version zfs-2.1.4-1 zfs-kmod-2.1.4-1

Describe the problem you're observing

Same result - receiving system locks up in a ZFS panic.

Describe how to reproduce the problem

Happens reliably with both of my TrueNAS SCALE datasets. I vaguely remember it happening after I changed permissions of some folders, but nothing more concrete.

Source dataset is a dataset created by TrueNAS with no special manual changes/settings, it is fully encrypted. Destination dataset was created using:

openssl rand -out /key 32
zpool create -O encryption=aes-256-gcm -O keyformat=raw -O keylocation=file:///key pool /dev/sdb3
zfs create pool/test

Dataset was then sent to pool/test, resulting in a panic and lockup of the system.
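
For illustration, a raw send into that pool looks like this (placeholder names; not necessarily the exact command used):

# Raw (-w) send of the encrypted source dataset into the test pool.
zfs snapshot -r sourcepool/dataset@migrate
zfs send -w sourcepool/dataset@migrate | zfs receive -s pool/test/dataset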

If required, I can try to provide the dataset that causes this (it is about 135GB in size and in active use), though I'm currently not sure how, since zfs sending it is not an option... The encryption keys will of course not be provided.

Include any warning/errors/backtraces from the system logs

From my production server running ZFS 2.1.2, where this was found (sorry for the image, not sure why I took a screenshot instead of copying the text): [screenshot of the panic backtrace]

Reproduced in a virtual machine running ZFS 2.1.4:

[ 1712.885113] VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 95)
[ 1712.885147] PANIC at dmu_recv.c:1784:receive_object()
[ 1712.885159] Showing stack for process 359011
[ 1712.885161] CPU: 2 PID: 359011 Comm: receive_writer Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1
[ 1712.885161] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 1712.885162] Call Trace:
[ 1712.885168]  dump_stack+0x6b/0x83
[ 1712.885173]  spl_panic+0xd4/0xfc [spl]
[ 1712.885204]  ? arc_buf_access+0x14c/0x250 [zfs]
[ 1712.885231]  ? dnode_hold_impl+0x4e9/0xef0 [zfs]
[ 1712.885258]  ? dnode_set_blksz+0x13b/0x300 [zfs]
[ 1712.885283]  ? dnode_rele_and_unlock+0x64/0xc0 [zfs]
[ 1712.885316]  receive_object+0xc2c/0xca0 [zfs]
[ 1712.885342]  ? dnode_rele_and_unlock+0x64/0xc0 [zfs]
[ 1712.885368]  ? flush_write_batch+0x1de/0x560 [zfs]
[ 1712.885393]  receive_writer_thread+0x1cc/0xad0 [zfs]
[ 1712.885395]  ? kfree+0xba/0x480
[ 1712.885420]  ? receive_process_write_record+0x190/0x190 [zfs]
[ 1712.885423]  ? thread_generic_wrapper+0x6f/0x80 [spl]
[ 1712.885425]  thread_generic_wrapper+0x6f/0x80 [spl]
[ 1712.885428]  ? __thread_exit+0x20/0x20 [spl]
[ 1712.885430]  kthread+0x11b/0x140
[ 1712.885431]  ? __kthread_bind_mask+0x60/0x60
[ 1712.885432]  ret_from_fork+0x22/0x30

Additionally, attempting to delete the partially received dataset on the receiving side results in another ZFS panic:

[  173.583242] PANIC: zfs: adding existent segment to range tree (offset=3c2937b800 size=400)
[  173.583267] Showing stack for process 1342
[  173.583269] CPU: 3 PID: 1342 Comm: txg_sync Tainted: P           OE     5.10.0-14-amd64 #1 Debian 5.10.113-1
[  173.583269] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  173.583270] Call Trace:
[  173.583275]  dump_stack+0x6b/0x83
[  173.583281]  vcmn_err.cold+0x58/0x80 [spl]
[  173.583284]  ? sysvec_apic_timer_interrupt+0x36/0x80
[  173.583286]  ? irq_exit_rcu+0x3e/0xc0
[  173.583287]  ? sysvec_apic_timer_interrupt+0x36/0x80
[  173.583288]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  173.583317]  ? zfs_btree_insert_into_leaf+0x233/0x2a0 [zfs]
[  173.583352]  zfs_panic_recover+0x6d/0x90 [zfs]
[  173.583384]  range_tree_add_impl+0x305/0xe40 [zfs]
[  173.583414]  metaslab_free_concrete+0x11d/0x250 [zfs]
[  173.583441]  metaslab_free_impl+0xa9/0xe0 [zfs]
[  173.583467]  metaslab_free+0x168/0x190 [zfs]
[  173.583501]  zio_free_sync+0xda/0xf0 [zfs]
[  173.583533]  zio_free+0xb3/0xf0 [zfs]
[  173.583563]  delete_blkptr_cb+0x22/0x110 [zfs]
[  173.583589]  bplist_iterate+0xc9/0x130 [zfs]
[  173.583617]  ? sublist_delete_sync+0x50/0x50 [zfs]
[  173.583644]  sublist_delete_sync+0x2e/0x50 [zfs]
[  173.583673]  dsl_sync_task_sync+0xa6/0xf0 [zfs]
[  173.583701]  dsl_pool_sync+0x40d/0x520 [zfs]
[  173.583730]  spa_sync+0x542/0xfa0 [zfs]
[  173.583731]  ? mutex_lock+0xe/0x30
[  173.583762]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
[  173.583792]  txg_sync_thread+0x2e0/0x4a0 [zfs]
[  173.583821]  ? txg_fini+0x250/0x250 [zfs]
[  173.583824]  thread_generic_wrapper+0x6f/0x80 [spl]
[  173.583827]  ? __thread_exit+0x20/0x20 [spl]
[  173.583828]  kthread+0x11b/0x140
[  173.583829]  ? __kthread_bind_mask+0x60/0x60
[  173.583831]  ret_from_fork+0x22/0x30
schreiberstein commented 2 years ago

I experience the same issue on ZFS 2.1.4 on Debian Bullseye amd64 and on another TrueNAS system. This practically renders my ZFS setup useless because I cannot perform any snapshot based replication and/or backup. Is it something dataset specific that could be fixed? Is there a workaround for this? It's been nearly three months since this has been reported. 😢

cyberpower678 commented 2 years ago

I experience the same issue on ZFS 2.1.4 on Debian Bullseye amd64 and on another TrueNAS system. This practically renders my ZFS setup useless because I cannot perform any snapshot based replication and/or backup. Is it something dataset specific that could be fixed? Is there a workaround for this? It's been nearly three months since this has been reported. 😢

You can definitely work around it, especially if you know the date of the last snapshot that was successfully sent. The idea is to find all files and folders that were created/modified since the last snapshot was sent, back it up, roll the dataset back, and then put the backed up files back on. Create a new incremental snapshot, and send it. It should work.
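
Roughly, as a sketch (every dataset, snapshot, date, and path below is a placeholder; adapt to your own layout):

SRC_MNT=/mnt/pool/data
LAST_SENT=pool/data@2022-07-01      # last snapshot that reached the backup

# 1. Copy out everything created/modified since the last successfully sent snapshot.
find "$SRC_MNT" -type f -newermt "2022-07-01" -exec cp -a --parents {} /backup-staging/ \;

# 2. Roll the source back to that snapshot (this destroys the newer snapshots).
zfs rollback -r "$LAST_SENT"

# 3. Put the preserved files back, take a fresh snapshot, and retry the incremental send.
cp -a /backup-staging"$SRC_MNT"/. "$SRC_MNT"/
zfs snapshot pool/data@retry
zfs send -w -i "$LAST_SENT" pool/data@retry | ssh backuphost zfs receive -s backup/data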

schreiberstein commented 2 years ago

You can definitely work around it, especially if you know the date of the last snapshot that was successfully sent. The idea is to find all files and folders that were created/modified since the last snapshot was sent, back it up, roll the dataset back, and then put the backed up files back on. Create a new incremental snapshot, and send it. It should work.

Interesting approach! But sadly it wouldn't work for me. I sent a fresh snapshot to a brand new system, and after the transfer completed I wanted to send a newer (final) snapshot of all the data that changed in the meantime, to completely switch over to said new system. (Lots of small files; rsync is not viable, indexing alone would take longer than the permissible weekend of downtime.) However, no matter what I do, all incremental sends end in a PANIC. I also tried different snapshots. The original dataset was transferred a long time ago from a different TrueNAS system. The only thing that changed was switching the encryption from passphrase to keyfile on the previous system (the system I am now trying to send from) some time ago. There are dozens of snapshots and I cannot attempt them all brute-force-style. My only hope would be to copy (send and receive) the dataset locally and pray that it fixes itself.

I wonder how this is possible: userspace operations causing a literal kernel panic. I know everybody focuses on the issues and bugs they experience themselves, but I think this bug is really, really serious. It literally crashed a TrueNAS storage server (reboot!) and causes a panic message in Linux. 😢

rincebrain commented 2 years ago

The original bug appears to be #13699, people seem to be piling in various other things.

I am going to bite my tongue and only say that if you find a way to convince OpenZFS leadership that native encryption bugs are worth more than "I hope someone fixes it", please let me know.

schreiberstein commented 2 years ago

The original bug appears to be #13699, people seem to be piling in various other things.

I am going to bite my tongue and only say that if you find a way to convince OpenZFS leadership that native encryption bugs are worth more than "I hope someone fixes it", please let me know.

I get what you are saying... Do you think I could work around it by sending the dataset decrypted instead? I mean, complete an initial sync decrypted, then perform decrypted incremental send? Is it just the encryption that's causing it?

rincebrain commented 2 years ago

I haven't dug into this bug, so I don't know if the flaw in this code is specific to encryption, but it doesn't seem like trying it can make things worse...

schreiberstein commented 2 years ago

I haven't dug into this bug, so I don't know if the flaw in this code is specific to encryption, but it doesn't seem like trying it can make things worse...

I sent an unencrypted (that is, not raw (-w)) version of the dataset and can confirm that sending incremental snapshots to it works flawlessly... Given that I have sent the dataset raw and encrypted twice and I was unable to send incremental snapshots, I conclude the issue lies with the encryption / sending "raw".

cyberpower678 commented 2 years ago

I sent an unencrypted (that is, not raw (-w)) version of the dataset and can confirm that sending incremental snapshots to it works flawlessly... Given that I have sent the dataset raw and encrypted twice and I was unable to send incremental snapshots, I conclude the issue lies with the encryption / sending "raw".

Actually it’s an issue with receiving encrypted streams. The send function works as expected.

If you downgraded the receiver to 2.0.2 it would probably receive just fine.

grahamperrin commented 1 year ago

The original bug appears to be #13699, people seem to be piling in various other things. …

Tentatively (without linking): might https://github.com/openzfs/zfs/pull/14119 partially fix this issue #13445?

Deny receiving into encrypted datasets if the keys are not loaded …

Postscript: hidden as off-topic, given the response from @AttilaFueloep (thank you).

AttilaFueloep commented 1 year ago

Tentatively (without linking): might https://github.com/openzfs/zfs/pull/14119 partially fix this issue #13445?

I don't think so. #14119 fixes an issue where receiving non-raw, unencrypted incremental sends into an encrypted target without the key loaded was accidentally allowed, causing havoc down the stack.

dobsonj commented 1 year ago

I ran into this issue on FreeBSD 12.2-RELEASE-p14 with zfs-2.0.7-1 (TrueNAS-12.0-U8.1). The source system does a raw send (i.e. zfs send -Rw) as an incremental update with intermediary snapshots (-I). Properties recordsize=128K, compression=lz4, encryption=aes-256-gcm are set on the source dataset, so it sends raw compressed and encrypted blocks to the target system. The target system does not have the encryption key in this case, it should just store the encrypted blocks and never mount it. The target system (running the same release) panics during zfs receive:

panic: VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 45)

cpuid = 7
time = 1654421418
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe017d8ae880
vpanic() at vpanic+0x17b/frame 0xfffffe017d8ae8d0
spl_panic() at spl_panic+0x3a/frame 0xfffffe017d8ae930
receive_object() at receive_object+0x99f/frame 0xfffffe017d8aea40
receive_writer_thread() at receive_writer_thread+0x115/frame 0xfffffe017d8aeab0
fork_exit() at fork_exit+0x7e/frame 0xfffffe017d8aeaf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe017d8aeaf0
--- trap 0xc, rip = 0x800876caa, rsp = 0x7fffffff7508, rbp = 0x7fffffff7570 ---
KDB: enter: panic
panic.txt: VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 45)
version.txt: FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS

The value returned by dmu_object_set_blocksize() on FreeBSD is EOPNOTSUPP:

     45 EOPNOTSUPP Operation not supported.  The attempted operation is not supported for the type of object referenced.  Usually this occurs when
             a file descriptor refers to a file or socket that cannot support this operation, for example, trying to accept a connection on a
             datagram socket.

On Debian, it's 95 (consistent with the panic in the original description of this issue).

#define EOPNOTSUPP      95      /* Operation not supported on transport endpoint */

It's hitting this VERIFY statement, which is only called when raw send is enabled:
https://github.com/openzfs/zfs/blob/b9d98453f9387c413f91d1d9cdb0cba8e04dbd95/module/zfs/dmu_recv.c#L1783

EOPNOTSUPP appears to be coming from dnode_set_blksz:
https://github.com/openzfs/zfs/blob/b9d98453f9387c413f91d1d9cdb0cba8e04dbd95/module/zfs/dnode.c#L1814

I'll post more details if I can narrow it down to a simple test case; I don't know yet exactly how to reproduce it in isolation.
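
For reference, the replication pattern described above boils down to roughly the following; pool and snapshot names are placeholders, and the real command is generated by the TrueNAS replication task:

# Raw (-w), recursive (-R) send with intermediary snapshots (-I), received on a
# target that never loads the encryption key.
zfs send -R -w -I tank/enc@snap-old tank/enc@snap-new | \
    ssh target zfs receive -s -F backuppool/enc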

TBhatti90 commented 1 year ago

Hi, I just wanted to report, I think I'm having the issue. Has there been any progress in tackling this issue? I'm having trouble figuring out how I'm going to make a backup of my main pool, since replication always fails. I can post the dmesg log I get. dmesg_zfs.txt

cyberpower678 commented 1 year ago

Hi, I just wanted to report, I think I'm having the issue. Has there been any progress in tackling this issue? I'm having trouble figuring out how I'm going to make a backup of my main pool, since replication always fails. I can post the dmesg log I get. dmesg_zfs.txt

Are you sending an incremental snapshot, or the initial snapshot?

If the former, identify which files were changed since the last snapshot, copy them out of the dataset/pool, delete them from the pool, put them back, delete the newest snapshot, and make a new one. Replication should work now.

If the latter, create a new dataset to replace the one failing to replicate and copy all of the files from the old dataset to the new one. Delete the old dataset once the data is copied.

TBhatti90 commented 1 year ago

Hold on, I'm confused about your former option. That's the problem. It's an incremental snapshot, a series of them actually. "I copy the diff files from the newest snapshot, delete them from the newest snapshot, put them back in the newest snapshot, delete the newest snapshot, make a new snapshot". This is weird. Why is it like this? This isn't how sending and replication is supposed to behave. What's going on? I'm using sanoid for my snapshot management, so it creates automatic snapshots at a set time frame, like daily, or monthly. Should I delete every snapshot, except the initial one, that's the same on the server and the backup?

cyberpower678 commented 1 year ago

Hold on, I'm confused about your former option. That's the problem. It's an incremental snapshot, a series of them actually. "I copy the diff files from the newest snapshot, delete them from the newest snapshot, put them back in the newest snapshot, delete the newest snapshot, make a new snapshot". This is weird. Why is it like this? This isn't how sending and replication is supposed to behave. What's going on? I'm using sanoid for my snapshot management, so it creates automatic snapshots at a set time frame, like daily, or monthly. Should I delete every snapshot, except the initial one, that's the same on the server and the backup?

I’m venturing a guess your affected dataset is an older dataset from an older ZFS version. On newer versions of ZFS, some dataset metadata left behind by older versions don’t seem to play nice with the new versions upon sending and receiving them. The idea here is to go back as many snapshots as needed, to get to the point where the last snapshot sent was still working on the backup. Make a note of every file that’s been modified, or created on the dataset since the last working snapshot, and copy them out of the pool. Roll back the source dataset to the last snapshot that sent successfully to the backup. Do the same for the destination. Delete all the files on source that you copied out, that still existed at that point. Copy files back into the dataset, and create a new snapshot. Sending the snapshot should be successful this time around. I’ve discovered this to be an issue that eventually goes away on its own. Just deal with it a couple of times, and it will likely go away for you too.

TBhatti90 commented 1 year ago

I’m venturing a guess your affected dataset is an older dataset from an older ZFS version. On newer versions of ZFS, some dataset metadata left behind by older versions don’t seem to play nice with the new versions upon sending and receiving them. The idea here is to go back as many snapshots as needed, to get to the point where the last snapshot sent was still working on the backup. Make a note of every file that’s been modified, or created on the dataset since the last working snapshot, and copy them out of the pool. Roll back the source dataset to the last snapshot that sent successfully to the backup. Do the same for the destination. Delete all the files on source that you copied out, that still existed at that point. Copy files back into the dataset, and create a new snapshot. Sending the snapshot should be successful this time around. I’ve discovered this to be an issue that eventually goes away on its own. Just deal with it a couple of times, and it will likely go away for you too.

So is there a way to detect which version of zfs the snapshot is from? Now, I don't mind losing the old snapshots. I'm ok with only having relatively recent snapshots. But I'm on Debian Bullseye, so my zfs version shouldn't have changed. Think it's 2.0.3.

The pool itself, appears to be from December 2021 on truenas core, freebsd. I'm not sure what version of zfs they were using back then.

If it helps, here's my syncoid script

#!/bin/sh

FILESYSTEMS="Data"
LOG=/var/log/syncoid.log

for i in $FILESYSTEMS; do
    echo "--------------- `date` - syncing $i -------------" >> $LOG 2>&1
    /usr/sbin/syncoid -r --exclude=Data/Encrypted/Media --exclude=Data/Unencrypted/Media/Collections --exclude="Data/Unencrypted/Media/TV Shows" --exclude=Data/Unencrypted/Media/Movies --exclude=Data/Unencrypted/Media/Music --exclude=Data/Unencrypted/Media/Temp --exclude=Data/Encrypted/Users/USER/Backups/Desktop --exclude=Data/Encrypted/Users/USER/Backups/Laptop --sendoptions="wp" --skip-parent --no-sync-snap $i root@backup:Backup/$i >> $LOG 2>&1
    echo "--------------- `date` - done $i -------------" >> $LOG 2>&1
done

rincebrain commented 1 year ago

I don't really think your description of the problem, or how to work around it, is accurate, @cyberpower678.

More precisely, if someone who can reproduce this reliably wants to add a few debug prints the next time this happens to them, it's probably pretty straightforward to see why it's getting upset? I have a couple of wild guesses, but without a reproducer I can take home with me, I'm going to get to ask whoever to try things and see how it breaks.

e: reading back, no, this is just the same known issue, probably. Go ask Debian to cherrypick #13782 if you want it to work, I think, or run 2.1.7 or newer.

cyberpower678 commented 1 year ago

I don't really think your description of the problem, or how to work around it, is accurate, @cyberpower678.

More precisely, if someone who can reproduce this reliably wants to add a few debug prints the next time this happens to them, it's probably pretty straightforward to see why it's getting upset? I have a couple of wild guesses, but without a reproducer I can take home with me, I'm going to get to ask whoever to try things and see how it breaks.

e: reading back, no, this is just the same known issue, probably. Go ask Debian to cherrypick #13782 if you want it to work, I think, or run 2.1.7 or newer.

It’s accurate for me. When I did this, whatever was causing zfs receive to either corrupt the dataset or throw a panic was gone, and the subsequent receive was successful. After 3 more instances of this happening, and doing my tedious workaround, it stopped happening entirely. It’s been almost a year since the last occurrence.

cyberpower678 commented 1 year ago

I’m venturing a guess your affected dataset is an older dataset from an older ZFS version. On newer versions of ZFS, some dataset metadata left behind by older versions don’t seem to play nice with the new versions upon sending and receiving them. The idea here is to go back as many snapshots as needed, to get to the point where the last snapshot sent was still working on the backup. Make a note of every file that’s been modified, or created on the dataset since the last working snapshot, and copy them out of the pool. Roll back the source dataset to the last snapshot that sent successfully to the backup. Do the same for the destination. Delete all the files on source that you copied out, that still existed at that point. Copy files back into the dataset, and create a new snapshot. Sending the snapshot should be successful this time around. I’ve discovered this to be an issue that eventually goes away on its own. Just deal with it a couple of times, and it will likely go away for you too.

So is there a way to detect which version of zfs the snapshot is from? Now, I don't mind losing the old snapshots. I'm ok with only having relatively recent snapshots. But I'm on Debian Bullseye, so my zfs version shouldn't have changed. Think it's 2.0.3.

The pool itself, appears to be from December 2021 on truenas core, freebsd. I'm not sure what version of zfs they were using back then.

If it helps, here's my syncoid script

#!/bin/sh

FILESYSTEMS="Data"
LOG=/var/log/syncoid.log

for i in $FILESYSTEMS; do
    echo "--------------- `date` - syncing $i -------------" >> $LOG 2>&1
    /usr/sbin/syncoid -r --exclude=Data/Encrypted/Media --exclude=Data/Unencrypted/Media/Collections --exclude="Data/Unencrypted/Media/TV Shows" --exclude=Data/Unencrypted/Media/Movies --exclude=Data/Unencrypted/Media/Music --exclude=Data/Unencrypted/Media/Temp --exclude=Data/Encrypted/Users/USER/Backups/Desktop --exclude=Data/Encrypted/Users/USER/Backups/Laptop --sendoptions="wp" --skip-parent --no-sync-snap $i root@backup:Backup/$i >> $LOG 2>&1
    echo "--------------- `date` - done $i -------------" >> $LOG 2>&1
done

@rincebrain may have a point that I may not be on point here. I’m laying out to you what worked for me. The general idea is if older snapshots made it to the destination just fine, you keep those. Only delete the snapshots, that are failing to send/receive successfully. Identify what has changed since the last successful snapshot, rollback to the last working snapshot after copying the changed/added files out, and then completely replace said files (delete and copy back) in the dataset, and create a new snapshot based on the last successful one. Try sending that to the destination. If it still panics, then the only option for now is to simply create a new dataset on the source, and move the files to the new dataset, deleting the old one and send the whole thing again. Really annoying, but most effective in the long-run, from my experience.

rincebrain commented 1 year ago

Or you could just apply the patch that fixes the error on the receiver.

cyberpower678 commented 1 year ago

Or you could just apply the patch that fixes the error on the receiver.

That would be a great solution. Too bad I'm not familiar with how ZFS works internally to actually fix it myself. :-( So, I just work around issues until they are fixed, or are no longer issues.

rincebrain commented 1 year ago

I wasn't suggesting you write a fix. I was suggesting you use 2.1.7 or newer, which have this fixed, or get your distro du jour to apply the relevant fix.

TBhatti90 commented 1 year ago

Hey so just want to make sure I'm understanding correctly, 2.1.7 fixes this right? Debian has that in the backports, along with a newer kernel. I can check that out. Would I have to upgrade my pool too?

If there's any way I can get some debug print statements for you guys, I can provide that. The error seems pretty consistent. Always with the one dataset.

cyberpower678 commented 1 year ago

Or you could just apply the patch that fixes the error on the receiver.

That would be a great solution. Too bad I'm not familiar with how ZFS works internally to actually fix it myself. So, I just work around issues until they are fixed, or are no longer issues.

I wasn't suggesting you write a fix. I was suggesting you use 2.1.7 or newer, which have this fixed, or get your distro du jour to apply the relevant fix.

Oh. Whoops. Now that I reread your initial comment, that does make more sense. lol That being said, I haven't dealt with this issue since I initially opened it here. My encrypted incrementals have been replicating without issue.

rincebrain commented 1 year ago

You don't need to pull a newer kernel to use the newer version of ZFS, you can just pull the zfs packages from backports.

You also shouldn't need to run "zpool upgrade" or otherwise tweak any feature flags.

My understanding is that redoing the snapshot, essentially, would just reroll the dice on whether this triggered: if I understand the bug correctly, in certain cases it wasn't flagging the dataset as using large blocks before attempting to create a large block, so the point at which specific blocks got included in the send stream (plus variation in your local system) could affect whether this broke for you or not. So I would speculate it's conceivably possible for someone to write a zstream command that reordered when various things were in the send stream so this didn't come up? I'm just spitballing though.
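
For anyone who wants to check whether large blocks are in play on their setup, something like this shows the relevant state (pool and dataset names are placeholders):

# Is the large_blocks feature enabled/active on the receiving pool?
zpool get feature@large_blocks backuppool

# Does the source dataset use a recordsize above 128K (i.e. large blocks)?
zfs get recordsize,encryption tank/mydata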

cyberpower678 commented 1 year ago

Hey so just want to make sure I'm understanding correctly, 2.1.7 fixes this right? Debian has that in the backports, along with a newer kernel. I can check that out. Would I have to upgrade my pool too?

If there's any way I can get some debug print statements for you guys, I can provide that. The error seems pretty consistent. Always with the one dataset.

Pool upgrades shouldn't be needed, but it shouldn't hurt. Though I'm not sure about upgrading ZFS in your particular installation since it's TrueNAS.

The latest TrueNAS SCALE is on 2.1.6

zfs --version

zfs-2.1.6-1 zfs-kmod-2.1.6-1

TBhatti90 commented 1 year ago

Just wanted to report the same issue happened with 2.1.7 with Debian 6.0.0. This is the same on the source and destination. The source and destination have an initial snapshot from 12/21/21. The source has many snapshots after that, as it's the primary server. The source was sending a snapshot from 1/30/23, and that's when the kernel panic happened on the destination.

So you want me to delete the new snapshots and paste the changed files in the old snapshot? That sounds difficult.

[23638.869168] VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 95)
[23638.869231] PANIC at dmu_recv.c:1784:receive_object()
[23638.869255] Showing stack for process 21420
[23638.869256] CPU: 0 PID: 21420 Comm: receive_writer Tainted: P OE 6.0.0-0.deb11.6-amd64 #1 Debian 6.0.12-1~bpo11+1
[23638.869259] Hardware name: ASUS All Series/Z97I-PLUS, BIOS 2704 02/18/2016
[23638.869260] Call Trace:
[23638.869261]
[23638.869263]  dump_stack_lvl+0x45/0x5e
[23638.869270]  spl_panic+0xd1/0xe9 [spl]
[23638.869279]  ? _raw_spin_unlock+0x15/0x30
[23638.869282]  ? dnode_hold_impl+0x4eb/0xee0 [zfs]
[23638.869355]  ? preempt_count_add+0x70/0xa0
[23638.869358]  ? preempt_count_add+0x70/0xa0
[23638.869360]  ? _raw_spin_lock+0x13/0x40
[23638.869362]  ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
[23638.869406]  ? _raw_spin_unlock+0x15/0x30
[23638.869408]  receive_object+0xc30/0xca0 [zfs]
[23638.869452]  ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
[23638.869515]  ? preempt_count_add+0x70/0xa0
[23638.869517]  receive_writer_thread+0x1cc/0xad0 [zfs]
[23638.869559]  ? set_next_task_fair+0x2d/0xd0
[23638.869563]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[23638.869565]  ? receive_process_write_record+0x1a0/0x1a0 [zfs]
[23638.869606]  ? __thread_exit+0x20/0x20 [spl]
[23638.869621]  ? thread_generic_wrapper+0x59/0x70 [spl]
[23638.869626]  thread_generic_wrapper+0x59/0x70 [spl]
[23638.869631]  kthread+0xe7/0x110
[23638.869633]  ? kthread_complete_and_exit+0x20/0x20
[23638.869635]  ret_from_fork+0x22/0x30
[23638.869639]

rincebrain commented 1 year ago

Neat. Could you share precisely what zfs version reports on the receiver? I want to go see what's in your precise version...

e: My next suggestion if 2.1.7 didn't resolve it for you would be to try 2.1.9, since 2.1.8 includes f806306ce07e0b0bfe994159ec1c75a2c3281d88 which further revised the code that c8d2ab05e1e6bf21185723fabdebbe5bb3374381 touched.

TBhatti90 commented 1 year ago

Neat. Could you share precisely what zfs version reports on the receiver? I want to go see what's in your precise version...

e: My next suggestion if 2.1.7 didn't resolve it for you would be to try 2.1.9, since 2.1.8 includes f806306 which further revised the code that c8d2ab0 touched.

Hey, So on both the source and receiver this is the zfs version:

zfs-2.1.7-1~bpo11+1 zfs-kmod-2.1.7-1~bpo11+1

I'm not sure if Debian is going to include the later versions. Hopefully they do. Didn't that stack trace provide more information? I do have the blocksize set to greater than or equal to 1M on all my datasets. Thing is, all the snapshots, from when I first created the pool, should have had that block size.

rincebrain commented 1 year ago

The bug that first patch, at least, wants to fix, is that it wasn't correctly setting the "we can use large blocks here" flag on the dataset in a timely fashion, so then attempting to create a large block on it would, naturally, fail.

I would expect Debian to pull in 2.1.9 eventually, though I can't make any promises on what timetable.

mcflis commented 1 year ago

I've run into a similar issue replicating from TrueNAS Scale to FreeBSD 12.2-RELEASE-p3:
Panic String: Solaris(panic): zfs: adding existent segment to range tree (offset=276dff5000 size=1000)

Out of nowhere, I had a presumably broken snapshot of the replicated dataset on the FreeBSD side. I've been able to resolve the issue by destroying all snapshots of the dataset except for the broken one and by running:

zfs rollback <dataset@broken-snap> # <- rollback to the broken snapshot was the key for me
zfs destroy -r <dataset>

Afterwards, I've just issued another replication run and Truenas was happily replicating my snapshots to FreeBSD including the previously destroyed dataset.

I know this github issue is about Linux but maybe it can help someone struggling for days trying to get their replication fixed without the possibility to update to zfs 2.1.9.

Note: Before the rollback, I've upgraded the os to FreeBSD 13.1-RELEASE-p5 and ran zpool upgrade <pool>

A little more info:

TrueNAS version: TrueNAS-SCALE-22.12.0
TrueNAS zfs version: zfs-2.1.6-1 zfs-kmod-2.1.6-1
FreeBSD zfs version: zfs-2.1.4-FreeBSD_g52bad4f23 zfs-kmod-2.1.4-FreeBSD_g52bad4f23

FreeBSD crash info:

# cat /var/crash/info.last
Dump header from device: /dev/vtbd0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 416559104
  Blocksize: 512
  Compression: none
  Dumptime: 2023-03-08 21:40:01 +0100
  Hostname: REDACTED
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.1-RELEASE-p3 GENERIC
  Panic String: Solaris(panic): zfs: adding existent segment to range tree (offset=276dff5000 size=1000)
  Dump Parity: 3924248330
  Bounds: 0
  Dump Status: good

rincebrain commented 1 year ago

Since FreeBSD 13 is running OpenZFS in the tree now, seems perfectly germane to mention.

Did you try just the rollback and then receive again, out of curiosity, before destroying the rest?

mcflis commented 1 year ago

Ok cool, well I've destroyed it first before receiving, just to make sure the replication for this particular dataset starts from scratch.

mcflis commented 1 year ago

Did you try just the rollback and then receive again, out of curiosity, before destroying the rest?

I've run into the same issue once more. This time I only ran zfs rollback -r <dataset@broken-snap> without destroying the dataset afterwards: replication succeeded.

To find the broken snapshot, I used https://github.com/bahamas10/zfs-prune-snapshots, starting from oldest to newest with the -v option to print each action. It would delete snapshots until it hit the broken one and then simply become unresponsive (over SSH).
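
For anyone else bisecting, the snapshots can also be listed oldest-to-newest with stock tools before deciding what to destroy (dataset name is a placeholder):

# List snapshots of the replicated dataset in creation order, oldest first.
zfs list -H -t snapshot -o name,creation -s creation -r pool/dataset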

pvelati commented 1 year ago

Having the same issue replicating a dataset from TrueNAS Scale to PBS.

System information SENDER

Type Version/Name
Distribution Name TrueNAS Scale (Debian 11)
Distribution Version 22.12.2
Kernel Version 5.15.79+truenas
Architecture amd64
OpenZFS Version zfs-2.1.9-1 zfs-kmod-2.1.9-1

System information RECEIVER

Type Version/Name
Distribution Name Proxmox Backup Server (Debian 11)
Distribution Version 2.4-2
Kernel Version 6.1.15-1-pve
Architecture amd64
OpenZFS Version zfs-2.1.11-pve1 zfs-kmod-2.1.9-pve1

Describe the problem you're observing

During an incremental receive, ZFS caused a panic. No system hang-up; only the dataset is busy.

Describe how to reproduce the problem

It happens when I need to replicate a new snapshot with a big change (~100 GB), but not always. I can't really reproduce it on demand.

Include any warning/errors/backtraces from the system logs

[  404.517496] VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 95)
[  404.517510] PANIC at dmu_recv.c:1784:receive_object()
[  404.517514] Showing stack for process 3250
[  404.517518] CPU: 2 PID: 3250 Comm: receive_writer Tainted: P           O       6.1.15-1-pve #1
[  404.517524] Hardware name: retsamarret 000-F2423-FBA002-2000-N/Default string, BIOS 5.19 06/24/2022
[  404.517527] Call Trace:
[  404.517531]  <TASK>
[  404.517536]  dump_stack_lvl+0x49/0x63
[  404.517547]  dump_stack+0x10/0x16
[  404.517553]  spl_dumpstack+0x29/0x2f [spl]
[  404.517570]  spl_panic+0xd1/0xe9 [spl]
[  404.517586]  ? dnode_rele_and_unlock+0x68/0x100 [zfs]
[  404.517711]  ? dmu_object_set_blocksize+0x64/0x90 [zfs]
[  404.517811]  receive_object+0xcf5/0xdb0 [zfs]
[  404.517913]  ? dnode_rele+0x3d/0x50 [zfs]
[  404.518015]  ? flush_write_batch+0x1cb/0x5a0 [zfs]
[  404.518117]  receive_writer_thread+0x1cc/0xb50 [zfs]
[  404.518217]  ? set_next_task_fair+0x70/0xa0
[  404.518224]  ? raw_spin_rq_unlock+0x10/0x40
[  404.518229]  ? set_user_nice.part.0+0x167/0x430
[  404.518235]  ? receive_process_write_record+0x190/0x190 [zfs]
[  404.518334]  ? receive_process_write_record+0x190/0x190 [zfs]
[  404.518435]  ? __thread_exit+0x20/0x20 [spl]
[  404.518450]  thread_generic_wrapper+0x61/0x80 [spl]
[  404.518463]  ? thread_generic_wrapper+0x61/0x80 [spl]
[  404.518477]  kthread+0xee/0x120
[  404.518482]  ? kthread_complete_and_exit+0x20/0x20
[  404.518486]  ret_from_fork+0x1f/0x30
[  404.518493]  </TASK>

Message from syslogd@pbs at May 28 19:17:37 ...
kernel:[  404.517496] VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 95)

Message from syslogd@pbs at May 28 19:17:37 ...
kernel:[  404.517510] PANIC at dmu_recv.c:1784:receive_object()
sfatula commented 1 year ago

OpenZFS 2.1.11 on both sender (TrueNAS) and receiver (Debian), same panic as the OP. So I have to scrap using zfs replication, as we cannot store the data unencrypted and it's too much effort to send all the data again.

amotin commented 1 year ago

@sfatula and others, can you try this patch: https://github.com/openzfs/zfs/pull/15039 ? It should fix at least some cases.

cyberpower678 commented 1 year ago

I would love to, but I worked around the issue by reloading the changed data, recreating a snapshot of it, and sending again, which avoided the panic. I haven’t had a panic like this lately, but rather a large number of range-tree panics, which I recently fixed/worked around too.

cyberpower678 commented 1 year ago

So right now, my systems are running fine without any panics or ZFS warnings.

sfatula commented 1 year ago

@sfatula and others, can you try this patch: #15039 ? It should fix at least some cases.

I should be able to try that sometime this week and report back; I've never built ZFS, but it doesn't seem too bad on Debian. I am fortunate in that ZFS on our backup machine is only used for backups, which aren't working now anyway, so it doesn't matter if ZFS doesn't run if I mess something up.
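
For anyone else wanting to try it, building from source with the PR applied looks roughly like this on Debian (a sketch based on the OpenZFS build documentation; the local branch name is made up, and the documented build dependencies need to be installed first):

git clone https://github.com/openzfs/zfs.git
cd zfs
git fetch origin pull/15039/head:pr-15039    # fetch the proposed fix
git checkout pr-15039
sh autogen.sh
./configure
make -s -j"$(nproc)"
sudo make install && sudo ldconfig && sudo depmod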

sfatula commented 1 year ago

I applied said patch to current master and first wanted to rollback as I did previously to have the same conditions, I typed:

zfs rollback -R Backup/Truenas/tank/Data@backup-20230626033001

And got:

kernel:[ 252.381901] PANIC: zfs: adding existent segment to range tree (offset=1e24c8c000 size=1000)

Before the patch, I could in fact rollback. zfs -V shows (if this is correct):

zfs-2.2.99-1 zfs-kmod-2.2.99-1

Now I also cannot even zpool import the pool anymore as that panics with same message. Heck, can't even boot as on boot I get the same message.

amotin commented 1 year ago

@sfatula My patch is supposed to affect only receive. I can't imagine how it could affect the rollback, especially in a way that would mess up the pool space map. I wonder if you have more than one problem, like bad RAM. Unfortunately there is no solution to already corrupted space maps other than a read-only pool import and data evacuation.

sfatula commented 1 year ago

I restored 2.1.11 and the machine boots fine; the pool works fine. Obviously I still can't zfs receive, but everything else works. Not sure what the issue might be then. Any other ideas?

cyberpower678 commented 1 year ago

I applied said patch to current master and first wanted to rollback as I did previously to have the same conditions, I typed:

zfs rollback -R Backup/Truenas/tank/Data@backup-20230626033001

And got:

kernel:[ 252.381901] PANIC: zfs: adding existent segment to range tree (offset=1e24c8c000 size=1000)

Before the patch, I could in fact rollback. zfs -V shows (if this is correct):

zfs-2.2.99-1 zfs-kmod-2.2.99-1

Now I also cannot even zpool import the pool anymore as that panics with same message. Heck, can't even boot as on boot I get the same message.

That’s a bug that exists for many, including myself. You can set ZFS tunables to turn those panics into warnings. The space map isn’t already corrupted; it’s being corrupted by newer versions of ZFS operating on older datasets.
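
(For reference, the tunable in question is presumably zfs_recover, which makes zfs_panic_recover() log "adding existent segment to range tree" as a warning instead of panicking; it masks the inconsistency rather than fixing it, so use it with care.)

# Turn "adding existent segment to range tree" panics into warnings (until reboot):
echo 1 > /sys/module/zfs/parameters/zfs_recover
# Or persistently, via /etc/modprobe.d/zfs.conf:
#   options zfs zfs_recover=1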

@sfatula My patch is supposed to affects only receive. I can't imagine how it can affect the rollback. especially in a way that would mess up the pool space map. I wonder if you have more than one problem, like bad RAM. Unfortunately there is no solution to already corrupted space maps other than read-only pool import and data evacuation.

It’s not related to your patch, but rather something else in newer ZFS versions corrupting older ZFS datasets. This happened to me and I was able to recover from it. It tends to happen with datasets containing large numbers of files.

rincebrain commented 1 year ago

Do you have any pointers to when this happened to you, and what you did? "Newer versions eat my data" is a pretty serious statement.

sfatula commented 1 year ago

I can't find any evidence it's ZFS 2.2, but I did a zdb -b and nothing was detected; no leaks, etc. were reported. Would it possibly make sense to try to apply the patch to a ZFS 2.1 version? Maybe I could retry then.

zdb -AAA -b Backup

Traversing all blocks to verify nothing leaked ...

loading concrete vdev 0, metaslab 245 of 246 ... 1.08T completed ( 611MB/s) estimated time remaining: 0hr 00min 04sec
No leaks (block sum matches space maps exactly)

bp count:               5157941
ganged count:                 0
bp logical:       1239224202240      avg: 240255
bp physical:      1169053898240      avg: 226651     compression:   1.06
bp allocated:     1185692446720      avg: 229877     compression:   1.05
bp deduped:                   0    ref>1:      0   deduplication:   1.00
Normal class:     1186744664064     used: 56.39%
Embedded log class              0     used:  0.00%

additional, non-pointer bps of type 0:        132
Dittoed blocks on same vdev: 1836315
sfatula commented 1 year ago

I applied the patch to zfs 2.1.12 (manually, of course) and I was able to get the send/recv to work, whereas it repeatedly failed previously. Same snapshots, so it would appear that #15039 had a positive effect for me! Thanks @amotin

sfatula commented 1 year ago

@amotin I hope you can get the patch into a future version of TrueNAS SCALE. I just got the exact same error replicating to drives attached to the same machine, so source and destination are both encrypted but on the same SCALE machine. So my current SCALE backup strategy is not working, and there is no good way to apply the patch to SCALE's ZFS.