openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Sometimes raw send on encrypted datasets does not work when copying snapshots back #12594

Closed digitalsignalperson closed 2 years ago

digitalsignalperson commented 3 years ago

System information

Type                   Version/Name
Distribution Name      Arch Linux
Distribution Version   rolling
Kernel Version         5.14.8-arch1-1
Architecture           x86_64
OpenZFS Version        zfs-2.1.1-1

Describe the problem you're observing

I am able to send raw encrypted snapshots (incremental and replication streams) back and forth between file systems a limited number of times before getting cannot mount 'rpool/mydataset': Input/output error and permanent errors in zpool status.

I have tried many sequences of sends/receives with raw encrypted snapshots; sometimes I can pass snapshots back and forth only once before it breaks, other times more. Below I will share two repeatable examples.

This seems like a manifestation of the issue in "Raw send on encrypted datasets does not work when copying snapshots back #10523", which was previously resolved.

Describe how to reproduce the problem

Example 1 - fails on first send back

zfs create rpool/test_000 -o encryption=on -o keyformat=passphrase

# create some data and snapshots
touch /mnt/test_000/1.txt
zfs snapshot rpool/test_000@1
touch /mnt/test_000/2.txt
zfs umount rpool/test_000
zfs snapshot rpool/test_000@2

# send to a new encryption root
zfs send -Rw rpool/test_000@2 | zfs receive -u rpool/test_001

# modify data, snapshot, and send back
zfs mount -l rpool/test_001
touch /mnt/test_001/3.txt
zfs umount rpool/test_001
zfs snapshot rpool/test_001@3
zfs send -i @2 -w rpool/test_001@3 | zfs receive -u rpool/test_000

# try to mount
zfs mount rpool/test_000
# cannot mount 'rpool/test_000': Input/output error

Example 2 - more convoluted, but fails after a few passes back and forth

zfs create rpool/test_002 -o encryption=on -o keyformat=passphrase

# create some data and snapshots
touch /mnt/test_002/1.txt
zfs snapshot rpool/test_002@1
touch /mnt/test_002/2.txt
zfs snapshot rpool/test_002@2
touch /mnt/test_002/3.txt
zfs umount rpool/test_002
zfs snapshot rpool/test_002@3

# send to new encryption root (same steps as Example 1 so far)
zfs send -Rw rpool/test_002@3 | zfs recv -u rpool/test_003

# send to another new encryption root
zfs load-key rpool/test_003
zfs send -Rw rpool/test_003@3 | zfs receive -u rpool/test_004

# modify data, snapshot, and send back
zfs load-key rpool/test_004
zfs mount rpool/test_004
touch /mnt/test_004/4.txt
zfs snapshot rpool/test_004@4
zfs send -w -i @3 rpool/test_004@4 | zfs receive -u rpool/test_003

# try to mount - succeeds where Example 1 failed; the only difference is the extra send in between
zfs mount rpool/test_003
ls /mnt/test_003

# modify data again and send back
touch /mnt/test_003/5.txt
zfs umount rpool/test_003
zfs snapshot rpool/test_003@5
zfs send -w -i @4 rpool/test_003@5 | zfs receive -u rpool/test_004
ls /mnt/test_004/

# modify data and send back
touch /mnt/test_004/6.txt
zfs snapshot rpool/test_004@6
zfs send -w -i @5 rpool/test_004@6 | zfs receive -u rpool/test_003
zfs mount rpool/test_003
# cannot mount 'rpool/test_003': Input/output error

At this point the output of zpool status -v includes

errors: Permanent errors have been detected in the following files:

        rpool/test_000:<0x0>
        rpool/test_003:<0x0>

If I roll back the last snapshot in question and then scrub once

zfs rollback -r rpool/test_000@2
zfs rollback -r rpool/test_003@5
zpool scrub rpool
zpool status -v

status still shows

errors: Permanent errors have been detected in the following files:

        rpool/test_000:<0x0>
        rpool/test_003:<0x0>

but if I scrub a second time

zpool scrub rpool
zpool status -v

I end up with

errors: No known data errors

and if I repeat the last operation in question, I get the same I/O error again.

The steps are repeatable for me. I don't know if every step matters (e.g. extraneous load-key when I don't mount). I also have some other examples that fail at different points, but I figured these were simple enough to share.

rincebrain commented 3 years ago

I recommend not using native encryption until it gets a fair bit more polish in the future (I'm only so hopeful).

gamanakis commented 3 years ago

This happens because, when sending raw encrypted datasets, the userspace accounting is present when it is not expected to be. This leads to the subsequent mount failure due to a checksum error when verifying the local MAC. I tried unsuccessfully to tackle this in #11300. See also: #10523, #11221, #11294.
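
A rough way to poke at this read-only is to dump the received dataset's objects with zdb and look for the user/group accounting entries a raw stream is not expected to carry. The dataset name is taken from the examples above; exact object names and output formatting vary by ZFS version, so treat this as illustrative only:

# list the dataset's objects (read-only) and grep for accounting entries
zdb -dd rpool/test_000 | head -n 40
zdb -dd rpool/test_000 | grep -iE 'user|group'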

Edit: If you have critical data lost due to this issue, I could help you recover it.

putnam commented 2 years ago

I am able to reproduce this.

At a high level I wanted to send an unencrypted dataset to a new pool with encryption enabled, wipe the old pool, and raw send this encrypted dataset back to a fresh pool. But snapshots end up being sent back the other direction at times since it's on an active system. In this process I discovered this bug for myself.

I've made a repro script here which just makes a file-based pool and puts it into a broken state:

#!/bin/bash
truncate -s 64M /tmp/test.pool
echo "12345678" > /tmp/test.pool.key
zpool create testpool /tmp/test.pool
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file:///tmp/test.pool.key testpool/test-source
echo "honk" > /testpool/test-source/honk
zfs snapshot testpool/test-source@before
zfs send -w testpool/test-source@before | zfs recv testpool/test-dest

# key is not currently loaded for test-dest; load it to check and confirm files
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-dest
# ls /testpool/test-dest
# honk

# now edit the dataset on test-dest, snapshot it, and send it back
echo "honk2" > /testpool/test-dest/honk2
zfs snapshot testpool/test-dest@after
zfs send -w -I testpool/test-dest@before testpool/test-dest@after | zfs recv testpool/test-source

# both files now exist in test-source; looks good (snapshots match between them too)
# ls /testpool/test-source
# honk honk2

# but as soon as you unmount and unload the key, then reload the key and mount it again...
zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source
# cannot mount 'testpool/test-source': Input/output error
# zpool status -v testpool will show permanent errors
zpool status -v testpool

echo "to clean up:"
echo " zpool destroy testpool && rm /tmp/test.pool && rm /tmp/test.pool.key"

Worse, I originally poked around testing this on my personal pool and it faulted the pool in such a way that even destroying those affected datasets didn't help me:

# zpool status tank -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A

<snip>

errors: Permanent errors have been detected in the following files:

        <0xec80>:<0x0>
aerusso commented 2 years ago

@putnam Hey! Thanks a ton for getting a local reproducer working!

I, however, cannot get this to work (i.e., bug out) on my test platform: an Intel laptop (on which I have unfortunately never managed to reproduce it). I don't have time right now, but I will try this on my production machine (which does have the problem).

I (therefore) think there may be a hardware component to this bug/these bugs. In the meantime, can you check this on 0.8.6? (I'm low-key hoping you'll be willing to bisect this.)
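
If someone does attempt the bisect, here is a rough sketch of a git bisect run helper. It assumes the reproducer from this thread is saved as /root/repro.sh, exits nonzero when the final mount fails, and destroys its test pool when it finishes; the tag names below are only illustrative, and commits that will not build against your kernel get skipped:

#!/bin/bash
# bisect-step.sh - run as root from the top of a ZFS source checkout
set -e
sh autogen.sh && ./configure && make -s -j"$(nproc)" || exit 125  # 125 tells git bisect to skip unbuildable commits
./scripts/zfs.sh -u || true   # unload modules previously loaded from the build tree, if any
./scripts/zfs.sh              # load the freshly built modules
/root/repro.sh                # exit 0 = good, nonzero = bad

# usage:
#   git bisect start
#   git bisect bad  zfs-2.0.0    # illustrative first-bad tag
#   git bisect good zfs-0.8.6
#   git bisect run ./bisect-step.sh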

digitalsignalperson commented 2 years ago

Ran @putnam's script on my own setup (system info at the top) and it did not result in any errors. The final zfs mount testpool/test-source was successful, and there were no errors on the pool.

I just added a few more rounds of sending data back and forth and that made it break for me. @aerusso I suspect if you pass snapshots back and forth a few more times (even if it varies per hardware) it will break eventually. I was also thinking it would be easy to write a small fuzzing script that randomly sends raw snapshots back and forth, unmounting, remounting, etc., which should be able to generate many failure cases (a rough sketch follows the modified script below); not sure if that would be of help.

my mods to the script below

#!/bin/bash
truncate -s 64M /tmp/test.pool
echo "12345678" > /tmp/test.pool.key
zpool create testpool /tmp/test.pool
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file:///tmp/test.pool.key testpool/test-source
echo "honk" > /testpool/test-source/honk
zfs snapshot testpool/test-source@before
zfs send -w testpool/test-source@before | zfs recv testpool/test-dest

# key is not currently loaded for test-dest; load it to check and confirm files
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-dest
# ls /testpool/test-dest
# honk

# now edit the dataset on test-dest, snapshot it, and send it back
echo "honk2" > /testpool/test-dest/honk2
zfs snapshot testpool/test-dest@after
zfs send -w -I testpool/test-dest@before testpool/test-dest@after | zfs recv testpool/test-source

# both files now exist in test-source; looks good (snapshots match between them too)
# ls /testpool/test-source
# honk honk2

# but as soon as you unmount and unload the key, then reload the key and mount it again...
zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source

# ------------------------------
# Not an issue for me yet - my modifications below

# modify source, snapshot, send to dest
zfs rollback testpool/test-dest@after
touch /testpool/test-source/1
zfs snapshot testpool/test-source@1
zfs send -w -i @after testpool/test-source@1 | zfs recv testpool/test-dest

# modify dest, snapshot, send to source
zfs rollback testpool/test-source@1
touch /testpool/test-dest/2
zfs snapshot testpool/test-dest@2
zfs send -w -i @1 testpool/test-dest@2 | zfs recv testpool/test-source

# Everything looks ok with things still mounted.
# try reloading

zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source

#cannot mount 'testpool/test-source': Input/output error
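
As for the fuzzing idea above, a rough, untested sketch: it sets up the same file-backed pool as the script above, then keeps raw-sending increments between the two datasets in a random direction, cycling the key and the mount each round until something breaks. Names and the round count are arbitrary:

#!/bin/bash
set -e
truncate -s 64M /tmp/test.pool
echo "12345678" > /tmp/test.pool.key
zpool create testpool /tmp/test.pool
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file:///tmp/test.pool.key testpool/test-source
zfs snapshot testpool/test-source@0
zfs send -w testpool/test-source@0 | zfs recv testpool/test-dest
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-dest

src=testpool/test-source
dst=testpool/test-dest
prev=0
for i in $(seq 1 40); do
    # pick a random direction for this round
    if (( RANDOM % 2 )); then from=$src; to=$dst; else from=$dst; to=$src; fi
    zfs rollback -r "$to@$prev"              # receiver must sit at the latest common snapshot
    touch "/$from/fuzz_$i"
    zfs snapshot "$from@$i"
    zfs send -w -i "@$prev" "$from@$i" | zfs recv "$to"
    prev=$i
    # cycle key and mount on the side we just wrote to
    zfs unmount "$from"
    zfs unload-key "$from"
    zfs load-key -L file:///tmp/test.pool.key "$from"
    zfs mount "$from" || { echo "broke after round $i ($from -> $to)"; zpool status -v testpool; exit 1; }
done
echo "no failure after 40 rounds"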

@putnam for me those errors went away after two scrubs, if you want to try that to fix your personal pool.

putnam commented 2 years ago

Thanks @aerusso and @digitalsignalperson for the feedback and updates. I wonder what is different between our setups. For anyone running that script please post your kernel and ZFS versions at the time you ran it. (uname -a; cat /sys/module/zfs/version)

My kernel at the time of test: Debian 5.14.0-1-amd64. ZFS version: Debian 2.0.6-1.

I did also find someone else wrote up a similar script (https://github.com/openzfs/zfs/issues/11983) to attempt a reliable repro. This bug has been reported in several places and probably needs consolidation. It's also clear some efforts have already been made and maybe the root cause is already well-understood. See https://github.com/openzfs/zfs/pull/11300 which has not been updated in ~4 months.

The situation seems kind of bad. I don't know all the possible use cases where it might occur (probably many), but my situation is the one described in my earlier comment.

@digitalsignalperson I will do two scrubs (this is a large pool so it'll take ~3 days) and report back if it fixes the pool error. Thanks!

pepsinio commented 2 years ago

@putnam I have the same problem. Did you find any solution?

aerusso commented 2 years ago

Thanks! The modified version "works" (breaks) reliably on my test platform.

putnam commented 2 years ago

@digitalsignalperson

Confirming that two back-to-back scrubs cleared the corruption error. Not sure the technical reason why it took two scrubs, but glad it's cleared.

For what it's worth, my system is an Epyc 7402P with 128GB of ECC RAM.

digitalsignalperson commented 2 years ago

Not sure either about the two scrubs, but I saw it suggested/reported in one or more of the other similar encryption issues.

rincebrain commented 2 years ago

ZFS remembers the errors from the last completed scrub too, which is why it takes two scrubs with the errors gone before they disappear, AIUI.

bghira commented 2 years ago

this bug has likely existed since the introduction of the encryption feature.

marker5a commented 2 years ago

Having the same issue on this end... had no idea that my encrypted backups were getting hosed until I went to restore some of my datasets from my backup.

I do an encrypted send w/ syncoid (--sendoptions="w") to back up to the backup pool. The only problem is that I tried the double scrub but I'm still getting the same input/output error. Is there any other hope to recover the data from the backup pool?

It sounded from other comments like you need to first get the error to go away from zpool status, and then do two scrubs... is my thinking correct? I'm going to spool up a bunch of scrubs sequentially as a last Hail Mary, but any other pointers would be useful.

rincebrain commented 2 years ago

Usually, one would go "[remove whatever is causing errors]" "[scrub twice]" and then zpool status would no longer list those errors.
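
Concretely, based on the rollback-plus-double-scrub sequence earlier in this issue, something like the following; dataset and snapshot names are placeholders, and zpool wait needs a reasonably recent ZFS (otherwise just watch zpool status):

# remove whatever is triggering the errors, e.g. roll back the offending received snapshot
zfs rollback -r rpool/mydataset@last_good
# then scrub twice, letting each scrub finish
zpool scrub rpool
zpool wait -t scrub rpool
zpool scrub rpool
zpool wait -t scrub rpool
zpool status -v rpool    # the permanent errors should no longer be listed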

marker5a commented 2 years ago

Yeah, that makes sense... the confusing thing is that doing the scrub "cleared the error", at least as far as ZFS was concerned. After the two sequential scrubs, ZFS reports no errors, so I would have thought that after two error-less scrubs I'd be able to mount the dataset, but I'm still having issues.

Does clearing the error in this case refer to doing more than an initial scrub to make zfs think the error went away?

Also, side note: is there any plausible way to forensically recover the dataset by manually decrypting it? I unfortunately know too little about the internals to know if this is even possible... or how one would go about it.

rincebrain commented 2 years ago

I believe the reason for the counterintuitive behavior is that the error is coming from trying to decrypt things, which zpool scrub very notably does not do.

From my understanding of the problem based on @gamanakis's patch and replies, I would assume it would be possible to write a patch to just ignore the failing bits and let you extract your data. (The existing reverted fix for this might even do that, I'm not sure.)

marker5a commented 2 years ago

I believe the reason for the counterintuitive behavior is that the error is coming from trying to decrypt things, which zpool scrub very notably does not do.

From my understanding of the problem based on @gamanakis's patch and replies, I would assume it would be possible to write a patch to just ignore the failing bits and let you extract your data. (The existing reverted fix for this might even do that, I'm not sure.)

Ok, well yeah, that does make a bit more sense in terms of scrub being unaware.

I'll see if @gamanakis responds here with any helpful info... also trying to figure out if zdb can be useful in getting the data out without patching zfs

digitalsignalperson commented 2 years ago

I'd be interested to hear any solution. I wouldn't mind starting to use raw encrypted sends for offsite backup if there was a hacky workable recovery method.

gamanakis commented 2 years ago

@marker5a You could cherry-pick the commit here: https://github.com/gamanakis/zfs/commit/c379a3cf20f39a74798eeb048d095eaeb7b415c9 on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.

That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received; this forces its recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.
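
In case it helps, a rough sketch of building a patched module that way; build prerequisites (kernel headers, autotools, etc.) are assumed, and the install/DKMS step varies by distro:

git clone https://github.com/openzfs/zfs.git && cd zfs
git checkout zfs-2.1.2
git remote add gamanakis https://github.com/gamanakis/zfs.git
git fetch gamanakis    # assumes the commit is reachable from a branch on that fork
git cherry-pick c379a3cf20f39a74798eeb048d095eaeb7b415c9
sh autogen.sh && ./configure
make -s -j"$(nproc)"
sudo make install && sudo depmod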

rincebrain commented 2 years ago

What's the reason not to try and get that approach merged in general?

marker5a commented 2 years ago

@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.

That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received; this forces its recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.

@gamanakis Thanks for the speedy reply!!! I was about to go down that route but decided to sit on my hands and wait, lol. I'll give that a try and report back my findings... thanks!

digitalsignalperson commented 2 years ago

The justification for the reversion in https://github.com/openzfs/zfs/commit/6217656da33c0920cb9f213742fd51dd215bc455 was that the original fix

could lead to failure mounting encrypted datasets created with intermediate versions of ZFS encryption available in master between major releases.

This seems odd; it reads like it's choosing to break one thing (mounting raw encrypted sends in general) over another (mounting encrypted datasets created in between releases using git master?). If we stick to releases, is there any harm in the original patch?

gamanakis commented 2 years ago

This is a bug that seems to have existed since the introduction of native encryption; I no longer think it concerns only in-between releases using git master.

marker5a commented 2 years ago

@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.

That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received; this forces its recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.

Still no dice unfortunately.

The process was as follows:

1.) Created fresh VM and compiled/installed ZFS (Arch 5.10 LTS w/ v2.1.0 and cherry-picked commit)
2.) zfs send/recv from backup pool (exhibiting corruption) to new pool created inside VM (done so to avoid modifying zfs on machine hosting backup pool)
3.) zfs load-key on VM w/ newly received
4.) zfs mount -a on VM w/ newly received

Still getting Input/Output error. Do you think that ZDB could be a savior in retrieving the data?

gamanakis commented 2 years ago

@marker5a Could you try this:

diff --git a/module/zfs/dsl_crypt.c b/module/zfs/dsl_crypt.c
index 26d4c2fe7..35da63ffc 100644
--- a/module/zfs/dsl_crypt.c
+++ b/module/zfs/dsl_crypt.c
@@ -2701,8 +2701,7 @@ spa_do_crypt_objset_mac_abd(boolean_t generate, spa_t *spa, uint64_t dsobj,
                return (0);
        }

-       if (bcmp(portable_mac, osp->os_portable_mac, ZIO_OBJSET_MAC_LEN) != 0 ||
-           bcmp(local_mac, osp->os_local_mac, ZIO_OBJSET_MAC_LEN) != 0) {
+       if (bcmp(portable_mac, osp->os_portable_mac, ZIO_OBJSET_MAC_LEN) != 0) {
                abd_return_buf(abd, buf, datalen);
                return (SET_ERROR(ECKSUM));
        }

This completely bypasses the check of the local metadata MAC.

marker5a commented 2 years ago

I applied that change on top of the c379a3c commit and am still getting the same error... morale at an all-time low, lol

gamanakis commented 2 years ago

2.) zfs send/recv from backup pool (exhibiting corruption) to new pool created inside VM

I just noticed you said it fails when received into a new pool. Maybe this is unrelated. The bug I was trying to fix is present when receiving backup data into its original pool (not a new one).

marker5a commented 2 years ago

2.) zfs send/recv from backup pool (exhibiting corruption) to new pool created inside VM

I just noticed you said it fails when received into a new pool. Maybe this is unrelated. The bug I was trying to fix is present when receiving backup data into its original pool (not a new one).

Yeah, the use case is that I have been using raw sends to a backup for some time for all of my datasets, including some encrypted ones. I went to reconfigure the main pool (restructuring) and thus blew that pool away after verifying that the backup was good, although not thoroughly checking all datasets obviously :/

Now, my new pool is reconstructed with the exception of two encrypted datasets, which are what I am now trying to recover. When I sent them to the reconstructed pool and tried mounting, I discovered the corruption and traced it back to the backup pool and then eventually to this ticket.

I hope I haven't been polluting this issue irrelevantly... If this is out of scope, I'll file a new issue and/or go through the mailing list, but my near-term goal is to recover the data... long term, to have a workable way of sending encrypted datasets as raw sends for offsite backup.

digitalsignalperson commented 2 years ago

I think it's in scope for the issue; the goal is to be able to receive the raw sends on any pool, original or otherwise.

marker5a commented 2 years ago

I think it's in scope of the issue, want to be able to receive the raw sends really on any pool original or otherwise.

Yes, the original scope is being able to send/receive... you're definitely right there.

I guess my out of scope comment was more with regards to recovering data from this failure... that's the part that's driving me nuts right now

digitalsignalperson commented 2 years ago

Ah I see, well any info on recovery seems relevant and may help root cause the issue. So far it seems helpful to see what patches are not solving it

marker5a commented 2 years ago

Ah I see, well any info on recovery seems relevant and may help root cause the issue. So far it seems helpful to see what patches are not solving it

Yeah, I've been reading through zdb a little bit to see if there is at least a recovery scheme there. But certainly willing to try anything on the code side of things, even on a temporary basis to recover the data.

This seems like a major issue; I'm not sure why/how it isn't more visible, but I definitely encourage anyone reading this to check their remote backups ASAP... as should be the case in any backup strategy, I suppose.

digitalsignalperson commented 2 years ago

For sure, and this underscores the importance of testing recovery before relying on a backup. But it could be missed in testing, especially given the "sometimes" nature of the bug, where you might successfully receive a few times.

I remember reading a lot of popular tech news about ZFS encryption being released, but I haven't seen any publicity about it being broken. Even in products like TrueNAS, I would think this would warrant disabling the feature or showing a big warning about the current caveats.

rincebrain commented 2 years ago

One would think.

The simple answer is that most people don't hit these issues - a few people hit them often, many people don't hit them at all, and there's no big company running it, hitting them, and paying someone to dive in and fix it.

putnam commented 2 years ago

Are there not unit tests? Why wouldn't there be a test for this case?

People trust ZFS and expect features in official releases to work. I am now subbed to dozens of showstopper bugs where the consensus is something like "well if a company cares enough maybe it'll be fixed oh well" -- why did these features get merged and released without adequate testing?

digitalsignalperson commented 2 years ago

Have bounties been used in this project at all (a search of issues suggests not), and are they effective in cases like this? I'd be happy to throw some money at it, or even pay for a script or something to recover raw snapshots to an arbitrary pool.

jdeluyck commented 2 years ago

I am personally at the beginning of my zfs journey, and being able to send raw streams of encrypted data was one of my primary things I was looking at.

A simple case of sending from a vanilla pool to another resulted in me getting bitten by this 'input/output error' bug.

Is there something I can do to diagnose/find where the culprit is?

marker5a commented 2 years ago

I am personally at the beginning of my zfs journey, and being able to send raw streams of encrypted data was one of my primary things I was looking at.

A simple case of sending from a vanilla pool to another resulted in me getting bitten by this 'input/output error' bug.

Is there something I can do to diagnose/find where the culprit is?

Can you post the steps that got you to where you are? It'd be good to know what specific things cause the failure... I'm actually going back to the beginning of this to try to recreate the issue and see if I can do anything code-wise to recover the data... I have low hopes, but it's worth a shot to be able to recover my data.

digitalsignalperson commented 2 years ago

There are several reproducing scripts further up in this issue

putnam commented 2 years ago

I am personally at the beginning of my zfs journey, and being able to send raw streams of encrypted data was one of my primary things I was looking at. A simple case of sending from a vanilla pool to another resulted in me getting bitten by this 'input/output error' bug. Is there something I can do to diagnose/find where the culprit is?

Can you post the steps that got you to where you are? It'd be good to know what specific things cause the failure... I'm actually going back to the beginning of this to try to recreate the issue and see if I can do anything code-wise to recover the data... I have low hopes, but it's worth a shot to be able to recover my data.

The script I posted in this thread (with DSP's modifications, possibly) reliably reproduces the issue.

jdeluyck commented 2 years ago

That's fairly simple ;)

Laptop running Debian sid, zfs 2.1.1. Target system is a Proxmox box that started life on zfs 0.8.3 but now sits at 2.1.1.

Source is my aes256 encrypted home dataset on my laptop. For sending I'm using syncoid with the send parameter -w for raw. Target dataset is created and data is transferred.

Loading the key works, but as soon as I mount the dataset I get an I/O error, which also shows up in zpool status.

I haven't tried any of the scrubbing yet, as it takes nearly a day on my entire destination pool.

I'll try to recreate this problem with a smaller vm.

digitalsignalperson commented 2 years ago

Hmm, for those willing to play with VMs and such, maybe we should check if the issue exists on the FreeBSD version? Or has it been confirmed already

marker5a commented 2 years ago

Hmm, for those willing to play with VMs and such, maybe we should check if the issue exists on the FreeBSD version? Or has it been confirmed already

I can probably spool one up easily for FreeBSD... will report my findings

wohali commented 2 years ago

As I mentioned in a different issue, this showed up for me on FreeNAS / TrueNAS, so it definitely occurs in FreeBSD. See #12014 and #11688

digitalsignalperson commented 2 years ago

Dumb recovery idea, but couldn't you just dd the whole remote storage device (with it offline) to your local system over ssh and then import the pool locally? That might at least recover a remote backup that is affected by this. Umm, wait nvm, remembering it can't even be mounted on the other side.

marker5a commented 2 years ago

Dumb recovery idea, but couldn't you just dd the whole remote storage device (with it offline) to your local system over ssh and then import the pool locally? That might at least recover a remote backup that is affected by this

I don't think that would recover an encrypted dataset. I'm doing something similar enough... I'm just placing the disks for the backup pool into the same machine that I am trying to migrate the data to.

A dd would just copy-paste the zpool itself, so any issues with mounting the encrypted-and-corrupted dataset would carry over via dd

marker5a commented 2 years ago

Also, seems like #10019 is relevant

digitalsignalperson commented 2 years ago

The process was as follows:

1.) Created fresh VM and compiled/installed ZFS (Arch 5.10 LTS w/ v2.1.0 and cherry-picked commit)
2.) zfs send/recv from backup pool (exhibiting corruption) to new pool created inside VM (done so to avoid modifying zfs on machine hosting backup pool)
3.) zfs load-key on VM w/ newly received
4.) zfs mount -a on VM w/ newly received

Still getting Input/Output error. Do you think that ZDB could be a savior in retrieving the data?

@marker5a might you have a Vagrantfile for this or a modified PKGBUILD file handy? I was thinking of poking around. It seems you tried recovering your data, but did you try running the reproducing scripts in this environment? I wonder whether, starting from scratch, the patch might prevent the problem but just not fix data from before the patch.

digitalsignalperson commented 2 years ago

Hmm, maybe the patches from @gamanakis do fix the issue? Here's what I tested:

vagrant init archlinux/archlinux

curl -L -O https://aur.archlinux.org/cgit/aur.git/snapshot/zfs-dkms.tar.gz
tar -xvf zfs-dkms.tar.gz
curl -L -O https://aur.archlinux.org/cgit/aur.git/snapshot/zfs-utils.tar.gz
tar -xvf zfs-utils.tar.gz

# edit the zfs-dkms PKGBUILD to include @gamanakis's two patches in prepare() after the 0001 patch:
    patch -p1 -i ../0003-c379a3cf.patch
    patch --ignore-whitespace -p1 -i ../0004-bypass-check.patch

vagrant up
vagrant ssh
sudo pacman -Syu base-devel linux-headers
reboot
vagrant ssh
cd /vagrant/zfs-utils
makepkg -i --noconfirm --skippgpcheck .
cd /vagrant/zfs-dkms
makepkg -i --noconfirm --skippgpcheck .
reboot
vagrant ssh
sudo modprobe zfs
sudo su
# execute reproducing script from here

from here I executed the steps from this script https://github.com/openzfs/zfs/issues/12594#issuecomment-945197353

I did not encounter any I/O errors. So I tried generating more code for the bash script with this python snippet to do 20 more snapshots back and forth:

i = 2
while i < 20:
    x = f"""# modify source, snapshot, send to dest
zfs rollback testpool/test-dest@{i}
touch /testpool/test-source/{i+1}
zfs snapshot testpool/test-source@{i+1}
zfs send -w -i @{i} testpool/test-source@{i+1} | zfs recv testpool/test-dest

# modify dest, snapshot, send to source
zfs rollback testpool/test-source@{i+1}
touch /testpool/test-dest/{i+2}
zfs snapshot testpool/test-dest@{i+2}
zfs send -w -i @{i+1} testpool/test-dest@{i+2} | zfs recv testpool/test-source

zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source"""
    i = i + 2
    print('#-----------------------\n')
    print(x)

so this generated a bunch more code for the script, up until the final block of

#-----------------------

# modify source, snapshot, send to dest
zfs rollback testpool/test-dest@18
touch /testpool/test-source/19
zfs snapshot testpool/test-source@19
zfs send -w -i @18 testpool/test-source@19 | zfs recv testpool/test-dest

# modify dest, snapshot, send to source
zfs rollback testpool/test-source@19
touch /testpool/test-dest/20
zfs snapshot testpool/test-dest@20
zfs send -w -i @19 testpool/test-dest@20 | zfs recv testpool/test-source

zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source

at the end I didn't have any issues:

[root@archlinux vagrant]# zfs list -t snapshot
NAME                          USED  AVAIL     REFER  MOUNTPOINT
testpool/test-dest@before      83K      -     95.5K  -
testpool/test-dest@after       83K      -       99K  -
testpool/test-dest@1           83K      -       99K  -
testpool/test-dest@2           83K      -       99K  -
testpool/test-dest@3           83K      -       99K  -
testpool/test-dest@4           83K      -       99K  -
testpool/test-dest@5           83K      -       99K  -
testpool/test-dest@6           83K      -       99K  -
testpool/test-dest@7           83K      -       99K  -
testpool/test-dest@8           83K      -       99K  -
testpool/test-dest@9           83K      -       99K  -
testpool/test-dest@10          83K      -       99K  -
testpool/test-dest@11          83K      -       99K  -
testpool/test-dest@12          83K      -       99K  -
testpool/test-dest@13          83K      -       99K  -
testpool/test-dest@14          83K      -       99K  -
testpool/test-dest@15          83K      -       99K  -
testpool/test-dest@16          83K      -       99K  -
testpool/test-dest@17          83K      -       99K  -
testpool/test-dest@18          83K      -       99K  -
testpool/test-dest@19          83K      -       99K  -
testpool/test-dest@20           0B      -       99K  -
testpool/test-source@before    83K      -     98.5K  -
testpool/test-source@after     83K      -       99K  -
testpool/test-source@1         83K      -       99K  -
testpool/test-source@2         83K      -       99K  -
testpool/test-source@3         83K      -       99K  -
testpool/test-source@4         83K      -       99K  -
testpool/test-source@5         83K      -       99K  -
testpool/test-source@6         83K      -       99K  -
testpool/test-source@7         83K      -       99K  -
testpool/test-source@8         83K      -       99K  -
testpool/test-source@9         83K      -       99K  -
testpool/test-source@10        83K      -       99K  -
testpool/test-source@11        83K      -       99K  -
testpool/test-source@12        83K      -       99K  -
testpool/test-source@13        83K      -       99K  -
testpool/test-source@14        83K      -       99K  -
testpool/test-source@15        83K      -       99K  -
testpool/test-source@16        83K      -       99K  -
testpool/test-source@17        83K      -       99K  -
testpool/test-source@18        83K      -       99K  -
testpool/test-source@19        83K      -       99K  -
testpool/test-source@20        82K      -       99K  -

[root@archlinux vagrant]# ls /testpool/test-source/
1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9  honk  honk2
[root@archlinux vagrant]# ls /testpool/test-dest/
1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9  honk  honk2

Any thoughts?

Edit: and just to be sure, I created an identical Vagrant box without the patches applied, to confirm that the reproducing script indeed still fails with cannot mount 'testpool/test-source': Input/output error at the expected spot.

Edit 2: re-tested with only the first patch and that seems sufficient to prevent the errors

digitalsignalperson commented 2 years ago

Did some further testing with this VM as configured in my last post.

  1. with unpatched zfs-dkms, use the repro script to get to failure to mount dataset and with error in zpool status
  2. export testpool, sudo pacman -R zfs-dkms, rmmod zfs and rebuild the package with both the patches
  3. modprobe zfs, import testpool, mount testpool/test-source and testpool/test-dest successfully
  4. still see zpool status error, but goes away after two scrubs

So using both patches (untested with just the first) in this scenario seems to recover the data and lets you mount the previously unmountable dataset (rough command sketch below). Too bad it didn't work for @marker5a's recovery... maybe it's worth trying again given that it worked here.
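
Roughly, the recovery sequence looked like this (Arch/DKMS flavored and run as root in the VM; package and module handling will differ on other distros, and zpool wait can be replaced by polling zpool status):

zpool export testpool
pacman -R zfs-dkms
rmmod zfs
# rebuild and install the zfs-dkms package with both patches applied, then:
modprobe zfs
zpool import -d /tmp testpool            # file-backed pool from the repro script lives in /tmp
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-source
zfs mount testpool/test-dest
zpool status -v testpool                 # the error is still listed at this point
zpool scrub testpool && zpool wait -t scrub testpool
zpool scrub testpool && zpool wait -t scrub testpool
zpool status -v testpool                 # after two clean scrubs the error is gone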