Closed digitalsignalperson closed 2 years ago
Did some further testing with this VM as configured in my last post.
- with unpatched zfs-dkms, use the repro script to get to failure to mount dataset and with error in zpool status
- export testpool,
sudo pacman -R zfs-dkms
,rmmod zfs
and rebuild the package with both the patches- modprobe zfs, import testpool, mount testpool/test-source and testpool/test-dest successfully
- still see zpool status error, but goes away after two scrubs
so using both patches (untested with just the first) in this scenario it seems to recover the data and lets you mount the previously unmountable dataset. Too bad it didn't work for @marker5a's recovery... maybe it's worth trying again based on it working here
First off, happy new year! I was away and unable to work on this too much, but did make some progress. Before going into that though, I'm using vagrant but I've sort of cobbled a mess together on the VM side, so I need to make a point to clean it up and can provide that as my test setup.
With that said, I will definitely give what you posted a run. From my testing, it seems that I suffer from a slightly different problem. I checked out the master zfs branch and ran the earlier scripts you and @putnam provided and was able to apply a patch that effectively ignored the HMAC check (or at least I'm rpetty sure it was) and that was able to successfully mount the src and dest in that testpool that was created.
With excitement, I switched over to trying the patched version on a copy of my corrupted system and I'm still getting the i/o error. I did start adding more dbgmsg output in my branch of code and it does seem to stem from the crypto routines returning a CRYPTO_INVALID_MAC. I'm now trying to pinpoint further a way to bypass that in other spots of the code... again, first goal is to recover code... second goal will be to try and "fix" the issue.
In any case, I'll try your vagrant config and see if that does resolve the i/o issue for short term recovery
Thanks!
Are there not unit tests? Why wouldn't there be a test for this case?
People trust ZFS and expect features in official releases to work. I am now subbed to dozens of showstopper bugs where the consensus is something like "well if a company cares enough maybe it'll be fixed oh well" -- why did these features get merged and released without adequate testing?
i'm also pretty stunned tbh. I expected more from the ZFS team. There ARE regression tests after all, and i think openzfs desgin should be test driven by default, since its so complicated.
This changes my feeling from "openzfs is rockstable" to..."dont use some stuff if you dont want to lose your data".
Edit: Now that i think about it: there probably are tests, but its an issue that only is triggered under certain circumstances. (that couldnt be predicted when writing the tests) Many of these issues seem to be edge cases or weird interactions.
Why wouldn't there be a test for this case?
maybe we can take some ideas from the reproducing scripts in this thread to include as a new unit test in the next PR
@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.
That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received, and so it forces their recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.
@gamanakis any thoughts on submitting your patch? In my testing it resolved this issue
@digitalsignalperson I tried to resolve/merge this in #11300. The preferred solution would be to actually reset the responsible dnodes and not rely on a flag, but I failed to make it work. If time permits I will take another look.
Why wouldn't there be a test for this case?
maybe we can take some ideas from the reproducing scripts in this thread to include as a new unit test in the next PR
@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2. That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received, and so it forces their recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.
@gamanakis any thoughts on submitting your patch? In my testing it resolved this issue
@digitalsignalperson I ran your test sequence on my pool just as one last sanity check and it's still giving I/O errors. That's the bad news. The good news (at least for this issue), is that it appears that my failure may be related to a tangential issue/PR that I could have easily done on an earlier version of code (#12770) .
This would explain why the tests w/ the patched code yield different results than my messed up dataset. Also, it would possibly explain why I'm not seeing this on another machine w/ a similar backup strategy that exhibits zero issues. The troubled dataset I have on my end all stemmed from migrating to a backup pool to reconstruct the original pool (reconfigure disks, vdevs, etc)... and in that process could have very well been using an ashift of 9 on the original pool and switched to 12. Unfortunately, I don't know what ashift was set on the original pool since it's gone now, but that is a theory.
I created a test script w/ the ashift changes to see if the debug info follows a similar path in the code to what is shown on my failing dataset... if that's the case, my test case would no longer be relevant here... and from the looks of it, If I suffered from #12770, then it seems like my data is as good as trash :(
Because we cant always 100% trust zfs, I'm trying to create an intelligent zfs-compare tool that will compare the latest common snapshots in two pools. It will shasum the actual zvols and files, instead of relying on zfs. It will also transfer a remote dataset thats encrypted and only has a local key, so that the encryption key isnt needed remotely. Does anyone want this tool as well?
@psy0rz I'd be curious to see how it works.
Seems to be the same issue as: https://github.com/openzfs/zfs/issues/10523 Might it make sense to somehow merge all those issues into one and get rid of some of the duplicates? There appear to be a number of issues open on this subject and so far no fix since more than a year?
What's the reason not to try and get that approach merged in general?
@rincebrain the person who wrote the encryption (@tcaputi) suggested in PR #11300 that instead of introducing the new flag we should zero out the dnodes holding user accounting on the receiving side.
However, in the absence of a loaded key I fail to see how this is possible. If those dnodes are freed in the receiving side it starts searching for the key. I have pinged Tom again in this regard.
@gamanakis Looking at the pull request there are a couple of things happening that apparently break when you do this. Wouldn't one solution be to just flag the raw received dataset so that this is done on first mount when the key has already been loaded? That way you don't run into missing key issues and evade a failing mount at the same time.
It seems from a cursory glance at the issue and pull request that the data is all well and there and its just some metadata that is corrupted. I guess a perfect solution would be to also correctly send the metadata, if that isn't possible flagging it for cleanup on next mount seems to be the next sensible thing to do.
to just flag the raw received dataset so that this is done on first mount when the key has already been loaded?
Right, that was my initial approach (draft gamanakis@c379a3c).
a perfect solution would be to also correctly send the metadata
This was the latter approach which I have trouble implementing.
I think that this sort of flag is needed in any case and might be cleared by checking whether metadata is valid if in future releases valid metadata can be sent/received.
The problem is that if you look at forwards/backwards compatbility there are already versions with raw send out there that will send wrong metadata. This will have to be handled anyway unless you want to do a major version break for this. So preemptively flagging the received sets and then deciding on first mount whether the data you received was good or needs to be fixed is I think the best way to go forward.
There is not a problem with older versions. The flags are stored in a uint64_t and the placeholder for the new flag defaults to 0, ie inactivated by default.
I think I wasn't clear in what I meant. I meant that while the idea of sending correct metadata is certainly the better longterm solution, its insufficient for solving the issues when sending from an old zfs version to a fixed one. Maybe I also didn't understand whether sending or receiving was actually the culprit here.
In the case where the sending side has to be upgraded to fix the metadata issue we need to add the Dirty flag anyway so that we can check a received dataset because we can't reliably know if its from a new enough version right? I'm unfortunately not very knowledgeable about zfs internals.
I misunderstood what you said, you are right I think. I will do some cleanup and open a new PR introducing the flag,
I wonder how pathological it could be to add a case for "if I fail to decrypt specifically the userobj_accounting metadata, just throw it out", possibly guarded by a tunable. (That, or a zhack command to go forcibly set the "clear on next open" flag.)
It'd be a shame for people to have to throw out pre-patch recvs.
I suspect PR #12981 (update of #11300) resolves this. Anyone interested, feel free to try it out.
Both examples (1 and 2) in OP from @digitalsignalperson and the reproducer from @putnam complete without any errors with #12981 applied.
Awesome thanks @gamanakis! I tested here in my vagrant box and seems to work. Looking forward to using raw sends
Thanks @gamanakis !!
I hope after this, #12720 can get some attention. I am still not able to use raw sends as the generated send stream contains odd/damaged objects that break the receive.
I experienced this last night reformatting a desktop with the expectation that I could zfs send -w
my root, home and other encrypted datasets back to the machine from my nas. Only the ones using encryption (aes-256-gcm) gave "Input/Output error" responses when trying to mount them on the local machine after receiving. Unfortunately I didn't have the opportunity to try #12981 and just rsync'd everything to new encrypted datasets made locally overnight.
It's fun. The nas that these encrypted datasets were sent to could mount them just fine given the nature of the bug. But so could a laptop which I zfs send -w
'd them to as well from that nas. The laptop was running openzfs version v2.1.1 on Archlinux kernel version 5.15.5. It is only the desktop which also received the encrypted datasets that could not mount them, experiencing the issue described here with a new zpool which was running the same kernel version and was downgraded to openzfs 2.1.1 to match the laptop who was successful at mounting them.
At this point my first guess is that the version of zfs your zpool were created on plays a part. But I'll have some more fun playing with it in a VM today now that I'm back online.
At least with the nas mounting the data, I was able to rsync it to the desktop as a one-off. I had a thread on reddit/zfs here but for now I've worked around the issue for myself.
Thank you for the PR @gamanakis.
Just adding my own tests which are consistent with above comments when using ashift=9 and ashift=12 where using 12 causes the problem.
The same archlinux usb stick was able to zfs recv the "broken encrypted dataset" and mount it perfectly fine using default zpool settings. It was only when I used ashift=12 in the zpool creation that the "Input/Output error" issue became apparent.
Testing conditions:
Linux 5.15.5 and zfs 2.1.1, then zfs 2.1.2
Then I did those tests again but for step 3 I included -o ashift=12
and got the IO error when trying to mount the received encrypted dataset.
Setting ashift=12 on zfs 2.1.1 and 2.1.2 causes the issue for me in a VM where /dev/vda was presented as a device using 512b sectors to the VM. Seems consistent enough. My laptop was able to read my desktop's encrypted root dataset because it's zpool still used ashift=9 like the encrypted root dataset it received and could mount.
In my tests it also happens with ashift=9 (default, checked with zdb) when raw sending back to the originating pool.
In my tests it also happens with ashift=9 (default, checked with zdb) when raw sending back to the originating pool.
I am curious, was your pool originally ashift=12 when it sent the snapshot away initially and is receiving it back as ashift=9 to experience the issue?
Without PR 12981 I cannot raw receive in the originating pool, regardless of the ashift, it throws an Input/Output error when mounting.
Let me try your case with 12981 applied.
Ok, this seems to be a different issue. Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with ashift=12 results to failure when mounting pool2/encrypted (Input/Output error).
I think you should open a new issue. I am not sure raw sending between pools with different ashift is possible, will take a look.
I opened #13067 for this matter, did some debugging there too.
System information
Describe the problem you're observing
I am able to send raw encrypted snapshots (incremental and replication streams) back and forth between file systems a limited number of times before getting a
cannot mount 'rpool/mydataset': Input/output error
and errors in zpool status.I have tried many sequences of sends/receives with raw encrypted snapshots, sometimes I can pass back and forth only 1 time, others more. Below I will share two repeatable examples.
This seems like manifestation of the issue in "Raw send on encrypted datasets does not work when copying snapshots back #10523" which was previously resolved.
Describe how to reproduce the problem
Example 1 - fails on first send back
Example 2 - more convoluted, but fails after a few back and forth
At this point the output of
zpool status -v
includesIf I rollback the last snapshot in question, then scrub once
status still shows
but if I scrub a second time
end up with
and if I repeat the last operation in question, it will get the same IO error again.
The steps are repeatable for me. I don't know if every step matters (e.g. extraneous load-key when I don't mount). I also have some other examples that fail at different points, but I figured these were simple enough to share.