openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS corruption related to snapshots post-2.0.x upgrade #12014

Open jgoerzen opened 3 years ago

jgoerzen commented 3 years ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Buster
Linux Kernel          5.10.0-0.bpo.5-amd64
Architecture          amd64
ZFS Version           2.0.3-1~bpo10+1
SPL Version           2.0.3-1~bpo10+1

Describe the problem you're observing

Since upgrading to 2.0.x and enabling crypto, every week or so, I start to have issues with my zfs send/receive-based backups. Upon investigating, I will see output like this:

zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:03:37 with 0 errors on Mon May  3 16:58:33 2021
config:

    NAME         STATE     READ WRITE CKSUM
    rpool        ONLINE       0     0     0
      nvme0n1p7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xeb51>:<0x0>

Of note, the <0xeb51> is sometimes a snapshot name; if I zfs destroy the snapshot, it is replaced by this tag.

Bug #11688 implies that zfs destroy on the snapshot and then a scrub will fix it. For me, it did not. If I run a scrub without rebooting after seeing this kind of zpool status output, I get the following in very short order, and the scrub (and eventually much of the system) hangs:

[393801.328126] VERIFY3(0 == remove_reference(hdr, NULL, tag)) failed (0 == 1)
[393801.328129] PANIC at arc.c:3790:arc_buf_destroy()
[393801.328130] Showing stack for process 363
[393801.328132] CPU: 2 PID: 363 Comm: z_rd_int Tainted: P     U     OE     5.10.0-0.bpo.5-amd64 #1 Debian 5.10.24-1~bpo10+1
[393801.328133] Hardware name: Dell Inc. XPS 15 7590/0VYV0G, BIOS 1.8.1 07/03/2020
[393801.328134] Call Trace:
[393801.328140]  dump_stack+0x6d/0x88
[393801.328149]  spl_panic+0xd3/0xfb [spl]
[393801.328153]  ? __wake_up_common_lock+0x87/0xc0
[393801.328221]  ? zei_add_range+0x130/0x130 [zfs]
[393801.328225]  ? __cv_broadcast+0x26/0x30 [spl]
[393801.328275]  ? zfs_zevent_post+0x238/0x2a0 [zfs]
[393801.328302]  arc_buf_destroy+0xf3/0x100 [zfs]
[393801.328331]  arc_read_done+0x24d/0x490 [zfs]
[393801.328388]  zio_done+0x43d/0x1020 [zfs]
[393801.328445]  ? zio_vdev_io_assess+0x4d/0x240 [zfs]
[393801.328502]  zio_execute+0x90/0xf0 [zfs]
[393801.328508]  taskq_thread+0x2e7/0x530 [spl]
[393801.328512]  ? wake_up_q+0xa0/0xa0
[393801.328569]  ? zio_taskq_member.isra.11.constprop.17+0x60/0x60 [zfs]
[393801.328574]  ? taskq_thread_spawn+0x50/0x50 [spl]
[393801.328576]  kthread+0x116/0x130
[393801.328578]  ? kthread_park+0x80/0x80
[393801.328581]  ret_from_fork+0x22/0x30

However, I want to stress that this backtrace is not the original cause of the problem; it only appears if I do a scrub without first rebooting.

After that panic, the scrub stalled -- and a second error appeared:

zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sat May  8 08:11:07 2021
    152G scanned at 132M/s, 1.63M issued at 1.41K/s, 172G total
    0B repaired, 0.00% done, no estimated completion time
config:

    NAME         STATE     READ WRITE CKSUM
    rpool        ONLINE       0     0     0
      nvme0n1p7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xeb51>:<0x0>
        rpool/crypt/debian-1/home/jgoerzen/no-backup@[elided]-hourly-2021-05-07_02.17.01--2d:<0x0>

I have found that the solution to this issue is to reboot into single-user mode and run a scrub. Sometimes it takes several scrubs, maybe even with some reboots in between, but eventually it will clear up the issue. If I reboot before scrubbing, I do not get the panic or the hung scrub.
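
For reference, a minimal sketch of that recovery loop (pool name taken from this report; this only restates the workaround above and is not a guaranteed fix):

# reboot into single-user mode first; scrubbing without a reboot is what hangs
zpool scrub rpool
zpool status -v rpool
# repeat the scrub (with reboots in between if needed) until the error clears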

I run this same version of ZoL on two other machines, one of which runs this same kernel version. What is unique about this machine?

I made a significant effort to rule out hardware issues, including running several memory tests and the built-in Dell diagnostics. I believe I have ruled that out.

Describe how to reproduce the problem

I can't at will. I have to wait for a spell.

Include any warning/errors/backtraces from the system logs

See above

Potentially related bugs

cyberpower678 commented 2 years ago

@cyberpower678 I entirely sympathize. I would not be willing to share a pool, even encrypted, either. However, this puts us in a deadlock: only you have access to the reproducer. I'll outline what I'd do if I somehow received your (hopefully not too large) pool image:

  1. I'd load it into a qcow2 image by dd-ing the source into a qemu-nbd device
  2. I'd create a VM as similar to your setup as possible.
  3. I'd snapshot the qcow2 images (pool and VM root)
  4. I'd mount your pool in the VM and run the reproducer
  5. I'd confirm the bug
  6. I'd reset the VM to the qcow2 snapshot
  7. I would repeat 4-6 with Linux 5.4, and your current ZFS (expect corruption)
  8. Repeat 7, but with ZFS 0.8.6 (expect no corruption)
  9. Repeat 7, but with ZFS compiled from git b8a9041 (0.8 branch merge-base; expect no corruption)
  10. I'd do a git bisect to find the commit on master that started the problem

If you're willing to do this (it would be a lot of recompiling), I can walk you through any steps that are not clear.

The end result of this process would be a specific commit that causes corruption.

The pool is 26TB in size. Not sure if you want to handle something that large. The dataset itself is 1.57TB. However, I will offer an alternative. You and I could arrange a TeamViewer session where we can work together on this. You can then remotely work on the image clone with me. That way I can supervise the data being handled.

cyberpower678 commented 2 years ago

The problem repeatedly reproduced while sending over WAN. I'm confirming right now if the same happens if I just send it to another local test pool I just set up.
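
For context, the local control test amounts to something like the following; the pool and dataset names are made up, and the WAN transport is not specified in this thread:

# receive into a freshly created local test pool instead of over the WAN
zfs send -w tank/dataset@snap | zfs receive -u localtestpool/dataset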

cyberpower678 commented 2 years ago

IMO, git-bisect is the easy part. The loop is basically going to be

  1. make clean
  2. make
  3. make deb (or whatever)
  4. reset the VM/ copy the files to the VM/install
  5. run reproducer
  6. git bisect good or git bisect bad depending on the result
  7. repeat

Setting up the VM and waiting for the copy to complete will be the hard part.

@cyberpower678 Can you confirm if this happens without the raw send? I know there were some bugs associated with raw sending that have had some progress on them.

I don't believe I've had any issues with non-raw sends.
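
A rough shell sketch of the bisect loop outlined above, assuming a Debian-style test VM; the endpoints and packaging target are illustrative:

# in a checkout of openzfs/zfs on the build machine
git bisect start
git bisect bad master          # endpoint known to reproduce the corruption
git bisect good zfs-0.8.6      # endpoint believed to be clean
# then, for each commit git checks out:
sh autogen.sh && ./configure
make clean
make -j"$(nproc)"
make deb                       # or whatever packaging target fits
# install the packages in the VM, roll the VM back to its snapshot,
# run the send/receive reproducer, then report the result:
git bisect good                # or: git bisect bad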

Blackclaws commented 2 years ago

IMO, git-bisect is the easy part. The loop is basically going to be

  1. make clean
  2. make
  3. make deb (or whatever)
  4. reset the VM/ copy the files to the VM/install
  5. run reproducer
  6. git bisect good or git bisect bad depending on the result
  7. repeat

Setting up the VM and waiting for the copy to complete will be the hard part. @cyberpower678 Can you confirm if this happens without the raw send? I know there were some bugs associated with raw sending that have had some progress on them.

I don't believe I've had any issues with non-raw sends.

All my issues have been with non-raw sends so far. They might be different underlying issues, however.

rincebrain commented 2 years ago

In my experience, many of the known issues with native encryption happen if it's being received encrypted, independent of whether you did send -w or not.

So it wouldn't surprise me if whatever this is can trigger on both too.


cyberpower678 commented 2 years ago

Dataset corruption reproduced even on a locally attached drive. @aerusso would you like to arrange a meeting on Zoom or TeamViewer, or something?

aerusso commented 2 years ago

Do you have the > 26TB extra set up to create the qcow2 images? (You probably could get away with the test pool not having any redundancy, but honestly I don't know). I unfortunately will be pretty busy for about 2 weeks, but you can start the process by getting a VM up and dd-ing the raw partition into the qcow2 image. I do not recommend trying to do this with a zpool checkpoint and zpool restore --- I tried doing that on a live system myself, and the image I copied wound up being corrupt.
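
A sketch of the dd-into-qcow2 step via qemu-nbd, with placeholder device names and an illustrative size; double-check the source device before running anything like this:

# create a qcow2 image at least as large as the source device
qemu-img create -f qcow2 /images/pool-copy.qcow2 30T
# expose the image as a block device and copy the raw pool member into it
modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 /images/pool-copy.qcow2
dd if=/dev/sdX of=/dev/nbd0 bs=16M conv=sparse status=progress
qemu-nbd --disconnect /dev/nbd0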

cyberpower678 commented 2 years ago

Do you have the > 26TB extra set up to create the qcow2 images? (You probably could get away with the test pool not having any redundancy, but honestly I don't know). I unfortunately will be pretty busy for about 2 weeks, but you can start the process by getting a VM up and dd-ing the raw partition into the qcow2 image. I do not recommend trying to do this with a zpool checkpoint and zpool restore --- I tried doing that on a live system myself, and the image I copied wound up being corrupt.

Sorry, you just left my scope of knowledge on that one. Short answer is no. I’m probably going to need to be given step by step instructions on this one. I’m also going to need to wrangle enough drives together to create a > 26TB volume that we can use to work with.

Maltz42 commented 2 years ago

@cyberpower678 I have five 6TB SATA WD Red (CMR) drives sitting around that I'd be willing to loan to the cause, if you're interested. They have some miles on them but were in good working condition when I decommissioned them about a year ago. (A few have some UDMA_CRC_Error_Count SMART errors, but that was from a cable or controller issue, long ago resolved.)

bghira commented 2 years ago

if you create a snapshot and delete all of its content to make it very small, does it still reproduce the issue just sending that one snapshot? it's a long shot but it has worked in the past to create simpler reproducers.

cyberpower678 commented 2 years ago

@cyberpower678 I have five 6TB SATA WD Red (CMR) drives sitting around that I'd be willing to loan to the cause, if you're interested. They have some miles on them but were in good working condition when I decommissioned them about a year ago. (A few have some UDMA_CRC_Error_Count SMART errors, but that was from a cable or controller issue, long ago resolved.)

That might actually help. Do you have a few hard drive docking stations lying around too?

cyberpower678 commented 2 years ago

if you create a snapshot and delete all of its content to make it very small, does it still reproduce the issue just sending that one snapshot? it's a long shot but it has worked in the past to create simpler reproducers.

I don't understand this question. A snapshot is an image of the file system at one point in time. Are you asking me to empty out the dataset and make a new snapshot?

cyberpower678 commented 2 years ago

Can someone explain to me how to clone a pool's image onto a collection of disks? I would like to try to clone it, strip out the data that is private and intact, and leave an image file that can reproduce the issue.

Maltz42 commented 2 years ago

Well, normally I would say to take a snapshot, then do a send/receive to the pool on the other array, but that's kind of a catch 22 here. lol I wonder if you did a non-raw send/receive if that would avoid the corruption, while preserving the reproducibility?
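
For clarity, the two variants differ only on the sending side (dataset names here are placeholders): a raw send (-w) ships the encrypted blocks as-is, while a non-raw send decrypts on the sender:

# raw (encrypted) replication
zfs send -w pool/dataset@snap | zfs receive otherpool/dataset
# non-raw replication, suggested above as a way to test whether the corruption still reproduces
zfs send pool/dataset@snap | zfs receive otherpool/dataset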

zfsbot commented 2 years ago

i think what they mean is if you clone the snapshot, and remove data from the clone, and then snapshot that clone and try sending just that snapshot as a full send to see if it helps
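
In command form, that suggestion is roughly the following (all names are hypothetical; add -w to the send if the raw variant is what reproduces):

zfs clone pool/dataset@snap pool/dataset_pruned
# delete everything private or non-essential from the clone's mountpoint
rm -rf /pool/dataset_pruned/private-stuff
zfs snapshot pool/dataset_pruned@repro
# full (non-incremental) send of just the pruned snapshot
zfs send pool/dataset_pruned@repro | zfs receive testpool/repro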

cyberpower678 commented 2 years ago

Well with one of my more private datasets, I decided to nuke every snapshot of it and create a fresh one. But it's corrupted in the exact same manner. The other one, which is a dataset for my Time Machine backups, is also corrupted and I get a PANIC when trying to delete it. VERIFY3 error.

cyberpower678 commented 2 years ago

@aerusso thanks to some generous donations, I have the raw disk capacity to clone the pool, but how should I go about it? Obviously zfs send will just send a corrupted image over. Any way to truly clone the pool from one collection of disks to another? I then need to try and shrink it down to just the broken datasets.

Maltz42 commented 2 years ago

One way might be to "zfs clone" the problem dataset(s) in-place, then do your paring down on the clone. See if the corruption is still reproducible.

If that doesn't work, using a VM and qcow2 images as @aerusso described is even better, but I don't have the expertise to walk you through that. But you can sort of do the same thing without a VM. What is the physical structure of your current array? Is downtime a problem? Do you have a second machine to test with?

  1. Merge all the 6TB drives into one massive volume using LVM. (If you can do step 3 without doing this, performance might be better.)
  2. zpool export your pool.
  3. Use dd to create image files of each physical disk on the 6TB drives.
  4. Move the 6TB drives to the test computer and import and rename the test pool using: "zpool import -d /test_vol/disk1.img -d /test_vol/disk2.img [etc] poolname poolname_test" (see the sketch after this comment).

Since this "new" test pool will have the same UUID as your production pool, it might be a bad idea to have them both on the same machine at the same time, or to even try to import the test pool on the production machine at all. I'm not sure what impacts there would be from having two pools with the same UUID on the same machine, but I can imagine some real badness happening, especially if you trigger an auto-import, such as during a reboot. But like the VM solution above, this process guarantees that the pool is block-identical to the pool that is causing the problems, so it would give a good chance of being able to recreate it.
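
Put together, steps 2 through 4 might look like this; device names and the image directory are placeholders:

zpool export poolname
# image each member disk onto the 6TB scratch volume
dd if=/dev/sda of=/test_vol/disk1.img bs=16M status=progress
dd if=/dev/sdb of=/test_vol/disk2.img bs=16M status=progress
# ...one image per member disk...
# on the test machine, import from the image files under a new name
zpool import -d /test_vol/disk1.img -d /test_vol/disk2.img poolname poolname_test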

cyberpower678 commented 2 years ago

@Maltz42 thank you. I will try to dd each disk first, and yes, downtime is a problem, but I should be able to put together a VM and attach all the clones there. My physical structure involves 18 HDDs: 8 of them are mirrors of the other 8, and 2 are spares. There is an 18TB drive mirrored by another 18TB drive. The rest are 4TB drives. I happen to have an extra 18TB drive lying around that I can use to clone the commissioned drive, and I can use the decommissioned drives, both yours and mine, to clone the rest. In theory, 8 cloned disks should allow me to import a degraded pool.

cyberpower678 commented 2 years ago

Actually even better, I have two drive cloners. I can temporarily degrade my production pool and take out the drives, and then clone them and stick them back without any down time.

aerusso commented 2 years ago

I started writing up a little tutorial using qemu-nbd, but it was getting long (and I wanted to dry run it).

If you're copying block devices using cp (or dd), make sure you do NOT try to import the pool (either copy) while both copies are attached. Someone more knowledgeable can chime in, but it should not be possible in principle to distinguish the copies from each other, since they are identical. (You might be able to do something with block device names, but I don't have the answer).

I.e., physically disconnect the production pool before you do anything to the test copy to avoid messing things up.

cyberpower678 commented 2 years ago

I started writing up a little tutorial using qemu-nbd, but it was getting long (and I wanted to dry run it).

If you're copying block devices using cp (or dd), make sure you do NOT try to import the pool (either copy) while both copies are attached. Someone more knowledgeable can chime in, but it should not be possible in principle to distinguish the copies from each other, since they are identical. (You might be able to do something with block device names, but I don't have the answer).

I.e., physically disconnect the production pool before you do anything to the test copy to avoid messing things up.

I will be using cloning bays to physically clone the disks. The clones won't be attached to production. What I do need to know is how to migrate the pool from 8 disks onto 1, once I have the cloned pool shrunk down.

cyberpower678 commented 2 years ago

I have taken all of the drive mirrors out of production and began cloning them with disk cloners. This will probably take a day or two.

Meanwhile, some observations. The broken datasets can be repaired by moving the files/folders that arrive corrupted off of the source dataset and back onto it. This resets the inode and metadata on the dataset. I have successfully sent the other two datasets that were arriving corrupted without further corruption.
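
In other words, something like this per affected directory (paths are placeholders):

# move the affected data out of the dataset and back in, which rewrites
# the files and their metadata under the current ZFS version
mv /pool/dataset/affected-dir /scratch/affected-dir
mv /scratch/affected-dir /pool/dataset/affected-dir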

I note the impacted files and folders were dated at a certain time point. Presumably the corruption was introduced by an older version of ZFS and wasn't being corrected or handled by later versions. The impacted files I observed were created between early 2020 and late 2020.

cyberpower678 commented 2 years ago

Cloning the pool was successful. @aerusso I could use some information on how to consolidate the pool onto a single image file. Right now, it's on 8 cloned disks.

aerusso commented 2 years ago

So, what I would do now is try to reproduce the bug using 0.8.6 and 2.1. Presumably, you'll be able to reproduce it with 2.1. In my case, I could not reproduce with 0.8.6. A bisect can then find the guilty commit.

I feel like I may have misunderstood what your symptoms were, because reading through your comments it appears there may be some kind of corruption. In any case, we can narrow things down a lot if we know if this bug does not reproduce under 0.8.6.

cyberpower678 commented 2 years ago

Unfortunately 2 of my decommissioned drives faulted, and broke the clone. Have to clone the whole thing again. :-(

cyberpower678 commented 2 years ago

So, what I would do now is try to reproduce the bug using 0.8.6 and 2.1. Presumably, you'll be able to reproduce it with 2.1. In my case, I could not reproduce with 0.8.6. A bisect can then find the guilty commit.

I feel like I may have misunderstood what your symptoms were, because reading through your comments it appears there may be some kind of corruption. In any case, we can narrow things down a lot if we know if this bug does not reproduce under 0.8.6.

I neither have the know-how, nor the time to actually do the git bisect. I was hoping you could do that.

cyberpower678 commented 2 years ago

I was able to recover the sandbox pool without issue, so time saved.

[screenshot]

As can be seen, I got the pool down to 153 GB in size. Only the broken dataset is in there.

[screenshot]

The test pool is healthy. Now I need to get it down to an image I can share with the OpenZFS devs. @aerusso can you help with this?

aerusso commented 2 years ago

Ok, first of all, can you confirm that this reproduces the bug on 2.1?

If it does, can you confirm that it does NOT reproduce the bug on 0.8.6?

cyberpower678 commented 2 years ago

I can confirm it reproduces on 2.1.2, but I'm not sure how to go about confirming 0.8.6. I don't think the pool can be downgraded to run on that older version.

aerusso commented 2 years ago

If you can't test your pool on an earlier version of ZFS, we're not going to be able to do a git bisect in the way I've been suggesting.

cyberpower678 commented 2 years ago

If you can't test your pool on an earlier version of ZFS, we're not going to be able to do a git bisect in the way I've been suggesting.

Well, I hope you can do something with it. Is there a way you can simply take my image of the pool and play around with it yourself? I'm borrowing someone else's machine to make this image, and they would really like to have it back, so I'm not able to just compile different ZFS versions on there at my leisure. Do you want to hop on a TeamViewer or Zoom meeting to discuss this?

aerusso commented 2 years ago

I'm sorry, @cyberpower678, there's clearly been some miscommunication here. The steps to do a bisect always included checking the endpoints (2.1 and 0.8.6). I might be able to cook up something that would let you store the whole pool in ~160 gigs (i.e., fill the drive with zeros, and then compress the whole drive image), but I don't have any particularly insightful ideas of what to do with this image.

Just to be clear, if you save the output raw send stream (i.e., zfs send -w $DATASET >/tmp/something) everything appears fine, and then try to do the receive, (i.e., zfs receive $OTHERPOOL </tmp/something) do you get the corruption in the received dataset? Do you get that received corruption if you do the receive using an older version of ZFS? Can you try doing so using a freshly created pool that you made under 0.8.6, to avoid issues with the pool having features that are too new for 0.8.6?
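
Spelled out, the test being asked for looks roughly like this; names are placeholders, and the 0.8.6 run would use a pool freshly created by the 0.8.6 tools:

# capture the raw stream once
zfs send -w pool/dataset@snap > /scratch/dataset.zstream
# receive it into a scratch pool on the current ZFS and check for corruption
zfs receive scratchpool/dataset_test < /scratch/dataset.zstream
zpool scrub scratchpool
zpool status -v scratchpool    # once the scrub completes
# then repeat the receive on a machine running 0.8.6, into a pool created there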

cyberpower678 commented 2 years ago

I'm sorry, @cyberpower678, there's clearly been some miscommunication here. The steps to do a bisect always included checking the endpoints (2.1 and 0.8.6). I might be able to cook up something that would let you store the whole pool in ~160 gigs (i.e., fill the drive with zeros, and then compress the whole drive image), but I don't have any particularly insightful ideas of what to do with this image.

Just to be clear, if you save the output raw send stream (i.e., zfs send -w $DATASET >/tmp/something) everything appears fine, and then try to do the receive, (i.e., zfs receive $OTHERPOOL </tmp/something) do you get the corruption in the received dataset? Do you get that received corruption if you do the receive using an older version of ZFS? Can you try doing so using a freshly created pool that you made under 0.8.6, to avoid issues with the pool having features that are too new for 0.8.6?

Upon sending you the image, I was hoping you could compare the metadata of both the source and the destination and see why the destination corrupts when the source is perfectly fine. But to confirm: yes, the source dataset works fine, but the destination dataset (the one received with zfs receive) will be corrupted. I've confirmed the metadata is corrupted in some manner. I've also confirmed that simply moving the impacted folders/files off of the source, moving them back onto the source, and then replicating fixes the corruption issue. So the metadata clearly works in the source but was written in some manner by an older version of ZFS that a newer version doesn't like very much.

Do you happen to have an OS image ready that still has 0.8.6? I could spin that up quickly and see if the pool will import. If it does I can try to do a ZFS send to itself and see if the corruption still shows.

I know for a fact that recreating the dataset will most likely not recreate the corruption. My theory is that a very specific version of ZFS (I don't know which) caused this issue, and that it was fairly random. In any event, this pool and dataset function, and they have the bad metadata. My hope is that you can look at the metadata and see what ZFS doesn't like about it. The source and destination files appear intact.

Maltz42 commented 2 years ago

Ubuntu 20.04's repository has 0.8.3, if that is close enough? That version supports encryption and raw sends and has never resulted in corruption for me during nightly send/receive of two encrypted datasets (and a few other unencrypted), since around September 2020 when I started using it.

aerusso commented 2 years ago

Yes, thank you for suggesting that. Debian 9's backport version is too old, and Debian 10 is too new (and the kernel version is too new to support older ZFS versions).

cyberpower678 commented 2 years ago

Alright, I will spin up an Ubuntu 20.04 and install ZFS on it. I will report back.

sskras commented 2 years ago

@cyberpower678, in case it works for you, here is a 0.8.6 repo (should you want to go with a minor upgrade): https://launchpad.net/~jonathonf/+archive/ubuntu/zfs-0.8

At least it mentions the 0.8.6-0york0~20.04 version of zfs-linux: [screenshot]

cyberpower678 commented 2 years ago

Hmmm, the 20.04 comes with ZFS 2.0.4 preinstalled

cyberpower678 commented 2 years ago

Hmm, interesting...

From 2.0.4 -> 2.1.2 = corruption
From 2.1.2 -> 2.0.4 = no corruption
Sending the same dataset from 2.1.2 -> 2.1.2 to the same test pool = corruption
The other copy of the dataset, when received by ZFS 2.0.4, works fine on 2.1.2

cyberpower678 commented 2 years ago

That means I should be able to simply shrink the pool by sending it to an older version of ZFS on a new pool, in theory.
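
One way to end up with a single shareable image is a file-backed pool on the machine running the older ZFS; the names and the 200G size here are illustrative:

truncate -s 200G /scratch/brokenpool.img
zpool create brokenpool /scratch/brokenpool.img
zfs send -w sourcepool/brokendataset@snap | zfs receive brokenpool/brokendataset
zpool export brokenpool
# /scratch/brokenpool.img can now be compressed and shared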

cyberpower678 commented 2 years ago
[screenshot]

HomeServer/brokendataset is the cloned dataset
HomeServer/brokentest is a received dataset, received by ZFS 2.0.4, sent by ZFS 2.1.2
HomeServer/brokentest2 is the same received dataset, received by ZFS 2.1.2, sent by ZFS 2.1.2

[screenshot]

As can be seen only HomeServer/brokentest2 is corrupted.

cyberpower678 commented 2 years ago

When the corrupted dataset is received by ZFS 2.1.2, and then later accessed by 2.0.4, the dataset is still corrupted.

When the dataset is received by ZFS 2.0.4, and then later accessed by ZFS 2.1.2, the dataset remains intact.

cyberpower678 commented 2 years ago

I successfully migrated the dataset to an image file. And it's ready to be shared. @aerusso can you reach out to me privately?

Maltz42 commented 2 years ago

Hmmm, the 20.04 comes with ZFS 2.0.4 preinstalled

Are you sure it's 20.04? "apt install zfsutils-linux"? This says that focal repository is 0.8.3: https://packages.ubuntu.com/focal/allpackages

I can provide installer ISOs from 20.04.1, not sure if that would matter or not.

cyberpower678 commented 2 years ago

Hmmm, the 20.04 comes with ZFS 2.0.4 preinstalled

Are you sure it's 20.04? "apt install zfsutils-linux"? This says that focal repository is 0.8.3: https://packages.ubuntu.com/focal/allpackages

I can provide installer ISOs from 20.04.1, not sure if that would matter or not.

Yep, but it doesn't really matter. The corruption issue was not present on 2.0.4 so it's fine. I was also able to migrate the dataset to a 200GB image file for distribution.

cyberpower678 commented 2 years ago

What I have confirmed is that it's not zfs send that's causing issues. So in hindsight, I could have just sent the zfs dataset into a file on its own, but that was an unknown until now. The actual culprit is zfs receive.
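
Given that, capturing the stream to a file once is enough to replay the receive against any ZFS version, and the zstream tool (a suggestion, not something used above) can inspect the stream records without receiving them; names are placeholders:

zfs send -w pool/brokendataset@snap > /scratch/brokendataset.zstream
# replay the receive against whichever ZFS version is under test
zfs receive testpool/brokendataset < /scratch/brokendataset.zstream
# or examine the stream records directly
zstream dump < /scratch/brokendataset.zstream | less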

cyberpower678 commented 2 years ago

So people here know what's going on: @aerusso has been given access to the test image to play around with. I hope it proves useful in fixing this problem.

cyberpower678 commented 2 years ago

@aerusso have you had any luck with the image? Was it useful?

reidbk commented 2 years ago

I believe I have just run into this issue replicating from 2.1.4 to 2.1.4 on Ubuntu 22.04 with the 5.4.0-109-generic kernel, replicating over the local network with zfs send/recv, mbuffer, pv, and netcat. A scrub attempt is currently in progress.

zpool status

NAME         STATE     READ WRITE CKSUM
tank         ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    sdb      ONLINE       0     0     0
    sdc      ONLINE       0     0     0
  mirror-1   ONLINE       0     0     0
    sdd      ONLINE       0     0     0
    sde      ONLINE       0     0     0
special
  mirror-2   ONLINE       0     0     0
    nvme1n1  ONLINE       0     0     0
    nvme2n1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

    tank/jparchive:<0x0>

Sender

zfs send -R tank/dataset@snap --large-blocks --raw | mbuffer -q -s 1024k -m 1G | pv -b | nc 10.0.1.189 8000

Receiver

nc -l 8000 | mbuffer -q -s 1024k -m 1G | pv -rtab | zfs receive -vFu tank/dataset

kern.log message


Jul  4 21:10:13 jparchive kernel: [81269.771933] cache_from_obj: Wrong slab cache. zio_buf_comb_1536 but object is from zio_buf_comb_1024
Jul  4 21:10:13 jparchive kernel: [81269.771950] WARNING: CPU: 3 PID: 925 at mm/slab.h:449 kmem_cache_free+0x1e1/0x290
Jul  4 21:10:13 jparchive kernel: [81269.771970] Modules linked in: tls nvme_fabrics zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) ipmi_ssif nls_iso8859_1 intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate efi_pstore joydev input_leds intel_pch_thermal ioatdma mei_me mei mac_hid acpi_ipmi ipmi_si acpi_pad tcp_htcp sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops gpio_ich mxm_wmi cec crct10dif_pclmul rc_core crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd ahci cryptd i2c_i801 drm ixgbe igb libahci nvme i2c_smbus lpc_ich xfrm_algo xhci_pci i2c_algo_bit
Jul  4 21:10:13 jparchive kernel: [81269.772144]  nvme_core xhci_pci_renesas dca mdio wmi
Jul  4 21:10:13 jparchive kernel: [81269.772157] CPU: 3 PID: 925 Comm: dbuf_evict Tainted: P        W  O      5.15.0-40-generic #43-Ubuntu
Jul  4 21:10:13 jparchive kernel: [81269.772165] Hardware name: Supermicro Super Server/X10SDV-TLN4F, BIOS 2.0a 10/12/2018
Jul  4 21:10:13 jparchive kernel: [81269.772169] RIP: 0010:kmem_cache_free+0x1e1/0x290
Jul  4 21:10:13 jparchive kernel: [81269.772180] Code: 0c 48 8b 78 08 4c 89 e2 e8 ac 08 f9 ff eb b1 48 8b 57 60 49 8b 4f 60 48 c7 c6 f0 11 84 a3 48 c7 c7 e0 bc bf a3 e8 66 c1 96 00 <0f> 0b 4c 89 e6 4c 89 ff e8 52 ab ff ff 48 8b 0d d3 5a 3a 01 e9 ab
Jul  4 21:10:13 jparchive kernel: [81269.772186] RSP: 0018:ffffb7924100bd40 EFLAGS: 00010286
Jul  4 21:10:13 jparchive kernel: [81269.772192] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
Jul  4 21:10:13 jparchive kernel: [81269.772197] RDX: ffff9d23ffae0588 RSI: 0000000000000001 RDI: ffff9d23ffae0580
Jul  4 21:10:13 jparchive kernel: [81269.772201] RBP: ffffb7924100bd88 R08: 0000000000000003 R09: 00000000010bc030
Jul  4 21:10:13 jparchive kernel: [81269.772205] R10: ffffffffffffffff R11: 0000000000000001 R12: ffff9d195cd92800
Jul  4 21:10:13 jparchive kernel: [81269.772209] R13: ffff9d19dcd92800 R14: ffff9d195cd92800 R15: ffff9d14da1d1e00
Jul  4 21:10:13 jparchive kernel: [81269.772213] FS:  0000000000000000(0000) GS:ffff9d23ffac0000(0000) knlGS:0000000000000000
Jul  4 21:10:13 jparchive kernel: [81269.772219] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  4 21:10:13 jparchive kernel: [81269.772224] CR2: 0000563acedd27c0 CR3: 0000000268e10006 CR4: 00000000003706e0
Jul  4 21:10:13 jparchive kernel: [81269.772229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  4 21:10:13 jparchive kernel: [81269.772232] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul  4 21:10:13 jparchive kernel: [81269.772237] Call Trace:
Jul  4 21:10:13 jparchive kernel: [81269.772241]  <TASK>
Jul  4 21:10:13 jparchive kernel: [81269.772246]  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.772484]  spl_kmem_cache_free+0xc9/0x130 [spl]
Jul  4 21:10:13 jparchive kernel: [81269.772513]  zio_buf_free+0x33/0x80 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.772895]  arc_free_data_buf+0x49/0x60 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.773087]  arc_buf_destroy_impl+0x4f/0x1d0 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.773280]  arc_buf_destroy+0x7c/0xf0 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.773473]  ? dbuf_evict_one+0x180/0x180 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.773685]  dbuf_destroy+0x31/0x3d0 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.773893]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.774233]  ? dbuf_evict_one+0x180/0x180 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.774519]  dbuf_evict_one+0x127/0x180 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.774729]  dbuf_evict_thread+0xa2/0x100 [zfs]
Jul  4 21:10:13 jparchive kernel: [81269.774938]  thread_generic_wrapper+0x64/0x70 [spl]
Jul  4 21:10:13 jparchive kernel: [81269.774969]  ? __thread_exit+0x20/0x20 [spl]
Jul  4 21:10:13 jparchive kernel: [81269.774998]  kthread+0x12a/0x150
Jul  4 21:10:13 jparchive kernel: [81269.775010]  ? set_kthread_struct+0x50/0x50
Jul  4 21:10:13 jparchive kernel: [81269.775019]  ret_from_fork+0x22/0x30
Jul  4 21:10:13 jparchive kernel: [81269.775034]  </TASK>
Jul  4 21:10:13 jparchive kernel: [81269.775037] ---[ end trace e6b391417c176284 ]---