Closed PaulGrandperrin closed 3 years ago
I'm not aware of any `send` issues in 1.9.3.1, and I don't think ZOL has had any raw send fixes "recently"?
I grepped through all the recent issues here, and yes, I guess it might be a new bug. Also, there are two more details that might help you:
I'm not aware of any outstanding raw send/recv issues either, but something abnormal clearly occurred. I also don't see anything clearly out of order in the properties you've posted, but perhaps @tcaputi will.
You mentioned you'd sent previous raw incrementals successfully. Do I understand correctly that was done between macOS and Linux, and you were able to successfully mount them? Was there anything different you can think of with the latest incrementals?
Yes, I checked all the dates of the commands I typed in the past and I can confirm that the previous snapshots were successfully received with the exact same version of ZOL on the receiver (Debian).
However, I now remember something that might be very important:
When I did this last send|receive, I did it with the `backup` zpool locally attached on the Debian server, i.e. both the send and the receive commands were executed on ZoL 0.8.2.
=> the snapshots were created with/by openZFS for macOS, but openZFS for macOS was not involved in sending them. Also, I've been able to mount the volumes successfully afterward.
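Schematically, that working local transfer would have looked something like this; the dataset and snapshot names below are placeholders, not the actual commands from the thread:

```shell
# Both ends of the pipe run on the Debian server (ZoL 0.8.2); the snapshots
# themselves were created earlier by openZFS for macOS on the locally
# attached "backup" pool. All names below are illustrative.
zfs send -w -I backup/data@prev backup/data@latest \
  | zfs receive storage/encrypted/data
```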
And I now remember why I did it that way (by locally attaching the `backup` disk on the Debian server): because I had tried before to send the snapshots from macOS, but it failed.
I don't remember exactly what the error was, but it was something that interrupted the transfer before the end. The volumes weren't corrupted as a result, and I just deleted the partially received snapshots and moved on. I remember not being sure whether it was ZFS's fault (sender or receiver) or the network's, so I just assumed it was the latter.
Maybe there was in fact already an issue with ZOL and/or openZFS for macOS.
So maybe there is indeed an issue with how openZFS for macOS sends its snapshots, and another in ZOL with respect to how it handles receiving corrupted snapshots.
Two other things that might be of interest:

The `backup` zpool is just one HDD in a USB enclosure, usually connected to a MacBook. This zpool was created last year on macOS and therefore has fewer features enabled than the one on Debian.

I'm a little confused about the timeline. Would you mind laying out everything relevant that happened as a chronological bullet list of events or something similar? The biggest things I'm looking for are:
It's alright if you don't have all of this information, but it would be helpful for me if you could lay things out chronologically as best you can, if you don't mind.
`zpool history` might help.
Yes, I haven't had the time to repost a reorganized post as @tcaputi asked but I'll do it :-)
I already know about `zpool history`, I just hadn't had the time yet!
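For anyone following along: `zpool history` logs every administrative command run against a pool, which is handy for reconstructing this kind of timeline. A quick sketch (the pool name is taken from the `zpool status` output later in the thread):

```shell
# Show every command ever run against the pool, with timestamps.
zpool history storage

# -i also shows internally-logged events (snapshot creation, receives,
# destroys), useful for pinning down when each incremental arrived.
zpool history -i storage
```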
Hi again! The lockdown gave me time to look into this issue again :-)
Source of the bug
@tcaputi you asked me to better summarize the history of what happened, but after looking at it, I really think the main takeaway is that the corruption happened when the `send` was executed on openZFS 1.9.3.1,64 and received on zfsonlinux 0.8.2-3~bpo10+1. All the other details I talked about don't really matter in the context of this bug.
Impossible to even delete the corrupted volumes
Anyway, I decided not to try to recover my corrupted volumes, and instead to delete them and restore my backups.
I deleted the `storage/encrypted/veronique` volume (which you can see in my previous posts) a few weeks ago and it went well, but now trying to remove the remaining corrupted volumes triggers new errors; see below.
Moving forward
Are there things I can still do to help you gain insight into this bug?
If not, do you know how I could delete those corrupted volumes, or should I consider the whole zpool to be irreversibly corrupted?
```
# zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 11:30:14 with 0 errors on Sun Feb  9 11:54:17 2020
config:

        NAME                              STATE     READ WRITE CKSUM
        storage                           ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            sde1                          ONLINE       0     0     0
            sdf1                          ONLINE       0     0     0
            sdc1                          ONLINE       0     0     0
            sdd1                          ONLINE       0     0     0
        logs
          mirror-1                        ONLINE       0     0     0
            wwn-0x55cd2e404b6ea37d-part1  ONLINE       0     0     0
            wwn-0x55cd2e404b6ea368-part1  ONLINE       0     0     0
        cache
          sdb2                            ONLINE       0     0     0
          sda2                            ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        storage/encrypted/paulg:<0x0>
        storage/encrypted/paulg@2019:09:19-13:27:<0x0>
        <0x1077>:<0x0>
```
```
# zfs destroy storage/encrypted -r
cannot destroy snapshot storage/encrypted/paulg@2019:08:20-19:35: dataset is busy
cannot destroy 'storage/encrypted': dataset already exists
```
```
# zfs get all storage/encrypted/paulg@2019:08:20-19:35
NAME                                      PROPERTY           VALUE                  SOURCE
storage/encrypted/paulg@2019:08:20-19:35  type               snapshot               -
storage/encrypted/paulg@2019:08:20-19:35  creation           Tue Aug 20 21:35 2019  -
storage/encrypted/paulg@2019:08:20-19:35  used               418K                   -
storage/encrypted/paulg@2019:08:20-19:35  referenced         907G                   -
storage/encrypted/paulg@2019:08:20-19:35  compressratio      1.06x                  -
storage/encrypted/paulg@2019:08:20-19:35  devices            off                    inherited from storage/encrypted
storage/encrypted/paulg@2019:08:20-19:35  exec               on                     default
storage/encrypted/paulg@2019:08:20-19:35  setuid             on                     default
storage/encrypted/paulg@2019:08:20-19:35  createtxg          37239631               -
storage/encrypted/paulg@2019:08:20-19:35  xattr              on                     default
storage/encrypted/paulg@2019:08:20-19:35  version            5                      -
storage/encrypted/paulg@2019:08:20-19:35  utf8only           on                     -
storage/encrypted/paulg@2019:08:20-19:35  normalization      none                   -
storage/encrypted/paulg@2019:08:20-19:35  casesensitivity    sensitive              -
storage/encrypted/paulg@2019:08:20-19:35  nbmand             off                    default
storage/encrypted/paulg@2019:08:20-19:35  guid               9917501289659667102    -
storage/encrypted/paulg@2019:08:20-19:35  primarycache       all                    inherited from storage
storage/encrypted/paulg@2019:08:20-19:35  secondarycache     all                    default
storage/encrypted/paulg@2019:08:20-19:35  defer_destroy      off                    -
storage/encrypted/paulg@2019:08:20-19:35  userrefs           1                      -
storage/encrypted/paulg@2019:08:20-19:35  objsetid           389                    -
storage/encrypted/paulg@2019:08:20-19:35  mlslabel           none                   default
storage/encrypted/paulg@2019:08:20-19:35  refcompressratio   1.06x                  -
storage/encrypted/paulg@2019:08:20-19:35  written            907G                   -
storage/encrypted/paulg@2019:08:20-19:35  clones                                    -
storage/encrypted/paulg@2019:08:20-19:35  logicalreferenced  958G                   -
storage/encrypted/paulg@2019:08:20-19:35  acltype            off                    default
storage/encrypted/paulg@2019:08:20-19:35  context            none                   default
storage/encrypted/paulg@2019:08:20-19:35  fscontext          none                   default
storage/encrypted/paulg@2019:08:20-19:35  defcontext         none                   default
storage/encrypted/paulg@2019:08:20-19:35  rootcontext        none                   default
storage/encrypted/paulg@2019:08:20-19:35  encryption         aes-256-gcm            -
storage/encrypted/paulg@2019:08:20-19:35  encryptionroot     storage/encrypted      -
storage/encrypted/paulg@2019:08:20-19:35  keystatus          unavailable            -
```
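A side note, not something verified in this thread: the `userrefs 1` in the snapshot properties above suggests there is a user hold on that snapshot, which is one common cause of a "dataset is busy" error on destroy. A hedged sketch of how one might check and clear it (the hold tag name is hypothetical):

```shell
# List user holds on the snapshot that refuses to be destroyed.
zfs holds storage/encrypted/paulg@2019:08:20-19:35

# If a hold shows up, release it by its tag (tag name is hypothetical),
# then retry the destroy.
zfs release some-tag storage/encrypted/paulg@2019:08:20-19:35
zfs destroy storage/encrypted/paulg@2019:08:20-19:35
```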
```
# zfs get all storage/encrypted
NAME               PROPERTY              VALUE                  SOURCE
storage/encrypted  type                  filesystem             -
storage/encrypted  creation              Mon Aug 12 14:17 2019  -
storage/encrypted  used                  1.22T                  -
storage/encrypted  available             81.6G                  -
storage/encrypted  referenced            3.11M                  -
storage/encrypted  compressratio         1.06x                  -
storage/encrypted  mounted               no                     -
storage/encrypted  quota                 none                   default
storage/encrypted  reservation           none                   default
storage/encrypted  recordsize            128K                   default
storage/encrypted  mountpoint            /storage/encrypted     default
storage/encrypted  sharenfs              off                    default
storage/encrypted  checksum              sha512                 received
storage/encrypted  compression           lz4                    received
storage/encrypted  atime                 off                    inherited from storage
storage/encrypted  devices               off                    received
storage/encrypted  exec                  on                     default
storage/encrypted  setuid                on                     default
storage/encrypted  readonly              off                    default
storage/encrypted  zoned                 off                    default
storage/encrypted  snapdir               hidden                 default
storage/encrypted  aclinherit            restricted             default
storage/encrypted  createtxg             37060671               -
storage/encrypted  canmount              on                     default
storage/encrypted  xattr                 on                     default
storage/encrypted  copies                1                      default
storage/encrypted  version               5                      -
storage/encrypted  utf8only              on                     -
storage/encrypted  normalization         none                   -
storage/encrypted  casesensitivity       sensitive              -
storage/encrypted  vscan                 off                    default
storage/encrypted  nbmand                off                    default
storage/encrypted  sharesmb              off                    default
storage/encrypted  refquota              none                   default
storage/encrypted  refreservation        none                   default
storage/encrypted  guid                  5275620463138722690    -
storage/encrypted  primarycache          all                    inherited from storage
storage/encrypted  secondarycache        all                    default
storage/encrypted  usedbysnapshots       0B                     -
storage/encrypted  usedbydataset         3.11M                  -
storage/encrypted  usedbychildren        1.22T                  -
storage/encrypted  usedbyrefreservation  0B                     -
storage/encrypted  logbias               throughput             received
storage/encrypted  objsetid              4519                   -
storage/encrypted  dedup                 off                    default
storage/encrypted  mlslabel              none                   default
storage/encrypted  sync                  standard               default
storage/encrypted  dnodesize             legacy                 default
storage/encrypted  refcompressratio      2.35x                  -
storage/encrypted  written               3.11M                  -
storage/encrypted  logicalused           1.29T                  -
storage/encrypted  logicalreferenced     5.72M                  -
storage/encrypted  volmode               default                default
storage/encrypted  filesystem_limit      none                   default
storage/encrypted  snapshot_limit        none                   default
storage/encrypted  filesystem_count      none                   default
storage/encrypted  snapshot_count        none                   default
storage/encrypted  snapdev               hidden                 default
storage/encrypted  acltype               off                    default
storage/encrypted  context               none                   default
storage/encrypted  fscontext             none                   default
storage/encrypted  defcontext            none                   default
storage/encrypted  rootcontext           none                   default
storage/encrypted  relatime              off                    default
storage/encrypted  redundant_metadata    all                    default
storage/encrypted  overlay               off                    default
storage/encrypted  encryption            aes-256-gcm            -
storage/encrypted  keylocation           prompt                 local
storage/encrypted  keyformat             passphrase             -
storage/encrypted  pbkdf2iters           342K                   -
storage/encrypted  encryptionroot        storage/encrypted      -
storage/encrypted  keystatus             unavailable            -
storage/encrypted  special_small_blocks  0                      default
```
I guess the `storage/encrypted/veronique` volume that I deleted a few weeks ago is also still there, referred to as `<0x1077>:<0x0>` in the `zpool status` output above.
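As a general note (not something attempted in this thread): the persistent error log keeps entries from the last couple of scrub passes, so stale references to already-destroyed datasets such as `<0x1077>:<0x0>` usually disappear after subsequent scrubs:

```shell
# Scrub the pool, then re-check whether the stale error entry is gone.
zpool scrub storage
zpool status -v storage
```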
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I moved my data to the cloud and don't use ZFS at the moment, so I can't do anything to help. I'll close.
Sender is an up-to-date macOS Mojave with openZFS 1.9.3.1,64 and kernel `Darwin macbookpro2018-perso.local 19.2.0 Darwin Kernel Version 19.2.0: Sat Nov 9 03:47:04 PST 2019; root:xnu-6153.61.1~20/RELEASE_X86_64 x86_64`.
Receiver is an up-to-date Debian Buster 10.2 with zfsonlinux 0.8.2-3~bpo10+1 and kernel `Linux debianas 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux`.
Summary
I sent a tree of raw encrypted incremental snapshots to a remote server and it corrupted all the corresponding remote volumes. I created a checkpoint and tried to roll back to the last snapshot but the volumes are still corrupted and data is still inaccessible. Rebooting didn't help.
There are no logs in `dmesg` about ZFS or IO errors.

I'm available to help as much as I can, I'm a huge fan of your work :-)
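The recovery attempt described in the summary (checkpoint the pool, then roll back to the last snapshot) would look roughly like this; the snapshot name is a placeholder:

```shell
# Take a pool checkpoint before experimenting, so the pool can be rewound
# if recovery attempts make things worse.
zpool checkpoint storage

# Roll the corrupted dataset back to its last received snapshot
# (-r also destroys any more recent snapshots). Name is illustrative.
zfs rollback -r storage/encrypted/paulg@last-received
```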
Here are more details:
Details
Volumes on the sender
Volumes on the receiver before the send/receive operation:
The command to send the snapshots (finished without errors):
Volumes on the receiver after the send/receive operation:
Trying to mount the newly received volumes on the receiver:
I think my heart skipped a beat at that point, but yes, of course, I do have offline backups (not fully up-to-date though...)
Checking the health of the receiver zpool:
Properties on the sender
Properties on the receiver: