psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0

zfs-check has EOF issues #176

Open Simon4711 opened 1 year ago

Simon4711 commented 1 year ago

Discussed in https://github.com/psy0rz/zfs_autobackup/discussions/175

Originally posted by **Simon4711** November 30, 2022

I tried out the zfs-check tool, and I don't know how to interpret this failure:

root@pvezfsdus:/home# zfs-check rpool/crypt/hamdus1/USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 --skip 9 --debug | ssh root@pve3ham "zfs-check USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 --check"
**Chunk 56320 failed: da39a3ee5e6b4b0d3255bfef95601890afd80709 EOF**

I think this is the last hash generated for the volume. I tested access to this zvol by creating a new one from the checked snapshot in Proxmox (it is a RAW disk of a Windows VM), and it seems to be OK. So why does zfs-check report this error?

psy0rz commented 1 year ago

Strange indeed!

An EOF would mean that USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 is too small.
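
A side note on the hash in that message: da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 digest of zero bytes of input. The minimal Python check below confirms that; the variable names are only illustrative. What this implies about zfs-check's internals (for example, that one side hashed an empty chunk at the end of the volume) is an assumption and is not confirmed anywhere in this thread.

```python
import hashlib

# Hash taken verbatim from the "Chunk 56320 failed" message above.
REPORTED = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

# SHA-1 of an empty byte string, i.e. of a read that returned no data.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()

print("reported hash is SHA-1 of empty input:", EMPTY_SHA1 == REPORTED)
```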

But it's probably a bug in zfs-check, I hope, for the sake of ZFS, although there are quite a few regression tests to catch all the edge cases in zfs-check. It's largely untested in the field, though.

Could you make absolutely sure by comparing the sha1sum of both sides?

Simon4711 commented 1 year ago

@psy0rz You mean to check it manually like this example?

[root@pve1 ~]# zfs-check /bin > checksums
[root@pve1 ~]# zfs-check /bin --check checksums

I started the command

zfs-check rpool/crypt/hamdus1/USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 --skip 9 --debug | ssh root@pve3ham "zfs-check USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 --check"

on the target server. Maybe this is the problem?

Should I do

[root@pve1 ~]# zfs-check /bin > checksums

and then manually compare the last entries to be sure, and report that?

psy0rz commented 1 year ago

No, I mean check it without zfs-check, to make sure that both volumes are exactly the same.

You should clone the snapshots on both sides, and then run sha1sum on the actual device:

zfs clone USBBackup/Crypt/vm-100-disk-0@hamdus1-20221130030011 USBBackup/test
sha1sum /dev/zvol/USBBackup/test

Do that on both sides and see if the checksums match.
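
For reference, this manual comparison boils down to streaming each device through SHA-1 and comparing the digests. A rough Python equivalent of `sha1sum` is sketched below; `sha1_of_path` and its `buf_size` parameter are hypothetical names used for illustration only. It also returns the number of bytes read, so a size difference between the two sides would be visible as well.

```python
import hashlib


def sha1_of_path(path, buf_size=1024 * 1024):
    """Stream a file or block device through SHA-1.

    Returns (hex_digest, bytes_read). Hypothetical helper, not part of zfs-check.
    """
    digest = hashlib.sha1()
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(buf_size)
            if not buf:  # an empty read means end of file/device
                break
            digest.update(buf)
            total += len(buf)
    return digest.hexdigest(), total
```

Run it against the cloned zvol on each host (for example /dev/zvol/USBBackup/test from the commands above) and compare both the digest and the byte count.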

How big is that volume?

Simon4711 commented 1 year ago

O.K. It's checksumming on both sides, which takes a while... I'm doing that on another volume that was synced before and gives the same zfs-check failure message. The volume is about 1 TB.

psy0rz commented 1 year ago

And what was the result of that one you did before?

Simon4711 commented 1 year ago

That one also failed with the chunk error and EOF, but at a lower chunk number. The one reported before is a 6 TB USB volume. The checksum I'm calculating now is on a normal SATA Z1, to get a quicker result.

Simon4711 commented 1 year ago

Yes, the SHA1 checksum is the same for the source and target volume:

root@pve3ham:~# sha1sum /dev/zvol/rpool/test
7b304236dc897b5423c6e5237a85a0b96507f4a9 /dev/zvol/rpool/test

root@pvezfsdus:~# sha1sum /dev/zvol/rpool/test
7b304236dc897b5423c6e5237a85a0b96507f4a9 /dev/zvol/rpool/test

psy0rz commented 1 year ago

OK, then it's definitely a zfs-check bug. Thanks.

Simon4711 commented 1 year ago

Maybe because I started zfs-check on the target? I checked 3 zvols with it, all from the same Windows Server 2019 VM on Proxmox and synced before with zfs-autobackup. The sizes and results are:

200 GB > o.k. no message
1000 GB > EOF
6000 GB > EOF
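
One way to look at these three results is to check where the end of each volume falls relative to the chunks that get hashed. The arithmetic sketch below rests on several assumptions that are NOT confirmed in this thread: a chunk size of 100 MiB, sizes given in GiB rather than GB, and `--skip 9` meaning that every 10th chunk is hashed starting at chunk 0. The helper names are hypothetical. It does not prove a cause; it only shows which chunk index lies just past the end of a volume and whether that index would be sampled.

```python
# All constants below are assumptions for illustration, not confirmed zfs-check behaviour.
CHUNK_BYTES = 100 * 1024**2  # assumed chunk size: 100 MiB
SKIP = 9                     # taken from "--skip 9" in the reported command
GIB = 1024**3


def first_chunk_past_end(size_bytes, chunk_bytes=CHUNK_BYTES):
    """Number of chunks needed to cover the volume, which is also the
    0-based index of the first chunk lying entirely past its end."""
    return -(-size_bytes // chunk_bytes)  # ceiling division


def is_sampled(chunk_nr, skip=SKIP):
    """Assumed sampling rule: hash one chunk, then skip `skip` chunks."""
    return chunk_nr % (skip + 1) == 0


for size_gib in (200, 1000):
    idx = first_chunk_past_end(size_gib * GIB)
    print(f"{size_gib} GiB: first chunk past end = {idx}, sampled = {is_sampled(idx)}")

# The chunk number from the original report, converted back to a size.
reported_chunk = 56320
print("chunk 56320 starts at", reported_chunk * CHUNK_BYTES // GIB, "GiB,",
      "sampled =", is_sampled(reported_chunk))
```

Under these assumptions, 200 GiB gives index 2048 (not sampled), 1000 GiB gives index 10240 (sampled), and chunk 56320 starts at exactly 5500 GiB and is also sampled, which would match the pattern of one clean run and two EOF reports.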