gaykitty opened 6 months ago
Can you share the output of `zfs get all` on `data/postgres`? Otherwise it's going to be rather difficult to try and reproduce what's going on here.
@rincebrain here ya go:
```
NAME PROPERTY VALUE SOURCE
data/postgres type filesystem -
data/postgres creation Mon Apr 10 23:15 2023 -
data/postgres used 221M -
data/postgres available 25.5G -
data/postgres referenced 117M -
data/postgres compressratio 1.71x -
data/postgres mounted yes -
data/postgres quota none default
data/postgres reservation none default
data/postgres recordsize 8K local
data/postgres mountpoint /var/lib/postgresql/ local
data/postgres sharenfs off default
data/postgres checksum on default
data/postgres compression on default
data/postgres atime on default
data/postgres devices on default
data/postgres exec on default
data/postgres setuid on default
data/postgres readonly off default
data/postgres zoned off default
data/postgres snapdir hidden default
data/postgres aclmode discard default
data/postgres aclinherit restricted default
data/postgres createtxg 174 -
data/postgres canmount on local
data/postgres xattr on default
data/postgres copies 1 default
data/postgres version 5 -
data/postgres utf8only off -
data/postgres normalization none -
data/postgres casesensitivity sensitive -
data/postgres vscan off default
data/postgres nbmand off default
data/postgres sharesmb off default
data/postgres refquota none default
data/postgres refreservation none default
data/postgres guid 970173609522556085 -
data/postgres primarycache metadata local
data/postgres secondarycache all default
data/postgres usedbysnapshots 103M -
data/postgres usedbydataset 117M -
data/postgres usedbychildren 0B -
data/postgres usedbyrefreservation 0B -
data/postgres logbias throughput local
data/postgres objsetid 274 -
data/postgres dedup off default
data/postgres mlslabel none default
data/postgres sync standard default
data/postgres dnodesize legacy default
data/postgres refcompressratio 2.02x -
data/postgres written 1.64M -
data/postgres logicalused 351M -
data/postgres logicalreferenced 225M -
data/postgres volmode default default
data/postgres filesystem_limit none default
data/postgres snapshot_limit none default
data/postgres filesystem_count none default
data/postgres snapshot_count none default
data/postgres snapdev hidden default
data/postgres acltype off default
data/postgres context none default
data/postgres fscontext none default
data/postgres defcontext none default
data/postgres rootcontext none default
data/postgres relatime on default
data/postgres redundant_metadata all default
data/postgres overlay on default
data/postgres encryption aes-256-gcm -
data/postgres keylocation file:///var/secrets/zfs.key local
data/postgres keyformat raw -
data/postgres pbkdf2iters 0 default
data/postgres encryptionroot data/postgres -
data/postgres keystatus available -
data/postgres special_small_blocks 0 default
data/postgres snapshots_changed Wed May 1 15:00:06 2024 -
data/postgres nixos:shutdown-time Mon Mar 18 12:20:20 AM EDT 2024 inherited from data
```
My assumption would be that something is erroring on decryption during send, since it's not using `-w`, and somehow, because it's transient, it's vanishing almost immediately, but the error is still breaking your send stream. `zpool events -v` might know what it was that it burped on, though probably not why. (My guess would be something strange like it's hitting some metadata object where `DVA[0]` fails to decrypt but `DVA[1]` does, and somehow native encryption is screwing up error handling so that trying `DVA[1]` doesn't stop it from returning an error? Not sure, I've not seen this reproduce locally, but people have occasionally reported strange hiccups from native encryption that made me wonder if something strange was going on even after 2163cde450d0898b5f7bac16afb4e238485411ff.)
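For example, something along these lines right after a failing send might show the underlying error record (a sketch only; the pool name `data` is assumed from the dataset above):

```sh
# Reproduce the failing incremental send, discarding the stream.
zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' \
         'data/postgres@autosnap_2024-05-01_01:00:13_hourly' > /dev/null

# Dump verbose event records from the pool; look for checksum/authentication
# errors and the object/DVA they point at.
zpool events -v data | less
```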
I was just poking at my system and noticed that a NixOS update had downgraded the kernel version to 6.6.29; I hadn't noticed at the time and hadn't rebooted yet, so I rebooted my system. After the reboot my pool showed errors like before. After two scrubs the errors cleared, and doing the `zfs send` from before no longer produces the bug.
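For anyone else hitting this, the recovery sequence described above is roughly the following (a sketch only; the pool name `data` is assumed):

```sh
# After rebooting, scrub the pool; in this case two scrub passes were needed
# before the errors cleared.
zpool scrub data
zpool wait -t scrub data   # block until the scrub finishes (OpenZFS 2.x)
zpool status -v data

zpool scrub data
zpool wait -t scrub data
zpool status -v data

# Then re-run the previously failing incremental send to confirm it completes cleanly.
```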
Getting the same kind of issue with an empty file list, though in my case it's not clearing and it survives export and import. Doing a scrub now.
System information
Describe the problem you're observing
When trying to send particular snapshots (as determined by syncoid), the sending pool will start reporting ZFS-8000-8A with an empty file list. But then the error clears on its own after 5-30 seconds and multiple runs of `zpool status -v`. The output of `zfs send` is also invalid.
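One way to watch the error report appear and then clear on its own is a small polling loop like the sketch below (the pool name `data` is assumed from the dataset name):

```sh
# Poll pool status every few seconds while re-running the failing send in
# another shell; the ZFS-8000-8A report shows up with an empty file list and
# then disappears again after roughly 5-30 seconds.
while true; do
    date
    zpool status -v data | grep -A5 'errors:'
    sleep 5
done
```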
Describe how to reproduce the problem
Happens every time; I don't know how it got into this state.
Include any warning/errors/backtraces from the system logs
zpool status output after `zfs send`:
zpool status output after 5-30 seconds and multiple runs of `zpool status -v`:
Output of `zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' 'data/postgres@autosnap_2024-05-01_01:00:13_hourly' | zstreamdump`: