wdoekes opened this issue 10 months ago
Per your suggestion in https://github.com/openzfs/zfs/issues/14911#issuecomment-1570013030 I compiled zdb from git and ran it here.
If I look at the diff between the zdb -K <key> -dbdbdbdbdbdb output of the 0400Z and the 0500Z (broken) snapshot, nothing obvious stands out:
Same with the diff between 0500Z (broken) and 0700Z:
Nor with a diff between two good snapshots 0400Z and 0700Z:
And the difference between the dataset and the snapshot (apart from inherit=lz4 -> inherit=inherit):
# diff -pu <(sed -e 's/inherit=lz4/inherit=inherit/' tank-mysql-redacted-data.zdb-db6-K.log) \
tank-mysql-redacted-data-AT-planb-20231101T0500Z.zdb-db6-K.log
zdb-K-db6-dataset-vs-0500.diff.gz
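For reference, a sketch of roughly how such per-snapshot dumps can be produced and compared; the zdb path, the key variable and the log file names are placeholders, not the exact commands used:

ZDB=~/openzfs-zfs.git/zdb    # self-built git zdb, since the stock 2.1.5 zdb has no -K
"$ZDB" -K "$KEY" -dbdbdbdbdbdb tank/mysql/redacted/data@planb-20231101T0400Z > 0400Z.zdb-db6-K.log
"$ZDB" -K "$KEY" -dbdbdbdbdbdb tank/mysql/redacted/data@planb-20231101T0500Z > 0500Z.zdb-db6-K.log
diff -pu 0400Z.zdb-db6-K.log 0500Z.zdb-db6-K.log | less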
I'd provide the full files, if you think there is useful info in there. I just have to know I'm not leaking anything, so I'd have to do some more redacting.
I'd try running a version that's current, like 2.1.13 or 2.2.0, and seeing if it still produces EIOIO, rather than hoping whatever Ubuntu is shipping isn't on fire.
Okay. I've spent some time getting proper deb packages which I can manage in an apt repository. (This meant getting reproducible builds right, because repository managers mandate that two identically named files must be identical.)
https://github.com/ossobv/zfs-kmod-build-deb
This appears to be sufficient to get things up and running.
I loaded the new module and rebooted.
I ran a scrub: only the errors of the still existing snapshot remained.
errors: 943 data errors, use '-v' for a list
-> errors: 6 data errors, use '-v' for a list
I read the files: I could now read all the files in the (previously corrupted) snapshot. Error refs in zpool status -v changed from:
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/data@planb-20231101T0500Z:<0x0>
tank/mysql/redacted/data@planb-20231101T0500Z:/mysql_upgrade_info
tank/mysql/redacted/data@planb-20231101T0500Z:/ddl_recovery.log
tank/mysql/redacted/data@planb-20231101T0500Z:<0x20>
tank/mysql/redacted/data@planb-20231101T0500Z:<0x21>
tank/mysql/redacted/data@planb-20231101T0500Z:/galera.pid
to:
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/data@planb-20231101T0500Z:<0x0>
/var/lib/mysql/redacted/.zfs/snapshot/planb-20231101T0500Z/mysql_upgrade_info
/var/lib/mysql/redacted/.zfs/snapshot/planb-20231101T0500Z/ddl_recovery.log
tank/mysql/redacted/data@planb-20231101T0500Z:<0x20>
tank/mysql/redacted/data@planb-20231101T0500Z:<0x21>
/var/lib/mysql/redacted/.zfs/snapshot/planb-20231101T0500Z/galera.pid
I ran the zfs send: this now worked. (I did not check the received data, but got no complaints from the remote zfs recv.)
I ran a second scrub: now the last of the data errors disappeared.
errors: No known data errors
That makes me a happy camper.
With regards to self-compiled builds, there are a few items that would be nice to get fixed here:
- --bump0 -> --keep-version
- differing NT_GNU_BUILD_ID, because of a lack of -ffile-prefix-map= and the use of "random" mktemp paths
See https://github.com/ossobv/zfs-kmod-build-deb/blob/3ddbe5fb2f09fda9724b232e6f90903d524abed8/Dockerfile#L60-L81 and https://github.com/ossobv/zfs-kmod-build-deb/blob/3ddbe5fb2f09fda9724b232e6f90903d524abed8/dpkg-repackage-reproducible .
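A minimal sketch of the kind of flag that would address the differing build IDs, assuming the module build honours KCFLAGS (the mapped paths are illustrative):

# Illustrative only: map the throwaway build directory to a fixed path so the
# NT_GNU_BUILD_ID (which hashes embedded source paths, among other inputs)
# comes out identical across rebuilds in different mktemp directories.
BUILDDIR=$(mktemp -d)
export KCFLAGS="-ffile-prefix-map=$BUILDDIR=/usr/src/zfs"
# ... configure and build the module inside $BUILDDIR as usual ...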
Best would probably be to create proper debian packaging infra. Then I could also ensure that e.g. libzpool5 conflicts with libzpool5linux, which is likely going to be my next hurdle.
Thinking out loud...
In any case: this particular issue is fixed with 2.1.13. Thanks!
Walter Doekes OSSO B.V.
Bad news.
The error reappeared in the snapshot of 2023-11-13 22:00Z. Same system, now running the 2.1.13 ZFS module.
# zfs version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.13-1osso1
# uptime
09:16:14 up 10 days, 19:29, 2 users, load average: 1.31, 1.42, 1.53
# cat /sys/module/zfs/version
2.1.13-1osso1
We're still running the Ubuntu-provided userspace tools (2.1.5), but we run the 2.1.13 kernel module kmod-zfs-5.15.0-88-generic_2.1.13-1osso1_amd64.deb: https://github.com/ossobv/zfs-kmod-build-deb/releases/tag/zfs-2.1.13-1osso1 (built by ./buildone.sh jammy 5.15.0-88-generic osso1)
Error seen in zpool status:
# zpool status -v
pool: tank
state: ONLINE
...
scan: scrub repaired 0B in 01:01:44 with 0 errors on Fri Nov 3 13:27:42 2023
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
nvme-XXX ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted3/data@planb-20231113T2200Z:<0x0>
tank/mysql/redacted3/data@planb-20231113T2200Z:<0x20>
# zfs list -oname,creation -t snapshot -r tank/mysql/redacted3
NAME CREATION
tank/mysql/redacted3/data@planb-20231113T1900Z ma nov 13 20:00 2023
tank/mysql/redacted3/data@planb-20231113T2000Z ma nov 13 21:00 2023
tank/mysql/redacted3/data@planb-20231113T2100Z ma nov 13 22:00 2023
tank/mysql/redacted3/data@planb-20231113T2200Z ma nov 13 23:00 2023
tank/mysql/redacted3/data@planb-20231114T0124Z di nov 14 2:24 2023
tank/mysql/redacted3/data@planb-20231114T0300Z di nov 14 4:00 2023
tank/mysql/redacted3/data@planb-20231114T0500Z di nov 14 6:00 2023
tank/mysql/redacted3/data@planb-20231114T0700Z di nov 14 8:00 2023
tank/mysql/redacted3/log@planb-20231113T1900Z ma nov 13 20:01 2023
tank/mysql/redacted3/log@planb-20231113T2000Z ma nov 13 21:00 2023
tank/mysql/redacted3/log@planb-20231113T2100Z ma nov 13 22:00 2023
I'll try and replace the userland tools with the 2.1.13 builds and see if anything changes.
@rincebrain: if you have other (zdb?) tips for debugging in the meantime, I'm interested.
But the files were readable in the snapshot -- at least the ones I tried.
Error state:
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted3/data@planb-20231113T2200Z:<0x0>
tank/mysql/redacted3/data@planb-20231113T2200Z:<0x20>
Removing the snapshot:
errors: Permanent errors have been detected in the following files:
<0x8416>:<0x0>
<0x8416>:<0x20>
Doing a scrub:
errors: No known data errors
... running 2.1.13 userland tools now. We'll see if the error reappears.
And now, with userland at 2.1.13 and module 2.1.13, after a reboot, there is a new data error.
# zfs version
zfs-2.1.13-1osso1
zfs-kmod-2.1.13-1osso1
# uname -r
5.15.0-88-generic
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/data@planb-20231126T1900Z:<0x0>
Interestingly, just after checking (cause?) with zpool status -v, a second error appeared:
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/data@planb-20231126T1900Z:<0x0>
tank/mysql/redacted/data@planb-20231126T1900Z:<0x20>
dmesg has nothing (no disk/IO errors or anything). The storage seems fine too:
# nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 33 C (306 Kelvin)
available_spare : 99%
available_spare_threshold : 10%
percentage_used : 8%
endurance group critical warning summary: 0
data_units_read : 2.743.533.883
data_units_written : 1.743.454.820
host_read_commands : 251.349.221.948
host_write_commands : 46.896.760.515
controller_busy_time : 18.088
power_cycles : 47
power_on_hours : 26.930
unsafe_shutdowns : 43
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
(I do notice that ashift is 9 and the sector size is 512, which is not optimal for this device.)
(edit: added uname -r)
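As a side note, both values can be checked along these lines (a sketch; output formatting differs per tool version):

zdb -C tank | grep ashift                        # ashift recorded in the pool config (9 = 512-byte)
nvme id-ns -H /dev/nvme0n1 | grep 'LBA Format'   # sector sizes the NVMe namespace offers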
(SSDs usually hide the performance impact of actual size of IOs to them well enough that you should weigh the observed performance against the space efficiency that results, IMO.)
So, I would guess this is decryption errors, not actual r/w/c errors, since they're not showing up there.
My guess would be there's some additional stupid caveat around the MAC caveats that said patch you originally were applying to Ubuntu's tree is a workaround for - the caveat is, that patch just works around it when it encounters it, a different change is what stops it from screwing up in the first place. (cc #11294 #13709 #14161 63a26454ba43ca3af5020f25d2e59c7ee15bf13d e257bd481bb77181e475bf35292c1a4b0cb2c57a for prior art.) Of course, those are all just for "I cannot unlock the dataset at all because of that error", not files within it erroring...
If it's giving stat errors, I would guess that there could be a similar issue with not just the objset accounting data? I'm not really sure what we can do about that, though, because the reason we can just ignore it in the objset accounting data case is that we can just regenerate it.
If this is a backup target, what's the source running, in terms of ZFS version? What's the automation that's doing the send/recv?
2.1.x has useful counters you can see in grep . /proc/spl/kstat/kcf/*provider* for when decryption things failed - of course, they were dropped in 2.2.x for some reason, I don't really understand why, but for now at least, you can confirm whether those are decryption errors.
I would suggest that you look at the zpool events output for those errors, and see what they have to say - in particular, you might also find it useful to look at the (git, so it can look at encrypted objects) zdb output for objects 0 and 0x20 (so 32) on that dataset that's erroring, and see if it reports anything interesting. (0 could actually mean 0 or somewhere in the metadata without more precision; 0x20 appears to be the system attribute node on most of the more recent datasets I've made and a normal directory object on some others, so it'd be good to find out what it is here)
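Concretely, that suggestion amounts to something like the following sketch, reusing the git zdb and key variable from the earlier dumps (0x20 is 32 in decimal):

zpool events -v tank | less      # look for authentication/checksum ereports against the dataset
~osso/openzfs-zfs.git/zdb -K "$KEY" -dbdbdbdbdbdb \
    tank/mysql/redacted/data@planb-20231126T1900Z 0 32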
It also, thinking about it, wouldn't completely astonish me if those decryption errors were transient and went away after it decided to regenerate the accounting information, but that's just wild speculation, so if you scrub and they go away, and it doesn't come back, great, that's a nice cleanup that should probably be applied to how that's reported, but that's better than getting "???" for files.
If you're still seeing the ??? for some files, though, then we should probably dig even more into it.
Thanks for the replies :)
If this is a backup target, what's the source running, in terms of ZFS version? What's the automation that's doing the send/recv?
This is a backup source. The target is running 2.1.5, but we're not actually getting any data there (when it fails), so I don't think that can be an issue. The automation consists of: zfs snapshot, zfs set planb:owner=..., zfs send -I. The data is sent unencrypted (not --raw; it's encrypted with a different key on the target).
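A sketch of that automation; the host name, owner value, previous-snapshot variable and target dataset are placeholders, not the actual planb configuration:

SRC=tank/mysql/redacted/data
PREV=planb-20231126T1800Z                     # placeholder: last snapshot both sides already have
NOW=$(date -u +planb-%Y%m%dT%H00Z)
zfs snapshot "$SRC@$NOW"
zfs set planb:owner=backuphost "$SRC@$NOW"
# non-raw send: blocks are decrypted locally and re-encrypted with the target's own key
zfs send -I "@$PREV" "$SRC@$NOW" | ssh backuphost zfs recv -u tank/backups/redacted/data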
so if you scrub and they go away, and it doesn't come back, great
So far this only went away by removing the snapshot and doing a scrub afterwards.
If you're still seeing the ??? for some files, though, then we should probably dig even more into it.
# cd /var/lib/mysql/redacted/.zfs/snapshot/planb-20231126T1900Z/
# ls -a
. ..
# cd ..
# ls -l
total 8
drwxr-xr-x 7 mysql mysql 21 nov 14 13:56 planb-20231126T0800Z
drwxr-xr-x 7 mysql mysql 21 nov 14 13:56 planb-20231126T1700Z
drwxr-xr-x 7 mysql mysql 21 nov 14 13:56 planb-20231126T1800Z
drwxr-xr-x 7 mysql mysql 21 nov 14 13:56 planb-20231126T1900Z
drwxr-xr-x 7 mysql mysql 21 nov 14 13:56 planb-20231126T2100Z
# ls planb-20231126T1800Z
aria_log.00000001 ddl.log galera.cache
...
# ls planb-20231126T2100Z
aria_log.00000001 ddl.log galera.cache
...
# ls planb-20231126T1900Z/
ls: cannot access 'planb-20231126T1900Z/aria_log.00000001': Input/output error
ls: cannot access 'planb-20231126T1900Z/aria_log_control': Input/output error
aria_log.00000001 ddl.log galera.cache
...
Weird. The files weren't there. And then they were, but were unreadable.
Still exactly 2 errors at this point.
# cat planb-20231126T1900Z/galera.cache >/dev/null
cat: planb-20231126T1900Z/galera.cache: Input/output error
But this caused a third error to appear. (And later a fourth.)
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/data@planb-20231126T1900Z:<0x0> # 1
tank/mysql/redacted/data@planb-20231126T1900Z:<0x20> # 2
tank/mysql/redacted/data@planb-20231126T1900Z:<0x21> # 4
tank/mysql/redacted/data@planb-20231126T1900Z:<0x382> # 3
Here are the counters:
# grep . /proc/spl/kstat/kcf/*provider*
2 1 0x01 4 1088 3737526914 1069440114633433
name type data
kcf_ops_total 4 5954800672
kcf_ops_passed 4 5954800635
kcf_ops_failed 4 37
kcf_ops_returned_busy 4 0
Not sure if these say they are decryption errors or not.
I'll see if I can get some zdb info tomorrow...
kcf_ops_failed is pronounced "I failed to decrypt something".
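A quick way to correlate that counter with a failing read (sketch): record kcf_ops_failed, touch one of the broken files, and compare.

grep kcf_ops_failed /proc/spl/kstat/kcf/*provider*     # note the current value
cat /var/lib/mysql/redacted/.zfs/snapshot/planb-20231126T1900Z/galera.cache >/dev/null || true
grep kcf_ops_failed /proc/spl/kstat/kcf/*provider*     # if it went up, the EIO was a failed decrypt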
# ls -l good-0x* bad-0x*
-rw-r--r-- 1 root root 14568 nov 27 13:22 bad-0x0
-rw-r--r-- 1 root root 1131 nov 27 13:22 bad-0x20
-rw-r--r-- 1 root root 1118 nov 27 13:22 bad-0x21
-rw-r--r-- 1 root root 553316 nov 27 13:22 bad-0x382
-rw-r--r-- 1 root root 14568 nov 27 13:22 good-0x0
-rw-r--r-- 1 root root 1131 nov 27 13:22 good-0x20
-rw-r--r-- 1 root root 1118 nov 27 13:22 good-0x21
-rw-r--r-- 1 root root 553316 nov 27 13:22 good-0x382
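(For reference, a sketch of one way such per-object dumps could be produced, with the previous snapshot as the "good" one; 0x20 = 32, 0x21 = 33, 0x382 = 898:)

ZDB=~osso/openzfs-zfs.git/zdb
GOOD=tank/mysql/redacted/data@planb-20231126T1800Z
BAD=tank/mysql/redacted/data@planb-20231126T1900Z
for obj in 0 32 33 898; do    # 0x0, 0x20, 0x21, 0x382
    "$ZDB" -K "$KEY" -dbdbdbdbdbdb "$GOOD" "$obj" > "good-0x$(printf '%x' "$obj")"
    "$ZDB" -K "$KEY" -dbdbdbdbdbdb "$BAD"  "$obj" > "bad-0x$(printf '%x' "$obj")"
done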
Types of the aforementioned objects:
# grep Object bad-0x* -A1
bad-0x0: Object lvl iblk dblk dsize dnsize lsize %full type
bad-0x0- 0 6 128K 16K 910K 512 784K 15.05 DMU dnode (K=inherit) (Z=inherit=inherit)
--
bad-0x20: Object lvl iblk dblk dsize dnsize lsize %full type
bad-0x20- 32 1 128K 512 1K 512 512 100.00 SA master node (K=inherit) (Z=inherit=inherit)
--
bad-0x21: Object lvl iblk dblk dsize dnsize lsize %full type
bad-0x21- 33 1 128K 512 1K 512 512 100.00 ZFS delete queue (K=inherit) (Z=inherit=inherit)
--
bad-0x382: Object lvl iblk dblk dsize dnsize lsize %full type
bad-0x382- 898 4 128K 16K 10.5M 512 1.00G 2.83 ZFS plain file (K=inherit) (Z=inherit=inherit)
So: DMU dnode (if 0x0 really is 0x0), SA master node, ZFS delete queue and ZFS plain file, respectively.
Very few differences between a good and a bad snapshot:
diff -U10 <(cat good-0x*) <(cat bad-0x*)
I did find the following line in /proc/spl/kstat/zfs/dbgmsg:
2023-11-26T22:35:19+0100 (1701034519) zfs_ctldir.c:1139:zfsctl_snapshot_mount(): Unable to automount /var/lib/mysql/redacted/.zfs/snapshot/planb-20231126T1900Z error=256
That was when doing the ls -a yesterday, which turned up nothing.
The aria_log.00000001 and aria_log_control which showed ??? look like this in zdb:
Object lvl iblk dblk dsize dnsize lsize %full type
34 1 128K 1.50K 1K 512 1.50K 100.00 ZFS directory (K=inherit) (Z=inherit=inherit)
176 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
path /
uid 111
gid 120
atime Tue Jul 20 23:05:23 2021
mtime Tue Nov 14 13:56:56 2023
ctime Tue Nov 14 13:56:56 2023
crtime Tue Jul 20 23:05:23 2021
gen 69
mode 40755
size 21
parent 34
links 7
pflags 840800000144
microzap: 1536 bytes, 19 entries
ddl.log = 1273 (type: Regular File)
ibtmp1 = 512 (type: Regular File)
ib_buffer_pool = 896 (type: Regular File)
aria_log.00000001 = 1373 (type: Regular File)
redacted = 1038 (type: Directory)
galera.cache = 898 (type: Regular File)
performance_schema = 934 (type: Directory)
mysql = 1040 (type: Directory)
test = 1037 (type: Directory)
galera.pid = 514 (type: Regular File)
multi-master.info = 660 (type: Regular File)
sys = 936 (type: Directory)
gvwstate.dat = 384 (type: Regular File)
aria_log_control = 1374 (type: Regular File)
ddl_recovery.log = 640 (type: Regular File)
xtrabackup_info = 1382 (type: Regular File)
mysql_upgrade_info = 513 (type: Regular File)
grastate.dat = 5 (type: Regular File)
ibdata1 = 1294 (type: Regular File)
Indirect blocks:
0 L0 DVA[0]=<0:14802d99400:200> DVA[1]=<0:1a400aa6600:200> salt=eb7c19fbb31a2896 iv=91ea3e5bc650043f:146ab2be [L0 ZFS directory] fletcher4 lz4 encrypted LE contiguous unique double size=600L/200P birth=15162731L/15162731P fill=1 cksum=0002c75bc4814940:005986c858f339f9:d63621ecb3d69fed:557039941fa509af
segment [0000000000000000, 0000000000000600) size 1.50K
Object lvl iblk dblk dsize dnsize lsize %full type
1373 2 128K 16K 312K 512 1.45M 100.00 ZFS plain file (K=inherit) (Z=inherit=inherit)
176 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 92
path /aria_log.00000001
uid 111
gid 120
atime Thu May 11 15:55:12 2023
mtime Sun Nov 26 19:49:34 2023
ctime Sun Nov 26 19:49:34 2023
crtime Thu May 11 15:55:12 2023
gen 11808755
mode 100660
size 1515520
parent 34
links 1
pflags 840800000004
Indirect blocks:
0 L1 DVA[0]=<0:37d39d9f600:1a00> DVA[1]=<0:32fcf924400:1a00> [L1 ZFS plain file] fletcher4 lz4 indirect-MAC LE contiguous unique double size=20000L/1a00P birth=15384359L/15384359P fill=93 cksum=133591c4d8e18cd7:08819389ec10b94c:46255c22acab03b7:243bd9c3b057da1f
0 L0 DVA[0]=<0:2b9ee7f8800:600> salt=a16006a52ff4bcb3 iv=d60f57e4331171fb:39c57726 [L0 ZFS plain file] fletcher4 lz4 encrypted LE contiguous unique single size=4000L/600P birth=11808755L/11808755P fill=1 cksum=00480df48ce2376f:1b36ffdb1bbc8173:24b90b64cde4f631:5f58e161356986f0
4000 L0 DVA[0]=<0:2ba25bfb800:800> salt=a16006a52ff4bcb3 iv=ba410c2136c80647:1ad3fe9c [L0 ZFS plain file] fletcher4 lz4 encrypted LE contiguous unique single size=4000L/800P birth=11808755L/11808755P fill=1 cksum=00ae75b70020849a:580a60172c11fdc6:eb3fb2b50a981203:a740c033085fe422
8000 L0 DVA[0]=<0:2ba25bfc000:800> salt=a16006a52ff4bcb3 iv=43ac0f67162fc3c3:6df46420 [L0 ZFS plain file] fletcher4 lz4 encrypted LE contiguous unique single size=4000L/800P birth=11808755L/11808755P fill=1 cksum=00b16fcf35454983:59a76676ec6c67f4:326ef34f738dfd38:ab2ab61815713ebc
... truncated ...
Object lvl iblk dblk dsize dnsize lsize %full type
1374 1 128K 512 512 512 512 100.00 ZFS plain file (K=inherit) (Z=inherit=inherit)
176 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
path /aria_log_control
uid 111
gid 120
atime Thu May 11 15:55:12 2023
mtime Sun Nov 26 19:49:34 2023
ctime Sun Nov 26 19:49:34 2023
crtime Thu May 11 15:55:12 2023
gen 11808755
mode 100660
size 52
parent 34
links 1
pflags 840800000004
Indirect blocks:
0 L0 DVA[0]=<0:234e715dc00:200> salt=eb7c19fbb31a2896 iv=7d98a4386a558b60:fd266813 [L0 ZFS plain file] fletcher4 uncompressed encrypted LE contiguous unique single size=200L/200P birth=15384359L/15384359P fill=1 cksum=0002c2484e02f9e1:005a1eeabbf1c802:0e2f9fce11c32d51:0f35cf920387c23c
segment [0000000000000000, 0000000000000200) size 512
Objects 34, 1373 and 1374 diff between the previous snapshot and the failing one:
diff -U10 good-34-1373-1374 bad-34-1373-1374
Today I have more question marks/immediately failing files:
# ls -l
ls: cannot access 'ddl.log': Input/output error
ls: cannot access 'ib_buffer_pool': Input/output error
ls: cannot access 'aria_log.00000001': Input/output error
ls: cannot access 'galera.cache': Input/output error
ls: cannot access 'aria_log_control': Input/output error
total 2286
-????????? ? ? ? ? ? aria_log.00000001
-????????? ? ? ? ? ? aria_log_control
-????????? ? ? ? ? ? ddl.log
-rw-rw---- 1 mysql mysql 16384 nov 14 13:56 ddl_recovery.log
-????????? ? ? ? ? ? galera.cache
-rw-rw---- 1 mysql mysql 7 nov 14 13:56 galera.pid
...
# ls -l galera.*
ls: cannot access 'galera.cache': Input/output error
-rw-rw---- 1 mysql mysql 7 nov 14 13:56 galera.pid
# cat galera.pid
cat: galera.pid: Input/output error
Object lvl iblk dblk dsize dnsize lsize %full type
514 1 128K 512 512 512 512 100.00 ZFS plain file (K=inherit) (Z=inherit=inherit)
176 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
path /galera.pid
uid 111
gid 120
atime Tue Nov 14 13:56:56 2023
mtime Tue Nov 14 13:56:56 2023
ctime Tue Nov 14 13:56:56 2023
crtime Tue Nov 14 13:56:56 2023
gen 15162731
mode 100660
size 7
parent 34
links 1
pflags 840800000004
Indirect blocks:
0 L0 DVA[0]=<0:14802d99600:200> salt=eb7c19fbb31a2896 iv=cc4fd4a49efb67fa:28fa32e6 [L0 ZFS plain file] fletcher4 uncompressed encrypted LE contiguous unique single size=200L/200P birth=15162731L/15162731P fill=1 cksum=0002abee1ae25470:0058d8bdb904c79e:a1cd2474ef3ddaba:9846bf62da08af08
segment [0000000000000000, 0000000000000200) size 512
Reading with zdb -r does seem to work:
# ~osso/openzfs-zfs.git/zdb -K $KEY -dbdbdbdbdbdb tank/mysql/redacted/data@planb-20231126T1900Z -r galera.pid /root/galera.pid
Unlocked encryption root: tank/mysql
Unlocked encryption root: tank/mysql
Copying object 514 to file /root/galera.pid
Object 514 is 7 bytes
# hd /root/galera.pid
00000000 31 35 34 31 35 33 0a |154153.|
00000007
In fact, the aria_log_control, which was modified between the good (1800Z) and bad (1900Z) snapshots, is also readable with zdb -r.
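The same extraction can be repeated for the other affected files and compared against the previous snapshot (a sketch; a file that was legitimately modified between 1800Z and 1900Z will of course differ):

ZDB=~osso/openzfs-zfs.git/zdb
"$ZDB" -K "$KEY" -dbdbdbdbdbdb tank/mysql/redacted/data@planb-20231126T1800Z -r aria_log_control /root/good-aria_log_control
"$ZDB" -K "$KEY" -dbdbdbdbdbdb tank/mysql/redacted/data@planb-20231126T1900Z -r aria_log_control /root/bad-aria_log_control
cmp -l /root/good-aria_log_control /root/bad-aria_log_control   # differences expected: the file changed between snapshots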
That's about all I can gather from this right now. Let me know if you'd like to see anything else.
My guess remains that it's failing to decrypt the SA and thus refusing you access to the information on the object, but it's not obvious to me why it would be doing that. (zdb, of course, doesn't care about that anyway.)
What's the server hardware that this is in?
I don't immediately recall any bugs with a race in the native encryption code, but, who knows.
What's the server hardware that this is in?
Supermicro X11SCE-F, Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz (6 cores), 4x32GB Micron DDR4 (18ADF4G72AZ-2G6B2), INTEL SSDPE2KX040T8 4TB nvme.
If you have suggestions on debug/tracing code to add to 2.1.13, I can certainly add patches.
There was a second dataset snapshot broken today. I removed both snapshots so the backups are running again. So there's nothing to poke at until the next error pops up.
Checking in.
In November I had corrupt snapshots on:
In December (2023-12-01 10:03:02), I updated to ZFS 2.1.14-1osso0:
No idea if it could be related to the fixes in 2.1.14. I do hope it is and that the snapshots remain working from now on.
I would naively expect it to be unrelated, as I don't immediately see how the fix for #15526 would have impacted this, and if it were #15395 I wouldn't expect the snapshots to have data readable from zdb, but I could be wrong.
Okay. Well, that theory was fun while it lasted :)
errors: Permanent errors have been detected in the following files:
tank/mysql/redacted/log@planb-20231219T0600Z:<0x0>
For different reasons I'll be disabling hourly snapshots on this host. Let me know if you have anything I can work with, then I'll gladly restart something to try and reproduce.
(I kept /proc/spl/kstat/zfs/dbgmsg.)
Maybe relevant: when no snapshots were made in between, I (still) had to scrub twice to get rid of the error after deleting the failing snapshot.
That's expected - the way the error list in zpool status works, it keeps the previous list after a scrub and only removes things after they don't crop up for two scrubs, I think the original idea being that if you have something flaky, it's useful to remember where it was wrong before even if it read fine this time.
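In practice, clearing one of these stale entries then looks roughly like this (a sketch; the snapshot name stands in for whichever one is affected):

zfs destroy tank/mysql/redacted/data@planb-20231126T1900Z
zpool scrub tank && zpool wait -t scrub tank    # the entry may still be listed after this first scrub
zpool scrub tank && zpool wait -t scrub tank    # dropped once it stays clean for a second scrub
zpool status -v tank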
Today I saw https://www.phoronix.com/news/OpenZFS-Encrypt-Corrupt and this looks very similar to #12014:
System information
Problem observation
Input/output error
(Times below without a Z are in CET, i.e. 06:00 is 05:00Z.)
I have snapshots:
But the (now) second to last one, I cannot access.
Some files have stat issues (see the ? above). None of the files can be read, except for the empty file:
(Reading the 0-byte file went fine.)
Checking zpool status:
Most of those 941 data errors were from the battle I had with the previous snapshot that went bad. I deleted that bad snapshot yesterday, in the hope that it was just a fluke. That snapshot concerned a different fileset, by the way: tank/mysql/redacted2/data
How to reproduce the problem
I don't know yet. This is the first such issue I have encountered:
- I looked at planb-20231101T0500Z and planb-20231101T0700Z; the 0700Z one appears to be readable.
- There are newer snapshots than @planb-20231101T0500Z, but none that can be synced, because every sync starting from 0400Z tries to include 0500Z as well, and that fails.
- An earlier failure was planb-20231030T0500Z, so also at 05:00, but an even earlier failure was in tank/mysql/redacted3/data@planb-20231027T1800Z.
Include any warning/errors/backtraces from the system logs
dmesg
Empty. Nothing after boot (Oct 17).
/proc/spl/kstat/zfs/dbgmsg
I got this from the debug log, which isn't much:
This is strange, because at 2023-11-01 that (redacted2) snapshot should've been gone already. I was now debugging the redacted one.
After clearing the log (echo 0 > /proc/spl/kstat/zfs/dbgmsg) and enabling (random) extra stuff (echo 1 > /sys/module/zfs/parameters/spa_load_print_vdev_tree), I now have this:
This means nothing to me, and could just be debug info from the extra spa_load_print_vdev_tree I enabled.
"When"
First failure (at 05:01:04Z), seen on the receiving end:
Second failure (at 07:00:56Z), seen on the receiving end:
The other side?
The other side has no snapshots after tank/mysql/redacted/data@planb-20231101T0400Z. The snapshots that do exist are likely fine (including 0400Z).
How to fix?
As I mentioned, yesterday I had the same issue with a bigger snapshot (redacted2). There I destroyed the snapshot and resumed with a snapshot which I still had on both sides (the 0400Z one).
I started a scrub then too, which finished before the new issues started:
Interestingly, the mentioned problems in zpool status -v were still there.
Questions
What other debugging can I do/enable/run/check?
I can probably leave this in this state for a short while, hopefully not too long, while I break out the debugging tools.
Can you help with some zfs/zdb debugging foo so we can get to the bottom of this?
Thanks! Walter Doekes OSSO B.V.