permanent errors (ereport.fs.zfs.authentication) reported after syncoid snapshot/send workload

aerusso commented 3 years ago

System information

Type	Version/Name
Distribution Name	Debian
Distribution Version	testing (bullseye)
Linux Kernel	5.10.19
Architecture	amd64
ZFS Version	2.0.3-1

Describe the problem you're observing

After upgrading to zfs 2.0.3 and Linux 5.10.19 (from 0.8.6), a well-tested syncoid workload causes "Permanent errors have been detected in the following files:" reports for a pool/dataset@snapshot:<0x0> (no file given).

Removing the snapshot, and running a scrub causes the error to go away.

This is on a single-disk nvme SSD---never experiencing any problems before upgrading---and has happened twice, once after each reboot/re-running of the syncoid workload. I have since booted back into 0.8.6, ran the same workload, and have not experienced the error report.

Describe how to reproduce the problem

Beyond the above, I do not have a mechanism to reproduce this. I'd rather not blindly do it again!

Include any warning/errors/backtraces from the system logs

See also @jstenback's report of very similar symptoms: 1 and 2, which appear distinct from the symptoms of the bug report they are in. Additionally, compare to @rbrewer123's reports 1 and 2, which comes with a kernel panic---I do not experience this.

My setup is very similar: I run a snapshot workload periodically, and transfer the snapshots every day to another machine. I also transfer snapshots much more frequently to another pool on the same machine.

If valuable, I have zpool history output that I can provide. Roughly, the workload looks like many snapshot, send -I, destroy (on one pool) and receive (on the same machine, but another pool ).

mheubach commented 2 years ago

Small summary and maybe to early to definitely call this the reason / solution: On the host we experienced this problem, we did unencrypted sends of encrypted zvols. I changed this yesterday to raw sends (-w). (After the crash of the machine yesterday I had the downtime anyway - so no need to ask my customer :-)) Until now none of the errors came back. At the moment some replications aren't activated yet. We do this during the day. If the errors show up again, we maybe can identify a certain zvol as the reason. But for now I am petty much sure, that unencrypted sends of encrypted zvols (or maybe also datasets) trigger the bug. Out of about 15 servers this one here was the only one on which we did unencrypted sends (due to a configuration error :-)) and this was the only machine showing up with these errors. @mattchrist: changing replication to raw encrypted maybe solves this for you, too. As unencrypted and encrypted sends are not "compatible", you can't switch to encrypted sends with incremental sends. Means you have to do new initial replications - which can lead to a full pool on the destination if you can't move or delete the already replicated datasets/zvols. We do this zvol after zvol at the moment. So we can delete the "old" replica after the new replica is complete. We are talking of Terabytes in our use case :-)

Blackclaws commented 2 years ago

We also only experienced this after starting with unencrypted sends. Unfortunately back when we started, encrypted sends weren't possible because they were just straight up broken on the destination (should be fixed by now https://github.com/openzfs/zfs/issues/12594).

The problem as @mheubach aptly states is that you can't migrate from unencrypted to encrypted sends. As this is our main backup for multiple servers that has months of history in it we can't just throw away the intermediate snapshots on the target that no longer exist on the production systems. This unfortunately means that we are more or less stuck with unencrypted sends now.

I'm also not sure whether it might not also be related to sends that authenticate via dataset acls. We do sends by user account specifically only allowed to access and send datasets not write to them or change them in any other way. This is to ensure that our backup systems do not actually damage our production systems.

mattchrist commented 2 years ago

A while back I tried a raw send of a 'corrupted' snapshot, and it seemed to work (did not return an I/O Error), so I do think that converting to raw sends may be a workaround.

I have a few other details that might be worth sharing. For me the issue takes several days to show up after a reboot. The last occurrence for one of my servers took 8 days after boot, another, 13 days after a boot. Two other similarly-configured systems (same hardware, zpool, syncoid implementations) haven't experienced this issue, but they have very-little disk activity -- maybe that matters.

wohali commented 2 years ago

have encrypted zvols / datasets which you replicate decrypted or do you use raw encrypted zfs sends (switch "-w")?

Not raw. No -w.

mheubach commented 2 years ago

Adding to my post from July 15th: We experienced no more problems since we switched to raw encrypted sends. This is no solution for the bug of course but hopefully a workaround for everybody affected. I think there shouldn't be many use cases where you do an unencrypted send of encrypted data with ZFS.

Blackclaws commented 2 years ago

Adding to my post from July 15th: We experienced no more problems since we switched to raw encrypted sends. This is no solution for the bug of course but hopefully a workaround for everybody affected. I think there shouldn't be many use cases where you do an unencrypted send of encrypted data with ZFS.

The problem here is that encrypted sends used to be broken on the receiving side, so we had to use unencrypted sends. We can't just migrate to encrypted sends now without breaking the complete history. So a fix for this would be much appreciated.

mheubach commented 2 years ago

So - now I definitely have a workload, where there is no workaround for unencrypted sends: If you want to transfer a volume or dataset from one encrypted pool to another encrypted pool and you want to keep your history of taken snapshots and you want to use the encryption root of the receiving pool, you have to use unencrypted sends. If you don't, you have to load the encryption key for each received dataset or volume, because the encryption root differs from the sending side, even if you use the same encryption key. But be careful if you replicate you're data to another host. You have to retransmit the entire Snapshots "chain", as the newly received snapshots have nothing in common with your already received snapshots. If you are low on space or working with big data - think about this twice.

Retrodynen commented 2 years ago

hi

FWIW I don't think this is the same bug, when I experience the bug the counters never ever go up, just dozens of errors that are cleared after removing snapshots and scrubbing again, I think your disks are legitimately failing :/

-- Thanks, Syl

From: Jeroen @.> To: openzfs/zfs @.> CC: Retrodynen @.>; Manual @.> Date: 8 May 2022 16:54:48 Subject: Re: [openzfs/zfs] permanent errors (ereport.fs.zfs.authentication) reported after syncoid snapshot/send workload (#11688)

And know I have issues with both disk:

*# zpool status -v pool: data state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: scrub in progress since Sat May 7 16:35:31 2022 7.82T scanned at 301M/s, 3.77T issued at 145M/s, 7.97T total 8.50M repaired, 47.26% done, 08:26:39 to go config:
    NAME                                    STATE     READ WRITE 
CKSUM data ONLINE 0 0
0 mirror-0 ONLINE 0 0
0 ata-WDC_WD100EFAX-68LHPN0_JEHD8AVN ONLINE 0 15
0 ata-WDC_WD100EFAX-68LHPN0_JEHUZR6N ONLINE 27 0
42 (repairing)

errors: No known data errors * — Reply to this email directly, view it on GitHub[https://github.com/openzfs/zfs/issues/11688#issuecomment-1120442911], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AQ2C6QE3ZBN2AA5JQAZP4LTVI7PUHANCNFSM4YSLZDFA]. You are receiving this because you are subscribed to this thread.[Tracking image][https://github.com/notifications/beacon/AQ2C6QGRYXP3425XQTUPYFTVI7PUHA5CNFSM4YSLZDFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOILEJUHY.gif]

Blackclaws commented 2 years ago

So is there anyone actively trying to track down the source of this or is there any way to fix this without having to reboot/remount the host/filesystem. This is actively disrupting our production servers since we either have to reboot them regularly or stop backups.

If there is no solution we'll have to think about forgoing zfs native encryption until this is fixed at some point in the future.

Ryushin commented 2 years ago

I'm seem to be experiencing this daily (or more than once each day). Switching to raw syncs does not seem to improve anything. This is from my Thinkpad P17 Gen2, with a Xeon CPU and 128GB of ECC RAM, so even though it is a laptop, I have all the boxes ticked to not having corruption. I have a mirrored VDEV with two 2TB NVME drives sending to my server that does not use encryption.

I'm almost at the point of dumping encryption on my laptop until this is fixed. Is there any debugging I can provide since this is happening so often on my system?

Once I delete the bad snapshots, I have to run a scrub twice to fix the pool. Luckily it only takes a few minutes to run a scrub.

mheubach commented 2 years ago

@Ryushin: Can you give more details? If you're sending raw encrypted to another server the received data will be encrypted. You can't send raw encrypted to a server and not having the received dataset beeing encrypted too. So I suspect you're doing something wrong.

Ryushin commented 2 years ago

@Ryushin: Can you give more details? If you're sending raw encrypted to another server the received data will be encrypted. You can't send raw encrypted to a server and not having the received dataset beeing encrypted too. So I suspect you're doing something wrong.

The raw sends are showing encrypted on the destination. Since sending raw did not fix this problem, I've reverted back to not sending raw any longer. (destroyed the snaps on the destination and remove the -w option from syncoid).

This morning, and I'm currently trying to fix this as I'm typing here, I had 145 bad snapshots. I've cleared them out and I'm now running scrubs, which only takes about five minutes. Before this happened, I saw all my CPU threads go to max for a few minutes. pigz-9 was the top CPU usage (I used compress=pigz-slow in syncoid) After the CPU calmed down, I had those 145 bad snapshots. It might be time to recreate my pool without encryption.

isopix commented 2 years ago

In the past I was told that sending raw snapshots is not affected by this bug. Isn't that case?

Ryushin commented 2 years ago

In the past I was told that sending raw snapshots is not affected by this bug. Isn't that case?

Yea, I thought that was the case in reading the thread. Though I was still getting corrupted snapshots a few hours after changing to sending raw. I've reverted back to non raw now as I'd rather have the backup data on my local server unencrypted.

kyle0r commented 2 years ago

Based on the last couple of posts I thought I might point out/remind that raw and non-raw sends are not bi-directionally interoperable (at least for encrypted datasets).

man: https://openzfs.github.io/openzfs-docs/man/8/zfs-send.8.html#w

Note that if you do not use this flag (-w, --raw) for sending encrypted datasets, data will be sent unencrypted and may be re-encrypted with a different encryption key on the receiving system, which will disable the ability to do a raw send to that system for incrementals.

So @Ryushin reading your posts, it sounds like you might have bit on confusion to clear up on encrypted vs. unencrypted datasets, and the behaviour of raw and non-raw sends in relation to encrypted datasets. It would help if you can share your workflow and commands being used in a detailed post, so folks can better visualise your setup and provide assistance.

@Blackclaws, you said the following in September:

We can't just migrate to encrypted sends now without breaking the complete history. So a fix for this would be much appreciated.

I was wondering if you could raw send a fully copy of the datasets (with the history you want to maintain) to a temp dst, including the latest common snapshot from your src, and then try raw sending from the src to the temp dst to see if it would continue with raw replication? If yes, I think you know the suggestion I'm pointing towards?
OR would it be possible to re-seed the src from the dst and then start the replication again?

Andreas-Marx commented 2 years ago

I am reading here, because I was affected by issue #11294, for which a fix PR #12981 ended up in OpenZFS 2.1.3 . I still get "permanent" pool errors on <0x0> from time to time when I try to expire snaps, because I still have many encrypted snapshots that were at some point raw-received with OpenZFS<2.1.3 . But I am quite confident that newer snapshots are not affected, above mentioned flaw was obviously my problem.

Does anyone have a case where no incremental snap was ever received with OpenZFS < 2.1.3 ?

atj commented 2 years ago

$ uname -a
Linux <redacted> 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux
$ zfs --version
zfs-2.1.5-1~bpo11+1
zfs-kmod-2.1.5-1~bpo11+1

$ zpool status rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:49 with 0 errors on Sun Nov 13 00:26:50 2022
config:

    NAME                                                  STATE     READ WRITE CKSUM
    rpool                                                 ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D965A6-part4  ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D9656A-part4  ONLINE       0     0     0

errors: No known data errors
$ zfs get encryption rpool
NAME   PROPERTY    VALUE        SOURCE
rpool  encryption  aes-256-gcm  -

$ zfs list -t snapshot -r rpool
no datasets available
$ zfs snapshot -r rpool@backup

$ zfs send -Rw rpool@backup | cat > /dev/null
warning: cannot send 'rpool/srv@backup': Invalid argument

$ zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:47 with 0 errors on Thu Nov 17 17:39:19 2022
config:

    NAME                                                  STATE     READ WRITE CKSUM
    rpool                                                 ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D965A6-part4  ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D9656A-part4  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/srv@backup:<0x0>

$ zfs list -H -o name -t snapshot -r rpool |while read; do zfs destroy $REPLY; done
$ zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:47 with 0 errors on Thu Nov 17 17:39:19 2022
config:

    NAME                                                  STATE     READ WRITE CKSUM
    rpool                                                 ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D965A6-part4  ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D9656A-part4  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x19433>:<0x0>

$ zpool scrub -w rpool
$ zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:48 with 0 errors on Thu Nov 17 17:47:41 2022
config:

    NAME                                                  STATE     READ WRITE CKSUM
    rpool                                                 ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D965A6-part4  ONLINE       0     0     0
        ata-Micron_1100_MTFDDAK256TBN_171516D9656A-part4  ONLINE       0     0     0

errors: No known data errors

The dataset where the error occurs is not deterministic but it happens every time.

Ryushin commented 2 years ago

Based on the last couple of posts I thought I might point out/remind that raw and non-raw sends are not bi-directionally interoperable (at least for encrypted datasets).

man: https://openzfs.github.io/openzfs-docs/man/8/zfs-send.8.html#w

Note that if you do not use this flag (-w, --raw) for sending encrypted datasets, data will be sent unencrypted and may be re-encrypted with a different encryption key on the receiving system, which will disable the ability to do a raw send to that system for incrementals.

So @Ryushin reading your posts, it sounds like you might have bit on confusion to clear up on encrypted vs. unencrypted datasets, and the behaviour of raw and non-raw sends in relation to encrypted datasets. It would help if you can share your workflow and commands being used in a detailed post, so folks can better visualise your setup and provide assistance.

So on the destination, I'm not mixing encrypted (raw) and non encrypted snapshots to the same destination dataset. When I switched to raw (-w) I destroy the destination datasets first.

My syncoid command:

/usr/bin/flock -n /run/syncoid-cron-to-myserver.lock -c "/usr/sbin/syncoid --recursive --skip-parent --compress=zstd-fast --no-sync-snap rpool root@myserver:netshares/zfs_backups/muaddib_backup" | /usr/bin/logger --tag=syncoid

My zfs list:

root@muaddib:~# zfs list
NAME                USED  AVAIL     REFER  MOUNTPOINT
rpool               516G  1.25T      192K  none
rpool/ROOT          374G  1.25T      280G  /
rpool/steam_games   142G  1.25T      142G  /home/chris/.steam

It is pretty much always the steam_games dataset that is seeing corrupted snapshots.

I'm going to create a unencrypted dataset to put the steam_games in since there is nothing that needs to be secure there.

Ryushin commented 2 years ago

Actually, after the problem this morning it seems that manipulating any of the rpool/steam_games snapshots results in a problem. So this dataset probably has some underlying corruption in it now. Even deleting all snapshots, creating a new snapshot and trying to zfs send | zfs receive locally instantly gives an error:

warning: cannot send 'rpool/steam_games@migrate': Invalid argument

And the pool has a permanent error from that point.

I tar'd the dataset, I made a new unencrypted dataset and then untared to that. Hopefully this should fix the problems I'm seeing...... for a little while.

Blackclaws commented 2 years ago

@Blackclaws, you said the following in September:

We can't just migrate to encrypted sends now without breaking the complete history. So a fix for this would be much appreciated.

I was wondering if you could raw send a fully copy of the datasets (with the history you want to maintain) to a temp dst, including the latest common snapshot from your src, and then try raw sending from the src to the temp dst to see if it would continue with raw replication? If yes, I think you know the suggestion I'm pointing towards? OR would it be possible to re-seed the src from the dst and then start the replication again?

Raw replication will not work on a previously non raw replicated targets.

Our issue is that our history goes back much further than our current systems as these are backups. While, yes, technically it would be possible to restore from backup to the live systems and then raw replicate to the backup systems this would incur a rather large amount of downtime, which is currently not acceptable.

There are also other good reasons not to have an encrypted backup or have the backup be encrypted by a different key than the source. Therefore fixing the issues that still exist here should be preferred to just solving the issue by working around it.

Ryushin commented 2 years ago

Well, that did not last long. I got two bad snapshots for my rpool/ROOT dataset which contains my main critical data. I'm going to have recreate my pool without encryption this weekend and restore it from snapshot from my server. I wanted to wait two ZFS versions after encryption was rolled out to let it mature, but this is a major bug that looks like it leads to dataloss if it's allowed to keep happening.

Blackclaws commented 2 years ago

Well, that did not last long. I got two bad snapshots for my rpool/ROOT dataset which contains my main critical data. I'm going to have recreate my pool without encryption this weekend and restore it from snapshot from my server. I wanted to wait two ZFS versions after encryption was rolled out to let it mature, but this is a major bug that looks like it leads to dataloss if it's allowed to keep happening.

You shouldn't actually have lost any data. The snapshots show as bad but aren't actually in any way corrupted. Reboot the system and all should be good. To get the error to vanish you have to run two scrubs though.

kyle0r commented 2 years ago

@Ryushin it would good to see your workflow and exact commands to better understand your scenario, also your zfs versions and co.

Ryushin commented 2 years ago

Well, that did not last long. I got two bad snapshots for my rpool/ROOT dataset which contains my main critical data. I'm going to have recreate my pool without encryption this weekend and restore it from snapshot from my server. I wanted to wait two ZFS versions after encryption was rolled out to let it mature, but this is a major bug that looks like it leads to dataloss if it's allowed to keep happening.

You shouldn't actually have lost any data. The snapshots show as bad but aren't actually in any way corrupted. Reboot the system and all should be good. To get the error to vanish you have to run two scrubs though.

I have not lost any data as of yet. But not being able to access snapshots using local zfs send/receive is not good. Though I did not reboot. Also having 145 previous snapshots go "bad" is also a scary proposition. I do have ZFS send/receive (backups) to my server along with traditional file level backups using Bareos every night. So technically I can recover from disaster.

@Ryushin it would good to see your workflow and exact commands to better understand your scenario, also your zfs versions and co.

My workflow is probably very typical.

Source: Thinkpad P17 Gen2 with 128GB ECC RAM, Xeon mobile processor. OS: Devuan Chimaera (Debian Bullseye minus SystemD) Kernel: 5.18.0-0.deb11.4-amd64 ZFS: Version 2.1.5-1~bpo11+1 (provided by Debian) Mirrored encrypted pool using two Samsung 970 Evo Plus NVME M.2 drives. Snapshots taken with sanoid every 15 minutes. Snapshots sent to destination server using syncoid. Syncoid command: usr/sbin/syncoid --recursive --skip-parent --compress=zstd-fast --no-sync-snap rpool root@myserver:netshares/zfs_backups/muaddib_backup Note: Raw snapshots are not sent, as I prefer the data on the destination to not be encrypted.

Destination Server: Supermicro 36 drive chassis with dual Xeon Processors and 128GB of ECC RAM. OS: Devuan Chimaera (Debian Bullseye minus SystemD) Kernel: 5.10.0-17-amd64 ZFS: Version 2.1.5-1~bpo11+1 (provided by Debian) Three 10 spinning drives wide RaidZ2 VDEVs in one pool. One 6 drive SATA SSD wide RaidZ2 that is the root pool. Both pools do not use encryption. Server also runs sanoid for snapshots and sends it's it's root pool to an offsite encrypted zfs storage located at ZFS.rent.

So nothing really out of the ordinary.

Edit: I should mention that all my pools are using ZFS 2.0 features and I have not yet upgraded them yet to 2.1.

siilike commented 1 year ago

I am facing the same issue and previously I was complaining in #12014.

zfs-2.1.9-1~bpo11+1 on Debian.
Snapshots taken and pruned with a custom program, unencrypted replication with syncoid --bookmark --no-sync-snap.
Two mirrored encrypted pools, both facing the same issue. One also has mirrored special devices.
One pool worked fine for a while, now happens to both again.
There is a strong correlation with how much data is written to the dataset: after creating a third dataset and moving some write-heavy apps there the situation has slightly improved.
Usually it goes bad after the 2nd snapshot -- after deleting some old snapshots it works fine for one round of replication, and once another snapshot is taken one of them is not possible to replicate. This also applies when replication is not done between the snapshots.
No relation to taking snapshots while a send is in progress.
Dataset is random, but mostly happens to the same ones (write-heavy or at least some writes)
Only remedy this far is to destroy the affected snapshots ASAP.
Don't remember any panics, but the receiving side has been having affected by #14252 and #12001.

Given a test script:

#!/bin/sh

for i in $(seq 1 100)
do
        echo $i
        zfs snapshot data2/test@$i
        zfs send data2/test@$i | cat > /dev/null
done

for i in $(seq 101 200)
do
        echo $i
        dd if=/dev/zero of=test bs=1 count=1 2>/dev/null
        zfs snapshot data2/test@$i
        zfs send data2/test@$i | cat > /dev/null
done

the output would be:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
warning: cannot send 'data2/test@114': Invalid argument
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
warning: cannot send 'data2/test@148': Invalid argument
149
150
151
152
153
154
155
156
157
158
159
160
warning: cannot send 'data2/test@160': Invalid argument
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
warning: cannot send 'data2/test@181': Invalid argument
182
183
184
185
186
187
188
189
190
191
192
193
warning: cannot send 'data2/test@193': Invalid argument
194
195
196
197
198
199
200

So snapshots without any new data don't trigger the issue, but writing even one byte will.

rjycnfynby commented 10 months ago

I'm experiencing similar problem with ZFS native encryption on version 2.2.2 and kernel 6.6.13 on all of my servers and zvol pools. Permanent errors starting to appear after about three days of uptime.

This is an old thread and I don't see any solution was found. Does it mean that ZFS native encryption is not production ready yet?

mheubach commented 10 months ago

You can use it for production. It is stable and there is no data loss or data corruption problem. The problem seems to occur within the metadata of the in memory list of snapshots. In my experience this happens only if you have:

statistically sufficient write activity within your encrypted datasets or zvols
decrypted zfs sends at high intervals
regular creations and deletions of snapshots whilst doing zfs send

There are workloads where you have to do unencrypted sends. For the time being i suggest you to make sure you don't create or delete snapshots while an unencrypted send is running.

If you only do raw encrypted zfs sends, the problem does not occur.

rjycnfynby commented 10 months ago

You can use it for production. It is stable and there is no data loss or data corruption problem. The problem seems to occur within the metadata of the in memory list of snapshots. In my experience this happens only if you have:

You are correct. The data within the VMs looks good and problem only affect the snapshots consistency. Unfortunately there is no way of fixing the problem once the permanent error ZFS-8000-8A happened. I can only create a new pool.

* statistically sufficient write activity within your encrypted datasets or zvols

What do you mean by "statistically sufficient write activity"? I'm running about a dozen of VMs on each hypervisor and this can trigger an issue?

* decrypted zfs sends at high intervals

I was sending incremental snapshots using a default syncoid settings which I believe does an unencrypted zfs send because encrypted datasets are using a different keys. Both servers are connected to the same switch with a 10G direct link. Not sure if I understand what is "high intervals" in this case. Can you elaborate, please?

* regular creations and deletions of snapshots whilst doing zfs send

My sanoid configuration does recursive snapshots on each dataset and zvol described in its configuration file. I don't think I can create a delay between each snapshot without modifying the script. Does it mean that it's not recommended to do a recursive snapshots within an encrypted dataset?

There are workloads where you have to do unencrypted sends. For the time being i suggest you to make sure you don't create or delete snapshots while an unencrypted send is running.

If you only do raw encrypted zfs sends, the problem does not occur.

I'm sure that I was using a lock-file that prevents running two sanoid/syncoid scripts at the same time. I believe that only one instance of the sanoid/syncoid script can run at a time.

If I understand you correctly sending raw encrypted zfs might help to avoid the issue with inconsistent list of snapshots?

siilike commented 10 months ago

You're fine to use it in production if: a) this issue does not occur for some reason, or b) you are willing to restart your server every few days, especially if you are running Docker on zfs which quickly becomes unusable.

mheubach commented 10 months ago

What do you mean by "statistically sufficient write activity"? I'm running about a dozen of VMs on each hypervisor and this can trigger an issue?

that just means, that you have a relevant amount of writes, to trigger this error - which you probably have

I was sending incremental snapshots using a default syncoid settings which I believe does an unencrypted zfs send because encrypted datasets are using a different keys. Both servers are connected to the same switch with a 10G direct link. Not sure if I understand what is "high intervals" in this case. Can you elaborate, please?

that simply means, that only occasional sends are likely not to trigger the error.

My sanoid configuration does recursive snapshots on each dataset and zvol described in its configuration file. I don't think I can create a delay between each snapshot without modifying the script. Does it mean that it's not recommended to do a recursive snapshots within an encrypted dataset? No - that's ok. The point is just, that without creating or deleting snapshots, the error is not triggered (my experience)

If I understand you correctly sending raw encrypted zfs might help to avoid the issue with inconsistent list of snapshots? that's correct. Do raw encrypted sends only and you will never see this error. It's not possible to switch from unencrypted to raw encrypted incremental receives on your destination. So you have to do full raw encrypted sends before you can do incremental sends.

wohali commented 10 months ago

mheubach said:

In my experience this happens only if you have:

statistically sufficient write activity within your encrypted datasets or zvols

decrypted zfs sends at high intervals

regular creations and deletions of snapshots whilst doing zfs send

In other words, an active system. In our company's opinion, the functionality is wholly unfit for purpose outside of home labs and toy instances where you can withstand downtime.

siilike said:

You're fine to use it in production if: ... b) you are willing to restart your server every few days, especially if you are running Docker on zfs which quickly becomes unusable.

On large enough servers, such as ours, with reboots taking upwards of 6 hours, and massive workload, that's unacceptable in production. Feel free to read our horror story from 2021. And one note: this occurred for us on both spinning media as well as solid state storage.

So no, rjycnfynby, this isn't fixed, and there isn't even a 'good lead' as to where the problem resides either. I suspect this is because generating sufficient load for reproduction isn't something the active devs easily can manage in their setups -- we certainly couldn't take our machine out of prod and give them unfettered access for weeks to diagnose.

My recommendation is to rely on SED (Self-Encrypting Drives) for encryption at rest, and move on.

rbrewer123 commented 10 months ago

I experienced the snapshot errors on my home desktop system for years. I don't even use that machine very much, so it was completely idle over 23h per day.

It was an AMD machine running NixOS with 2 7200 RPM consumer HDDs in a ZFS mirrored pair with ZFS native encryption. I had pyznap configured to take snapshots every 15 minutes. Once a day, pyznap would send a single daily snapshot to my backup pool, which was a second pair of mirrored HDDs with ZFS native encryption.

Despite the machine being idle all day long, it accumulated 1-2 errored snapshots per day on the main pool. The backup pool never got any errors. Destroying the offending snapshots followed by multiple rounds of scrubs would sometimes fix the problem, sometimes not. But the errored snapshots always caused the ZFS send to the backup pool to fail, which meant my daily backups were often not performed.

I replaced the main pool HDDs with a single NVMe drive several months ago and opted not to use ZFS native encryption on the new pool. pyznap still takes snapshots every 15 minutes and sends them to the ZFS-encrypted backup pool. I haven't experienced any snapshot errors since changing that main pool to not use encryption.

Seeing how this problem has remained for years, and considering the other recent data corruption bug has caused me to really consider whether the bells and whistles of ZFS are worth the risk.

grahamperrin commented 9 months ago

From https://github.com/openzfs/zfs/issues/11688#issuecomment-1916910483:

… this isn't fixed, and there isn't even a 'good lead' as to where the problem resides either. …

@wohali your spec in 2021 included:

TrueNAS 12.0-U5 (FreeBSD 12.2 + OpenZFS 2.0.5)

What now? (Since FreeBSD stable/12 is end of life.)

https://www.truenas.com/software-status/

wohali commented 9 months ago

We are always on the latest released TrueNAS Core. Right now that's FreeBSD 13.1, but with the next patch release it will be 13.2.

muay-throwaway commented 9 months ago

My recommendation is to rely on SED (Self-Encrypting Drives) for encryption at rest, and move on.

@wohali Prior research has found that hardware-based encrypted disks very widely have serious vulnerabilities that allow the encryption to be bypassed (e.g., having master passwords or incorrectly implemented cryptographic protocols) (1, 2, 3). While many of these may be fixed now, this is difficult to verify. Software-based encryption offers the advantage of being verifiable. For Linux, LUKS is a widely accepted choice and does not suffer from the same stability issues of ZFS native encryption.

wohali commented 9 months ago

@muay-throwaway Throwaway is right. I did not ask for your advice or approval, nor can you help resolve this specific issue. Further, all three of your references refer to the exact same 2 CVEs from 2018.

Kindly leave this issue to those who are directly impacted or directly trying to solve the problem, rather than sea lion in from nowhere. Thank you.

openzfs / zfs