Closed phiser678 closed 2 years ago
As far as removing the corruption, I am not too sure of any way other than recreating the filesystem, but there have been several causes of corruption like this.
Thanks for finding the other issues!
Hmmm, interesting: issue https://github.com/openzfs/zfs/issues/10019 involves almost exactly the same hardware. I'm using a Xeon E3-1226 v3, and avx2 is also used when I check with cat /proc/spl/kstat/zfs/vdev_raidz_bench on both the older system with 0.7.5 and the newer one with 0.8.3. I don't have encryption on the datasets.
The other issues involved send/recv, but that is not the case here. The original zpool produced the input/output error, which was simply propagated to the new zpools. No fault was detected when sending the zfs dataset, which matches my experience, since I can still read everything with the zdb commands. I have hole_birth enabled, but both datasets are equal as far as I can see.
There was also no sudden power failure since everything is handled by UPS power devices.
Is there no feature we could add to ZFS that detects an input/output error and refuses to send the zfs stream when this happens? That way we could avoid the silent corruption!
Is there a way to delete used zfs blocks directly with some command? Although I agree this could be very dangerous.
Indeed, recreating the dataset and all the snapshots would be a possibility, since it only involves snapshots from a couple of months ago. So, as you suggest, this is the only way to get rid of this input/output error?
Thanks again for any update.
If it's any help, I got the same issue on an old Intel i5-5200U using zfs 0.8.3-1ubuntu12.3 on my laptop.
The only way to fix my problem was to delete all the snapshots newer than a certain date, then run zpool scrub and zpool clear a couple of times. No warning in dmesg. What frightens me most is that zfs happily continues taking snapshots without any warnings.
I am getting the same problem; not sure who can help. The issue appears in FreeNAS (FreeBSD 12) using OpenZFS 2.0; the problem was confirmed using SystemRescue (https://github.com/nchevsky/systemrescue-zfs), which uses a more recent version of zfs. This sounds more like a zfs issue. The impacted files are read-only (I never changed them; they should have been part of the pool for at least one year). The files (about 17,000) are all part of the same dataset (mainly .jpg files, but not all, while the dataset contains mostly music).
I have many files that I can't see (cat, less, ... report: Input/output error); ls reports the file, however. I replaced a disk recently; not sure if I should try a resilver or if it was the cause of the problem. I checked old snapshots, and the problem is there as well. I tried to see the content of a file using zdb and it seems successful.
cp returns: Bad Address, and dmesg reports the error (nothing for cat, ....):
vm_fault: pager read error
Pool is healthy, scrub did not report any issue.
zfs send also returns Input/output error; an easy way to spot the problem.
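Until something like that exists in ZFS itself, a crude client-side guard is possible, since in this case `zfs send` exits non-zero on the error. This is only a hypothetical sketch (`safe_send` is my own name, not a ZFS command), it reads the stream twice, and it only helps when `zfs send` itself reports the error, which it did not in every case in this thread:

```shell
# Hypothetical pre-flight guard: dry-run the send into /dev/null first,
# and only replicate if it exits cleanly.
safe_send() {
    local snap=$1 target_cmd=$2
    if zfs send "$snap" > /dev/null; then
        # First pass succeeded; run the real send into the target command.
        zfs send "$snap" | eval "$target_cmd"
    else
        echo "zfs send $snap failed -- possible silent corruption, not replicating" >&2
        return 1
    fi
}

# usage: safe_send tank/data@snap "ssh backup zfs recv pool/data"
```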
I opened a new issue as it relates also to the new open ZFS 2.0: https://github.com/openzfs/zfs/issues/11443#issue-781928612
@shuther what did you do with systemrescue to confirm the issue? I'm not clear on whether you have reproduced the same issue using systemrescue or have used it to confirm that a different version of ZFS is unaffected.
I can confirm that cat
At least you could detect the corruption with zfs send/recv, but for me it was silent as well! I managed to copy the corrupted files from the person who still had them in his local directory. I cannot delete the corrupted files on any of the copy zpools, so I just renamed them into a hidden directory. This is really bad for ZFS's "rock solid" reputation! On one storage system I changed our strategy: one zfs send/recv to another system, plus an rsync backup copy to a third zpool. Of course the rsync takes hours compared to zfs send/recv, but if you have these kinds of errors, what else can we do?
If there is a pool which passes a zfs scrub but gives an I/O error on read without any dmesg traces, I wonder if it would be worthwhile to identify the code path of such an I/O error with a stack trace. I would imagine there are not too many places in the code which produce this error, and somebody could instrument them to dump a stack trace to dmesg for debugging this issue. That could have greatly sped up the investigation of this scary bug, IMHO.
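Without patching the module, something like this can be improvised with bpftrace. This is purely a sketch: the probe point below is an assumption, not a known-good symbol — you would need to pick a zfs/zpl function from /proc/kallsyms on your build that returns -EIO (-5) as an int:

```shell
# Hypothetical: dump a kernel stack whenever the probed zfs function
# returns -EIO (-5).  Replace zfs_getattr_fast with whichever candidate
# function exists in your build's /proc/kallsyms.
bpftrace -e '
kretprobe:zfs_getattr_fast
/ (int64)retval == -5 /
{
    printf("EIO from %s\n%s\n", probe, kstack);
}'
```

This would at least pinpoint which code path manufactures the error without any dmesg trace.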
This should be relatively easy to investigate, since you can recreate it across send/receive. Can you make a clone and remove everything except the problematic directory from it, and then send this? And make the stream available somewhere?
Or if this makes the problem go away, at least you can then start tracking down what it is that makes it go away.
That is possible. I will have a look, but it will take some time. The dataset containing the corrupted data is 9TB, so removing files from it in a clone will probably take a while. Next week I will post an update on whether we can trap it in a clone stream. Thank you for your help!
I managed to narrow down the possible bug in the zfs dataset. It seems the dataset is very picky about which system it can run on. Here are the tests I did:
root@backup:~# modinfo zfs|grep version
version: 0.8.3-1ubuntu12.4
srcversion: 4528C7B15E59A789E01F814
root@backup:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv tank/bad
root@backup:~# ls /tank/bad/
video_analysis
root@backup:~# ls /tank/bad/video_analysis/uav_london-input-output-error/
ls: cannot open directory '/tank/bad/video_analysis/uav_london-input-output-error/': Input/output error
root@backup:~# zfs destroy -r tank/bad
root@backup:~# ssh ipifs zfs send tank/bad@readme|zfs recv tank/bad
root@backup:~# ls /tank/bad/video_analysis/uav_london-input-output-error/
ls: cannot open directory '/tank/bad/video_analysis/uav_london-input-output-error/': Input/output error
On another, newer system, which does work:
root@testpc1:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv tank/bad
root@testpc1:~# ls -l /tank/bad/
total 1
drwxrwx--- 3 1081 605 3 Jan 18 19:24 video_analysis
root@testpc1:~# ls -l /tank/bad/video_analysis/
total 1
drwxr-xr-x 2 760 root 3 Jan 18 20:26 uav_london-input-output-error
root@testpc1:~# ls -l /tank/bad/video_analysis/uav_london-input-output-error/
total 1
-rw-r--r-- 1 760 605 98 Apr 5 2020 README.txt
root@testpc1:~# modinfo zfs|grep version
version: 0.8.3-1ubuntu12.5
srcversion: 4528C7B15E59A789E01F814
root@testpc1:~# cat /tank/bad/video_analysis/uav_london-input-output-error/README.txt
video captured during the covid-19 crisis over the city of London.
Live feed produced by Reuters
However, I tested another machine, with the same(!) zfs version as the one that is working:
root@ridzo:~# ssh root@ipifs zfs send -c tank/bad@readme|zfs recv pool/bad
root@ridzo:~# ls -l /pool/bad/
total 1
drwxrwx---+ 3 1081 605 3 Jan 18 19:24 video_analysis
root@ridzo:~# ls -l /pool/bad/video_analysis
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
total 1
drwxr-xr-x 2 760 root 3 Jan 18 20:26 uav_london-input-output-error
root@ridzo:~# ls -l /pool/bad/video_analysis/uav_london-input-output-error
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
ls: cannot open directory '/pool/bad/video_analysis/uav_london-input-output-error': Input/output error
root@ridzo:~# modinfo zfs|grep version
version: 0.8.3-1ubuntu12.5
srcversion: 4528C7B15E59A789E01F814
I also tested an older zfs system, which also worked:
root@testpc3:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv -v pool/bad
receiving full stream of tank/bad@readme into pool/bad@readme
received 5.09M stream in 1 seconds (5.09M/sec)
root@testpc3:~# ls -l /pool/bad/video_analysis/uav_london-input-output-error/README.txt
-rw-r--r-- 1 760 605 98 Apr 5 2020 /pool/bad/video_analysis/uav_london-input-output-error/README.txt
root@testpc3:~# modinfo zfs|grep version
version: 0.8.1-1ubuntu14.3
srcversion: F3C94D5226BB5E654A00EF1
root@testpc3:~# modinfo spl|grep version
version: 0.8.1-1ubuntu14.3
srcversion: 9B21F4F344A05823B8DB47A
It's very random where it works and where not. I included the dataset as a stream here:
$ gzip -dc zfs-bad-input-output.zfs.gz|zfs recv "yourpool"/bad
I did an strace on the bad system when I do an ls on the bad directory; it says:
getxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access", NULL, 0) = -1 EIO (Input/output error)
The good system says:
getxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access", NULL, 0) = -1 EOPNOTSUPP (Operation not supported)
So it's some getxattr call that gives different results. I tried getfattr on both systems; both produce empty output, but the bad system shows something extra in the strace:
listxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access\0system.p"..., 256) = 49
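That difference can be probed directly without reading a full strace log. A sketch (path is the one from this thread; assumes the Linux attr and strace packages):

```shell
# Query the POSIX ACL xattr directly on the suspect directory.
# On an affected dataset this fails with EIO; on a healthy one it either
# prints the ACL or reports "Operation not supported".
getfattr -n system.posix_acl_access \
    /tank/bad/video_analysis/uav_london-input-output-error

# Or watch only the xattr syscalls during the failing ls:
strace -e trace=getxattr,listxattr \
    ls /tank/bad/video_analysis/uav_london-input-output-error
```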
Tuning in a bit deeper, on the good system:
# zfs get acltype tank
NAME PROPERTY VALUE SOURCE
tank acltype off default
bad system:
# zfs get acltype tank
NAME PROPERTY VALUE SOURCE
tank acltype posixacl local
We use ACLs on our systems, so we do need them; I cannot just get rid of them. I normally set:
zfs set acltype=posixacl tank
zfs set xattr=sa tank
But I can confirm: the systems where I don't get input/output errors are the ones where I don't have acltype set. Any clues now on how to fix it?
PS: resetting ACLs does not work either:
# setfacl -b -R /tank/bad/video_analysis
setfacl: /tank/bad/video_analysis/uav_london-input-output-error: Input/output error
setfacl: /tank/bad/video_analysis/uav_london-input-output-error: Input/output error
Cool, it is easily reproducible with your stream (receiving into a pool/dataset with default settings works; receiving into a pool/dataset with acltype=posixacl gives the error).
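For anyone wanting to try it, the reproduction boils down to the following sketch. `testpool` and the scratch file vdev are placeholders; the stream file name is the one attached earlier in this thread:

```shell
# Scratch pool on a throwaway file vdev (placeholder names).
truncate -s 256M /tmp/scratch.img
zpool create testpool /tmp/scratch.img

# Default acltype: the received directory is readable.
gzip -dc zfs-bad-input-output.zfs.gz | zfs recv testpool/ok
ls /testpool/ok/video_analysis/uav_london-input-output-error/

# acltype=posixacl, inherited by the received dataset: Input/output error.
zfs set acltype=posixacl testpool
gzip -dc zfs-bad-input-output.zfs.gz | zfs recv testpool/bad
ls /testpool/bad/video_analysis/uav_london-input-output-error/
```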
With
filename: /lib/modules/5.8.18-300.fc33.x86_64/extra/zfs/zfs/zfs.ko
version: 0.8.5-1
On my dataset, the issue does not seem to be connected to this ACL, because I am using the same settings across the pool and only one dataset is impacted by such corruption. The ACL looks the same for the good and bad files. I will try to run an strace today to see if I can get anything (so we know whether it is a similar or a different problem).
root@freenas:~ # zfs get xattr,aclmode,acltype voluz/media/music
NAME PROPERTY VALUE SOURCE
voluz/media/music xattr off inherited from voluz
voluz/media/music aclmode passthrough inherited from voluz
voluz/media/music acltype nfsv4 default
@shuther FreeBSD and Linux use different ACL systems. Indeed, we previously had nfsv4 ACLs on our datasets from FreeBSD, but when I transferred them to Linux they were incompatible and I had to recreate the ACLs (posix) for the Linux system. Note that for this bug, the fault is propagated when you have acltype on. So the stream I added above will not do much on FreeBSD systems. You should be very careful with systemrescue-zfs, since it is Linux based and has different ACL types. You should look for a FreeBSD rescue system instead.
My idea was to temporarily disable acltype, then delete (or first copy) the error directory and turn acltype back on again. Would this operation delete all the ACLs of the complete dataset? Next is to try this on systemrescuecd-zfs, which indeed has the 2.0 branch.
I can confirm the bug persists in the 2.0 branch also!
[root@sysrescue ~]# ls -l /pool/bad/
total 1
drwxrwx---+ 3 1081 605 3 Jan 18 18:24 video_analysis
[root@sysrescue ~]# ls -l /pool/bad/video_analysis/
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
total 1
drwxr-xr-x 2 760 root 3 Jan 18 19:26 uav_london-input-output-error
[root@sysrescue ~]# ls -l /pool/bad/video_analysis/uav_london-input-output-error/
ls: /pool/bad/video_analysis/uav_london-input-output-error/: Input/output error
ls: cannot open directory '/pool/bad/video_analysis/uav_london-input-output-error/': Input/output error
[root@sysrescue ~]# modinfo zfs|grep version
version: 2.0.0-1
srcversion: 3A54AFFBC84534A6E7FF55C
Now, let's check whether we lose the previous ACLs when disabling posixacl:
root@ridzo:/pool/bad# echo ok >error
root@ridzo:/pool/bad# setfacl -m u:phiser678:rwx error
root@ridzo:/pool/bad# ls -l
total 1
-rw-rwx---+ 1 root root 3 Jan 19 10:58 error
drwxrwx---+ 3 1081 605 3 Jan 18 19:24 video_analysis
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
user:phiser678:rwx
root@ridzo:/pool/bad# zfs set acltype=noacl pool/bad
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
root@ridzo:/pool/bad# cp -a video_analysis video_analysis-ok
root@ridzo:/pool/bad# zfs set acltype=posixacl pool/bad
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
user:phiser678:rwx
root@ridzo:/pool/bad# ls -l video_analysis-ok/uav_london-input-output-error/README.txt
-rw-r--r-- 1 760 605 98 Apr 5 2020 video_analysis-ok/uav_london-input-output-error/README.txt
Now I try to delete the input/output-error directory:
root@ridzo:/pool/bad# zfs set acltype=noacl pool/bad
root@ridzo:/pool/bad# rm -rf video_analysis
root@ridzo:/pool/bad# mv video_analysis-ok video_analysis
root@ridzo:/pool/bad# zfs set acltype=posixacl pool/bad
root@ridzo:/pool/bad# ls -l
total 1
-rw-rwx---+ 1 root root 3 Jan 19 10:58 error
drwxrwx--- 3 1081 605 3 Jan 18 19:24 video_analysis
root@ridzo:/pool/bad# ls -l video_analysis/uav_london-input-output-error/README.txt
-rw-r--r-- 1 760 605 98 Apr 5 2020 video_analysis/uav_london-input-output-error/README.txt
Works! All files recovered and ACLs retained. No more input/output errors. Back to rock-solid ZFS again! :-)
This should be checked and sent upstream. In case you missed the small stream in the lengthy report: zfs-bad-input-output.zfs.gz
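For anyone hitting the same thing, the recovery procedure from the transcript above condenses to the following sketch (dataset and paths are the ones from this thread; while acltype is off, POSIX ACLs on the whole dataset are temporarily invisible):

```shell
# Turn POSIX ACL handling off: the EIO disappears and the tree is readable.
zfs set acltype=noacl pool/bad
cp -a /pool/bad/video_analysis /pool/bad/video_analysis-ok
rm -rf /pool/bad/video_analysis            # now deletable
mv /pool/bad/video_analysis-ok /pool/bad/video_analysis
# Turn ACLs back on: existing ACLs elsewhere in the dataset reappear intact.
zfs set acltype=posixacl pool/bad
```

One caveat visible in the transcript: the recreated video_analysis directory comes back without its ACL entry (no `+` in the ls output), so ACLs on the affected tree itself may need to be reapplied.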
Looks like /video_analysis/uav_london-input-output-error/ carries a zero-sized system.posix_acl_access xattr:
# zdb -vvvvvv -ddddd brokenpool/brokenfs 1525568
Dataset brokenpool/brokenfs [ZPL], ID 389, cr_txg 11, 2.13M, 2074 objects, rootbp DVA[0]=<0:8048e00:200> DVA[1]=<0:18052a00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=614L/614P fill=2074 cksum=161558d1d1:6c21d6ed532:12cf12503a5c9:260b771f9953a4
Object lvl iblk dblk dsize dnsize lsize %full type
1525568 1 128K 512 0 512 512 0.00 ZFS plain file (K=inherit) (Z=inherit=uncompressed)
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 0
path /video_analysis/uav_london-input-output-error/<xattrdir>/system.posix_acl_access
uid 760
gid 605
atime Sun Apr 5 16:32:12 2020
mtime Sun Apr 5 19:48:15 2020
ctime Sun Apr 5 19:48:15 2020
crtime Sun Apr 5 16:32:12 2020
gen 10088303
mode 100644
size 0
parent 1525566
links 1
pflags 40800000005
Indirect blocks:
Speculation from reading the code in the web browser: zpl_set_acl, if given a NULL pointer in the acl parameter, will happily write a zero-sized xattr: https://github.com/openzfs/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/os/linux/zfs/zpl_xattr.c#L980-L991
The kernel also seems willing to pass such a NULL pointer if userspace writes an empty value: https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/posix_acl.c#L860-L896
However, zpl_get_acl will return -EIO if it reads a zero-sized xattr: https://github.com/openzfs/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/os/linux/zfs/zpl_xattr.c#L1045-L1047
I couldn't manage to reproduce this from userspace with purposely-wrong calls to setxattr or libacl, but I only tried for a few minutes.
(apologies in advance, I will not be sending a pull request with a proposed fix)
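Since getfacl fails with EIO on the affected entries, a sweep like the following could locate any other files or directories in a dataset carrying the broken zero-sized ACL xattr (a sketch; path and log file are placeholders, and it walks the whole filesystem, so it is slow on large datasets):

```shell
# Run getfacl over every entry; successes go to /dev/null, failures
# (including the EIO from the broken ACL xattr) land in the log.
find /tank -xdev -exec getfacl --absolute-names {} + \
    > /dev/null 2> /tmp/acl-errors.log

# The affected paths are the ones that failed with Input/output error.
grep 'Input/output error' /tmp/acl-errors.log
```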
So I can confirm I am facing a different issue. Using SystemRescue+ZFS, I can partially read a file (using cat; it is a picture), and I hit the input/output error in the middle of the file (so it is not ACL related).
Still nothing reported in dmesg or zpool status.
I also tried an strace using strace file xxx.
Thanks for pointing this out @maxximino ! The ACL is set like this and normally is inherited from the shared folder:
setfacl -m group:ipi:rwx -d -m group:ipi:rwx /shared
This is a 9TB dataset, and this zero-size ACL only happened on the uav_london directory. It looks like this is a very rare case, then?
As usual, the catch is just one layer below where I stopped looking. Sending NULL as the value is the expected way to remove the xattr: https://github.com/openzfs/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/os/linux/zfs/zpl_xattr.c#L482-L488 ... so now I no longer have a clue about an ACL-related code path that could mistakenly trigger this.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
I have an input/output error on a directory on a raidz2 zfs filesystem with lz4 compression, but there is no sign of corruption of the disks and it is not detected by scrub.
The error is propagated in the snapshots and the zfs send/recv streams as well. The original is on Ubuntu 18.04 with zfs version 0.7.5, which I transferred to a new Ubuntu 20.04 with zfs version 0.8.3. I will be keeping only the updated system, so I want to delete the bad I/O-error directory on the new Ubuntu 20.04 system. The new system uses LVM partitions, which indeed could be the problem, but the original Ubuntu 18.04 has raw disks without LVM and propagated this fault to the new Ubuntu 20.04. They both show the same behaviour.
I can still read the contents with zdb and extract the contents of the files correctly. I managed to recover the files, but I cannot delete the directory and free the space!
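For reference, reading the contents back out with zdb looks roughly like this. The object id and DVA below are only examples (the real ones come from the dnode dump), and the exact `-R` flags vary between zdb versions; see zdb(8):

```shell
# Find the object id for a path by grepping the dataset's dnode dump.
zdb -dddddd tank/dataset | grep -B 10 'path.*/some/broken/file'

# Dump that object's dnode, system attributes, and block pointers.
zdb -vvvvvv -ddddd tank/dataset 1525568

# With a DVA (vdev:offset:size) taken from the block-pointer dump,
# read the raw block out; ':r' dumps raw bytes to stdout.  Compressed
# blocks additionally need zdb's decompression flags.
zdb -R tank 0:8048e00:200:r > recovered.bin
```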
Both .ts streams can be read and reconstructed as well. As far as I can tell, this is the only input/output-error directory I have detected on a 35TB zpool. I double-checked both systems with md5 checksums, which pointed me to this anomaly. If I had not checked both systems, I would not have known about the error; that's why I put "silent" in the title, and this could be happening on numerous zpool systems. Both systems have ECC memory. How do I free up the space? But before I do, can I run some tests to find the cause, or even better, fix the anomaly in case others have this as well?
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs
No errors in the system logs.