Open kreuter opened 3 years ago
Hey @kreuter - this (IMHO excellent!) bug report will probably gain more notice if filed against the openzfs
repo in this project - that's where all the activity is - this particular repo hasn't been touched since 2020...
Super hoping we can get this issue looked at!
Oh hey, I'll move it over and take a look
It seems to be generated by
1 129703 zfs_setattr:return 84 nfsd
1 129758 zfs_vnop_setattr:return 84 nfsd
1 232287 vnode_setattr:return 84 nfsd
84 would be EOVERFLOW. At a guess, it comes from:
/*
 * Verify timestamps doesn't overflow 32 bits.
 * ZFS can handle large timestamps, but 32bit syscalls can't
 * handle times greater than 2039. This check should be removed
 * once large timestamps are fully supported.
 */
if (mask & (ATTR_ATIME | ATTR_MTIME)) {
        if (((mask & ATTR_ATIME) &&
            TIMESPEC_OVERFLOW(&vap->va_atime)) ||
            ((mask & ATTR_MTIME) &&
            TIMESPEC_OVERFLOW(&vap->va_mtime))) {
                ZFS_EXIT(zfsvfs);
                return (SET_ERROR(EOVERFLOW));
        }
}
This seems to be a relic from the past, and from other platforms. Taking out this test changes the problem to:
# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
testit -ce 0 "(O_WRONLY|O_CREAT|O_EXCL) failed"
writef: Stale NFS file handle
which is clearly a DIFFERENT error. \o/
I have not got any closer. The first call will fail with ESTALE, but repeat it again, and it works as expected.
The closest section appears to be:
0 229396 nfsrv_rephead:entry
0 229396 nfsrv_rephead:entry
1 229187 nfsrv_setattr:entry
1 259014 mac_vnode_check_open:entry
1 259015 mac_vnode_check_open:return 2 nfsd
kernel.development`nfsrv_setattr+0x7c6
kernel.development`nfssvc_nfsd+0xbdc
kernel.development`nfssvc+0x106
kernel.development`unix_syscall64+0x2ba
kernel.development`hndl_unix_scall64+0x16
1 229396 nfsrv_rephead:entry
1 229188 nfsrv_setattr:return 0 nfsd
Where mac_vnode_check_open returns ENOENT. It is not related to cache_lookup(), as that works as expected; however, NFS does have internal caching as well.
More precisely:
0 229187 nfsrv_setattr:entry
0 259014 mac_vnode_check_open:entry
0 259015 mac_vnode_check_open:return 2 nfsd
kernel.development`nfsrv_setattr+0x7c6
kernel.development`nfssvc_nfsd+0xbdc
kernel.development`nfssvc+0x106
kernel.development`unix_syscall64+0x2ba
kernel.development`hndl_unix_scall64+0x16
0 229396 nfsrv_rephead:entry
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 46 00 00 00 F...
Where rephead prints out ((struct nfsrv_descript *)nd)->nd_repstat (offset 0x88), which is the return code NFSD is sending to the NFS client, in this case 0x46 -> 70 (ESTALE).
ff358d0f890745a79fc23ce437a717ffccd6091f
OK, that was quite tricky, but I think I have it doing the right thing. If someone can check with this build: https://www.lundman.net/OpenZFSonOsX-2.1.99-Catalina-10.15.pkg
It should work on macOS Catalina or newer, but not on M1.
Super rudimentary test, nothing as detailed as you're both running. Test environment is a pair of VMware VMs running macOS 12.1 with a zpool configured with two mirrored vdevs.
edit: I forgot to create a mirror on the VM with the newer version. I can recreate and re-test if we think that skews our results.
tl;dr: Looks like it's working! I can test on an M1 Mac also ...
Results:
brad $ df -h | grep 192
192.168.0.208:/Volumes/zfs210/export 19G 2.6M 19G 1% /Volumes/export
192.168.0.209:/Volumes/zfs2199/export 38G 2.6M 38G 1% /Volumes/export-1
brad $ ssh -t 192.168.0.208 sudo /usr/local/zfs/bin/zfs --version
(brad@192.168.0.208) Password:
Password:
zfs-macOS-2.1.0-1
zfs-kmod-2.1.0-1
Connection to 192.168.0.208 closed.
brad $ ssh -t 192.168.0.209 sudo /usr/local/zfs/bin/zfs --version
Warning: Permanently added '192.168.0.209' (ED25519) to the list of known hosts.
(brad@192.168.0.209) Password:
Password:
zfs-macOS-2.1.99-105_g78b0e397ec
zfs-kmod-2.1.99-105_g78b0e397ec
Connection to 192.168.0.209 closed.
## Copy a file to ZFS 2.1.0 exported over NFS
brad $ sudo cp -v ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso /Volumes/export/
'ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso' -> '/Volumes/export/FreeBSD-13.0-RELEASE-amd64-disc1.iso'
cp: cannot create regular file '/Volumes/export/FreeBSD-13.0-RELEASE-amd64-disc1.iso': Input/output error
## Copy the same file to ZFS 2.1.99 over NFS
brad $ sudo cp -v ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso /Volumes/export-1/
'ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso' -> '/Volumes/export-1/FreeBSD-13.0-RELEASE-amd64-disc1.iso'
## md5sum is moot for the 2.1.0 example (file is zero size) but emphasizes the file copied correctly to the 2.1.99 mount
# Original file for comparison
brad $ md5sum ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso
8db6acda770626944731232a323e9bd9 ISO/FreeBSD-13.0-RELEASE-amd64-disc1.iso
# ZFS 2.1.0
brad $ sudo md5sum /Volumes/export/FreeBSD-13.0-RELEASE-amd64-disc1.iso
d41d8cd98f00b204e9800998ecf8427e /Volumes/export/FreeBSD-13.0-RELEASE-amd64-disc1.iso
# ZFS 2.1.99
brad $ sudo md5sum /Volumes/export-1/FreeBSD-13.0-RELEASE-amd64-disc1.iso
8db6acda770626944731232a323e9bd9 /Volumes/export-1/FreeBSD-13.0-RELEASE-amd64-disc1.iso
@lundman I have this issue as well, would it be possible to generate an M1/Monterey build? I would be happy to test it.
I just did a build for M1, but someone reported it panics quickly without use, need to look into this first.
When accessing an OpenZFS-on-OSX file system via an NFSv3 mount on an OSX client, creating a new file for writing by calling open(2) with (O_WRONLY|O_CREAT|O_EXCL) fails and sets errno to EIO ("Input/output error"). Also, after the failing open(2) call, a file can be found to exist (with all its permission bits zeroed).
Further, when the NFSv3 client is Linux or FreeBSD, the same failure occurs for the combination of (O_WRONLY|O_CREAT|O_EXCL). Additionally, when using different combinations of open(2) flags that create a new file, writing a small amount of data to the newly created file appears never to store data in the file, but always eventually fails and sets errno to the error with message "Permission denied".
Reproduction steps below. The identical reproduction does not show any unexpected failures when NFSv3 mounting an exported APFS file system from the same OSX host.
Expected behavior: when a pathname does not name an existing file, I'd expect opening with O_CREAT|O_EXCL to succeed (modulo permission checks, inode/storage availability, and so forth). To the best of my knowledge, there are no relevant permissions restrictions or resource exhaustion issues in my experiments.
Additional desired behavior: given the nature of networked file systems, I'm comfortable with the idea that file handles can become stale or otherwise unusable in case of network outage, server restarts, host reboots, et al. To the best of my knowledge, none of those is occurring during my experiments. So ISTM that writing to newly created files' handles ought to work when nothing else shows signs of failure.
I'd be happy to provide further information and/or run any further reproductions that might be helpful here. (I'll confess, I'm out of practice at capturing NFS network traffic. In case that's needed, please point me in the direction of the best way to get it nowadays.)
NFS server host information
In all cases, I'm running nfsd on a system with the following OS and zfs extension info:
NFSv3 clients
I've exercised the reproduction using the following NFSv3 clients:
2a. a different OSX system (running a very vanilla Catalina installation); that host's system info is:
2b. a fairly vanilla FreeBSD 13.0 system running on another host on the local network. Here's some info about that system:
2c. a Linux system, running as a QEMU guest of the OSX host running nfsd. Here's some info about that Linux system:
Software setup
3a. On the OSX NFS server, first create an APFS export, then an OpenZFS-on-OSX export. My local network is 10.0.0.0/24, and my normal unprivileged user account's uid on my network is 1001, so I'll set the uid of the exported directory to that uid.
3b. Setup on the NFS client. This appears to work equivalently across OSX, Linux, FreeBSD.
File writing program
I tried to come up with a reproduction using only shell-level utilities, but the shell and utilities are inconsistent as to whether, when, and how honestly they report syscall errors (certain utilities produce error messages different from what syscalls set errno to, it seems). So here's a file writer. It takes a pathname and a string to write into the file, and its open(2) flags are configured by option switches. This can be compiled with cc -o writef writef.c.
Driver for writef
This tests a handful of open(2) flag combinations. It expects the writef binary to be in the current working directory. Note that in all cases, the function testit ensures that there's no file at the specified path, so the open(2) in writef should always be trying to create a new file. If this file is at run-test.sh beside writef, then it can be run as, e.g., sh ./run-test.sh /tmp/apfs-mount/test.out and sh ./run-test.sh /tmp/zfs-mount/test.out
Expected Results
These are the results when running on an NFS-mounted APFS export. These are the expected results of the test script, including the error message in case (3) (i.e., attempting to open an existing file with O_CREAT|O_EXCL).
Results when running on an NFS-mounted ZFS export
The following are the results when the NFSv3 client is an OSX host. Case (2) is the unexpected result.
The following are the results when the NFSv3 client is a FreeBSD host; the results on a Linux host are identical (modulo timestamps, of course).
Note that case (2) shows the same unexpected behavior as for an OSX NFSv3 client; cases (3) through (5) all demonstrate failure to write to the newly created file. (I observe that in cases (3) through (5), even after the close(2) in writef fails, the file's size shows up as 29 bytes from ls. I'm not sure if that size is coming from the NFS server or the NFS client, so the number might be a red herring.)