winfsp / cgofuse

Cross-platform FUSE library for Go - Works on Windows, macOS, Linux, FreeBSD, NetBSD, OpenBSD
https://winfsp.dev
MIT License
533 stars 86 forks source link

Mixup of O_EXCL and O_TRUNC constant values CGO vs non CGO #66

Open ncw opened 2 years ago

ncw commented 2 years ago

If you compile and run this program

package main

import (
    "fmt"

    "github.com/winfsp/cgofuse/fuse"
)

func main() {
    fmt.Printf("fuse.O_APPEND = 0x%04x\n", fuse.O_APPEND)
    fmt.Printf("fuse.O_CREAT  = 0x%04x\n", fuse.O_CREAT)
    fmt.Printf("fuse.O_EXCL   = 0x%04x\n", fuse.O_EXCL)
    fmt.Printf("fuse.O_TRUNC  = 0x%04x\n", fuse.O_TRUNC)
}

It prints this

Z:\go\src\github.com\rclone\rclone\cgofusetest>go run test.go
fuse.O_APPEND = 0x0008
fuse.O_CREAT  = 0x0100
fuse.O_EXCL   = 0x0400
fuse.O_TRUNC  = 0x0200

However if you disable CGO and compile it you get a different answer with O_EXCL and O_TRUNC swapped over.

Z:\go\src\github.com\rclone\rclone\cgofusetest>set CGO_ENABLED=0

Z:\go\src\github.com\rclone\rclone\cgofusetest>go run test.go
fuse.O_APPEND = 0x0008
fuse.O_CREAT  = 0x0100
fuse.O_EXCL   = 0x0200
fuse.O_TRUNC  = 0x0400

I believe the non CGO version is correct, at least that agrees with what I receive from WinFSP. It is defined here

https://github.com/winfsp/cgofuse/blob/f0ad03169e0f07409feb7e3db95fba57fab5cfc6/fuse/fsop_nocgo_windows.go#L107-L108

whereas the CGO version is defined here - this also looks correct

https://github.com/winfsp/cgofuse/blob/f0ad03169e0f07409feb7e3db95fba57fab5cfc6/fuse/fsop_cgo.go#L249-L250

So something a little wacky is going on!

This is compiling on Windows 10 with go1.18.1 and gcc 8.1.0 provided by MinGW using latest released cgofuse v1.5.0 and latest stable WinFSP (2022 installed from 1.10.220006.msi).

This was originally reported in the rclone forum here: https://forum.rclone.org/t/cannot-copy-files-to-mounted-azure-storage-windows/30092 if you want some background.

I suspect that the problem might be with the WinFSP include files, but I couldn't figure out where the constants were defined.

You said before in https://github.com/winfsp/cgofuse/issues/19#issuecomment-344405666 where we discussed a very similar issue

WinFsp-FUSE supports both the MSVC and Cygwin environments. In MSVC O_CREAT|O_RDWR translates to 0x0102. In Cygwin it translates to 0x0202. Cgofuse uses the MSVC environment, which is the reason you see 0x0102.

So I wonder if I'm including the Cygwin version of the fuse header files or something like that? This is the include I use

CPATH=C:\Program Files\WinFsp\inc\fuse;C:\Program Files (x86)\WinFsp\inc\fuse

Any help gratefully received!

billziss-gh commented 2 years ago

I can confirm that this is a problem. I am investigating why this is happening.

billziss-gh commented 2 years ago

The relevant source in WinFsp is from file fuse_intf.c:

    if ('C' == f->env->environment) /* Cygwin */
        fi.flags = 0x0200 | 0x0800 | 2 /*O_CREAT|O_EXCL|O_RDWR*/;
    else
        fi.flags = 0x0100 | 0x0400 | 2 /*O_CREAT|O_EXCL|O_RDWR*/;

This agrees with the contents of file fcntl.h from the Windows SDK (C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\fcntl.h):

#define _O_CREAT       0x0100  // create and open file
#define _O_TRUNC       0x0200  // open and truncate
#define _O_EXCL        0x0400  // open only if file doesn't already exist

It also agrees with file fcntl.h from MinGW (C:\Program Files\mingw-w64\x86_64-8.1.0-win32-seh-rt_v6-rev0\mingw64\x86_64-w64-mingw32\include\fcntl.h):

#define _O_CREAT 0x0100
#define _O_TRUNC 0x0200
#define _O_EXCL 0x0400

It looks to me that the no-cgo version is incorrect. It should read:

    O_TRUNC   = 0x0200
    O_EXCL    = 0x0400

FYI, WinFsp will not send the O_TRUNC flag.

ncw commented 2 years ago

From one of the posts on the rclone forum, this is what rclone received from WinFSP/cgofuse which agrees with the code above.

2022/04/06 15:11:09 DEBUG : /test/testfile: CreateEx: flags=0x502, mode=0770

And this is how rclone interprets it using the cgo flags

2022/04/06 15:11:09 DEBUG : test/testfile: Open: flags=O_RDWR|O_CREATE|O_EXCL

So I think you are saying that is correct and O_EXCL is meant not O_TRUNC?

I guess that brings me on to the question of why is WinFSP sending O_EXCL for a file that might already exist? If using a VFS cache then rclone passes that O_EXCL straight on to the underlying Open which then blows up with file already exists.

This was the issue I was chasing down when I discovered the mismatch between the constants!

billziss-gh commented 2 years ago

The WinFsp native API has an operation Open for opening an existing file and an operation Create for creating a new file. Open is supposed to return an error if the file does not exist and Create is supposed to return an error if the file already exists.

The FUSE layer is implemented on top of the native WinFsp API. The FUSE implementation of Create uses O_EXCL because the file system is supposed to return -EEXIST if the file already exists.

ncw commented 2 years ago

That makes perfect sense... Except for the issue reported in the forum thread...

Here is the rclone log rclone azure mount copy error (from explorer with fuse debug).txt

What appears to happen is that Create is called

rclone[TID=24b4]: FFFF9E86CC94E060: >>Create [UT---C] "\test\testfile", FILE_CREATE, CreateOptions=44, FileAttributes=20, Security=NULL, AllocationSize=0:1825000, AccessToken=000000000000029
8[PID=2350], DesiredAccess=17019f, GrantedAccess=0, ShareAccess=0
2022/04/12 15:46:22 DEBUG : /test: Getattr: fh=0xFFFFFFFFFFFFFFFF
2022/04/12 15:46:22 DEBUG : /test: >Getattr: errc=0
2022/04/12 15:46:22 DEBUG : /test/testfile: CreateEx: flags=0x502, mode=0770
2022/04/12 15:46:22 DEBUG : test/testfile: Open: flags=O_RDWR|O_CREATE|O_EXCL
2022/04/12 15:46:22 DEBUG : test/testfile: newRWFileHandle:
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): openPending:
2022/04/12 15:46:22 DEBUG : test/testfile: vfs cache: truncate to size=0
2022/04/12 15:46:22 DEBUG : test: Added virtual directory entry vAddFile: "testfile"
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): >openPending: err=<nil>
2022/04/12 15:46:22 DEBUG : test/testfile: >newRWFileHandle: err=<nil>
2022/04/12 15:46:22 DEBUG : test: Added virtual directory entry vAddFile: "testfile"
2022/04/12 15:46:22 DEBUG : test/testfile: >Open: fd=test/testfile (rw), err=<nil>
2022/04/12 15:46:22 DEBUG : /test/testfile: >CreateEx: errc=0, fh=0x0
2022/04/12 15:46:22 DEBUG : /test/testfile: Release: fh=0x0
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): RWFileHandle.Release
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): close:
2022/04/12 15:46:22 DEBUG : test/testfile: vfs cache: setting modification time to 2022-04-12 15:46:22.8018157 +0000 GMT m=+18.086083401
2022/04/12 15:46:22 INFO  : test/testfile: vfs cache: queuing for upload in 5s
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): >close: err=<nil>
2022/04/12 15:46:22 DEBUG : /test/testfile: >Release: errc=0
rclone[TID=24b4]: FFFF9E86CC94E060: <<Create IoStatus=c0000010[0]

Then immediately called again which fails because the file already exists.

rclone[TID=24b4]: FFFF9E86CC94E060: >>Create [UT---C] "\test\testfile", FILE_CREATE, CreateOptions=44, FileAttributes=20, Security=NULL, AllocationSize=0:1825000, AccessToken=0000000000000298[PID=2350], DesiredAccess=170197, GrantedAccess=0, ShareAccess=0
2022/04/12 15:46:22 DEBUG : /test: Getattr: fh=0xFFFFFFFFFFFFFFFF
2022/04/12 15:46:22 DEBUG : /test: >Getattr: errc=0
2022/04/12 15:46:22 DEBUG : /test/testfile: CreateEx: flags=0x502, mode=0770
2022/04/12 15:46:22 DEBUG : test/testfile: Open: flags=O_RDWR|O_CREATE|O_EXCL
2022/04/12 15:46:22 DEBUG : test/testfile: newRWFileHandle:
2022/04/12 15:46:22 DEBUG : test/testfile: >newRWFileHandle: err=file already exists
2022/04/12 15:46:22 DEBUG : test/testfile: File.openRW failed: file already exists
2022/04/12 15:46:22 DEBUG : test/testfile: >Open: fd=<nil *RWFileHandle>, err=file already exists
2022/04/12 15:46:22 DEBUG : /test/testfile: >CreateEx: errc=-17, fh=0xFFFFFFFFFFFFFFFF
2022/04/12 15:46:22 DEBUG : /test/testfile: Getattr: fh=0xFFFFFFFFFFFFFFFF
2022/04/12 15:46:22 DEBUG : /test/testfile: >Getattr: errc=0
rclone[TID=24b4]: FFFF9E86CC94E060: <<Create IoStatus=c0000035[0]

I haven't been able to replicate this but I suspect it might be Windows 11 / WIndows 2022 datacenter behaviour.

So I don't know exactly what is going on here - probably a misunderstanding by me, but I'd appreciate your thoughts!

billziss-gh commented 2 years ago

I notice that the first log ends with:

rclone[TID=24b4]: FFFF9E86CC94E060: <<Create IoStatus=c0000010[0]

The status code c0000010 stands for STATUS_INVALID_DEVICE_REQUEST. Think of it as the equivalent of UNIX ENOSYS. I am not sure why this status code is returned but it has something to do with this failure.

Possible scenario:

ncw commented 2 years ago

Possibilities for things which WinFsp calls into rclone during the create call:

I don't think this is being called as there is no trace from rclone

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L985

I don't think this is being called as there is no trace from rclone either

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L996

This part of the log

2022/04/12 15:46:22 DEBUG : /test/testfile: Release: fh=0x0
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): RWFileHandle.Release
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): close:
2022/04/12 15:46:22 DEBUG : test/testfile: vfs cache: setting modification time to 2022-04-12 15:46:22.8018157 +0000 GMT m=+18.086083401
2022/04/12 15:46:22 INFO  : test/testfile: vfs cache: queuing for upload in 5s
2022/04/12 15:46:22 DEBUG : test/testfile(0xc000c1a080): >close: err=<nil>
2022/04/12 15:46:22 DEBUG : /test/testfile: >Release: errc=0
rclone[TID=24b4]: FFFF9E86CC94E060: <<Create IoStatus=c0000010[0]

Implies that release was called, so create was taking the error path here as you suggested.

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L1058

Given that ops.create was called (we see that from rclone log) and that ops.chflags and ops.chown weren't (because there is no trace for them in the log,

So either CreateEx caused Result to be set to failure (unlikely since we see it returned OK in the rclone log)

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L959-L960

Or this went wrong

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L1006-L1007

Or this went wrong

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L1026-L1028

What about the last two options - can they fail with STATUS_INVALID_DEVICE_REQUEST?

billziss-gh commented 2 years ago

I agree that cgflags and chown cannot be the problem. Also note that fsp_fuse_intf_Create properly handles a STATUS_INVALID_DEVICE_REQUEST from them.

Re: FspFileSystemEnumerateEa

This could be the problem.

The Windows file creation API is the kitchen sink and then some: the file creation "system call" allows a file to be created together with extended attributes. So this checks if there are extended attributes passed during file creation and calls setxattr if the file system supports it.

The problem is that in practice this Windows file system capability is almost never used, because it is not exposed by the regular Windows API (CreateFileW). So it is unclear to me what regular file system scenario would cause extended attributes to be sent in the Create call. Perhaps I should read the original thread from the rclone forum more closely.

In any case, it might be prudent to add a check for STATUS_INVALID_DEVICE_REQUEST here and treat it as success as we do for chown and chflags.

Re: fsp_fuse_intf_GetFileInfoEx

This issues an fgetattr to the file system to get the final stat data of the created file. So it is not likely that this would generate a STATUS_INVALID_DEVICE_REQUEST return.

billziss-gh commented 2 years ago

Re-reading the code of Create I see that if the file system is passed extended attributes during file creation and setxattr is not implemented, then we are supposed to return STATUS_EAS_NOT_SUPPORTED.

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L872-L880

This suggests that if the failure to set extended attributes is indeed the problem, we may not be able to ignore a -ENOSYS error and treat it as success. So unfortunately I do not have a good fix for this if it turns out to be the problem.

ncw commented 2 years ago

Re: FspFileSystemEnumerateEa

This could be the problem.

The Windows file creation API is the kitchen sink and then some: the file creation "system call" allows a file to be created together with extended attributes. So this checks if there are extended attributes passed during file creation and calls setxattr if the file system supports it.

Rclone does indeed return ENOSYS for any of the *xattr calls. If any of those got called they wouldn't get logged as there are no Trace statements.

// Setxattr sets extended attributes.
func (fsys *FS) Setxattr(path string, name string, value []byte, flags int) (errc int) {
    return -fuse.ENOSYS
}

// Getxattr gets extended attributes.
func (fsys *FS) Getxattr(path string, name string) (errc int, value []byte) {
    return -fuse.ENOSYS, nil
}

// Removexattr removes extended attributes.
func (fsys *FS) Removexattr(path string, name string) (errc int) {
    return -fuse.ENOSYS
}

// Listxattr lists extended attributes.
func (fsys *FS) Listxattr(path string, fill func(name string) bool) (errc int) {
    return -fuse.ENOSYS
}

I'll add Trace statements and get the user to test again and we can see if your theory is correct.

billziss-gh commented 2 years ago

Please let me know.

In the meantime I will fix the mixed up values for the O_EXCL and O_TRUNC values in the no-cgo build.

ncw commented 2 years ago

I got a log back from the user. It shows quite clearly that Setxattr was called to set the extended attribute $KERNEL.PURGE.SEC.FILEHASH as part of the creating the file and that rclone returned ENOSYS from it.

So it looks like your theory is correct.

rclone[TID=3750]: FFFFD389126F0DC0: >>Create [UT---C] "\test\testfile", FILE_CREATE, CreateOptions=44, FileAttributes=20, Security=NULL, AllocationSize=0:1825000, AccessToken=0000000000000690[PID=19a8], DesiredAccess=17019f, GrantedAccess=0, ShareAccess=0
2022/04/21 12:18:59 DEBUG : /test: Getattr: fh=0xFFFFFFFFFFFFFFFF
2022/04/21 12:18:59 DEBUG : /test: >Getattr: errc=0
2022/04/21 12:18:59 DEBUG : /test/testfile: CreateEx: flags=0x502, mode=0770
2022/04/21 12:18:59 DEBUG : test/testfile: Open: flags=O_RDWR|O_CREATE|O_EXCL
2022/04/21 12:18:59 DEBUG : test/testfile: newRWFileHandle:
2022/04/21 12:18:59 DEBUG : test/testfile(0xc00005d0c0): openPending:
2022/04/21 12:18:59 DEBUG : test/testfile: vfs cache: truncate to size=0
2022/04/21 12:18:59 DEBUG : test: Added virtual directory entry vAddFile: "testfile"
2022/04/21 12:18:59 DEBUG : test/testfile(0xc00005d0c0): >openPending: err=<nil>
2022/04/21 12:18:59 DEBUG : test/testfile: >newRWFileHandle: err=<nil>
2022/04/21 12:18:59 DEBUG : test: Added virtual directory entry vAddFile: "testfile"
2022/04/21 12:18:59 DEBUG : test/testfile: >Open: fd=test/testfile (rw), err=<nil>
2022/04/21 12:18:59 DEBUG : /test/testfile: >CreateEx: errc=0, fh=0x0
2022/04/21 12:18:59 DEBUG : /test/testfile: Setxattr: name="$KERNEL.PURGE.SEC.FILEHASH", value="\x037\xedY\xa6f\x99\xc0\xe5\xa7\xeb\xee\xf75-|\x1c.\xd5\xed\xe7!)P\xa1\xb0\xa8\xee(\x9a\xf4\xa9[wA\xa5\xca\xd28\xce>L\xa7u`X\xf2\xa6zW\xfe\xe9\xd1J\x85\xbf\xd4O\t\xefFg\x9f\xaf\xcb\x1b\xabbz\xfb\xea\x9a_\xabZ\x97k\xb7\xa7鯟ݹ^\xfe\x97f_\xee\x99u\xadW\xfa{\xba\x99v}\xadw\xfd\xeb]vYj\u05f7\xad\xd9\xd5\xd9mw\xdd\xf9oZ\xb6n\xafo\xa7\xe5\xaf_潪zw\x00\x00\x00\x11)\xd6ա\x1dy\xb1W\xab\x81\x8a", flags=0
2022/04/21 12:18:59 DEBUG : /test/testfile: >Setxattr: errc=-40
2022/04/21 12:18:59 DEBUG : /test/testfile: Release: fh=0x0
2022/04/21 12:18:59 DEBUG : test/testfile(0xc00005d0c0): RWFileHandle.Release
2022/04/21 12:18:59 DEBUG : test/testfile(0xc00005d0c0): close:
2022/04/21 12:18:59 DEBUG : test/testfile: vfs cache: setting modification time to 2022-04-21 12:18:59.7972879 +0000 GMT m=+12.082671701
2022/04/21 12:18:59 INFO  : test/testfile: vfs cache: queuing for upload in 5s
2022/04/21 12:18:59 DEBUG : test/testfile(0xc00005d0c0): >close: err=<nil>
2022/04/21 12:18:59 DEBUG : /test/testfile: >Release: errc=0
rclone[TID=3750]: FFFFD389126F0DC0: <<Create IoStatus=c0000010[0]

Re-reading the code of Create I see that if the file system is passed extended attributes during file creation and setxattr is not implemented, then we are supposed to return STATUS_EAS_NOT_SUPPORTED.

https://github.com/winfsp/winfsp/blob/2fb2de067cee9eeacce55f4503f6c3afb3163a0f/src/dll/fuse/fuse_intf.c#L872-L880

This suggests that if the failure to set extended attributes is indeed the problem, we may not be able to ignore a -ENOSYS error and treat it as success. So unfortunately I do not have a good fix for this if it turns out to be the problem.

Here are some ideas...

  1. rclone could just return success for the xattr calls and null results. Probably not ideal and I'm not sure what Windows would do in this case if rclone lost the extended attributes it was supposed to keep.
  2. winfsp could translate the ENOSYS into a STATUS_EAS_NOT_SUPPORTEDat the point the *xattr is called. There don't appear to be too many of these. A variant of fsp_fuse_ntstatus_from_errno maybe fsp_fuse_ntstatus_from_xattr_errno?
  3. we could choose a new error code to mean STATUS_EAS_NOT_SUPPORTED, stick it in errno.i and get rclone to return it instead of ENOSYS.
  4. we could figure out a way to get NULLs into the fuse_operations structure for unsupported methods in cgofuse. For example we could make a call which takes a []string of unsupported fuse_operations and these could get zeroed out in the ops structure. So rather like the host.SetCapCaseInsensitive method, we could make a host.SetUnsupported(names []string) method.

What do you think? Any of those sound like a good idea?

billziss-gh commented 2 years ago

I got a log back from the user. It shows quite clearly that Setxattr was called to set the extended attribute $KERNEL.PURGE.SEC.FILEHASH as part of the creating the file and that rclone returned ENOSYS from it.

Thanks for confirming.

I find it rather unusual that anyone would try to set Extended Attributes (EAs in Windows parlance) during the Create call. I certainly have not observed it in the wild outside of specialized software or test apps. I note that this feature is not exposed to applications via the regular Win32 API. The EA name of $KERNEL.PURGE.SEC.FILEHASH seems suggestive of something although I am unsure what.

The sensible fix here is to assume success when setxattr fails with -ENOSYS during Create. This is what happens for other calls such as chown and chflags. The reason that I said earlier that "I do not have a good fix for this" is because the FILE_NO_EA_KNOWLEDGE option to the NtCreateFile system call complicates handling of EAs (when this flag is set we are supposed to fail Create calls that do not understand EAs -- however this may not be important in practice).

An additional fix would be to avoid enabling EA support during initialization of the file system. Currently this is done with the following code:

    if (0 != f->ops.listxattr && 0 != f->ops.getxattr &&
        0 != f->ops.setxattr && 0 != f->ops.removexattr)
        f->VolumeParams.ExtendedAttributes = 1;

Obviously this does not work if the file system has empty definitions of these operations that return -ENOSYS, like cgofuse does. A potential workaround is to issue a getxattr call to the root directory (/) and treat an -ENOSYS response as an indicator that the file system does not support EAs.

I will look look into implementing either the first one or both of these fixes in WinFsp-FUSE, which should resolve this issue.

rclone could just return success for the xattr calls and null results. Probably not ideal and I'm not sure what Windows would do in this case if rclone lost the extended attributes it was supposed to keep.

FUSE file systems already "approximate" the Windows file system. For example, FUSE file systems do not support full ACLs, file attributes or reparse points. I think there are fewer Windows applications that care about EAs rather than other features that FUSE does not support.

we could figure out a way to get NULLs into the fuse_operations structure for unsupported methods in cgofuse. For example we could make a call which takes a []string of unsupported fuse_operations and these could get zeroed out in the ops structure. So rather like the host.SetCapCaseInsensitive method, we could make a host.SetUnsupported(names []string) method.

That is an interesting idea and worth keeping in mind if we fail to provide a simpler fix.