openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org

mmap with ubuntu fails #163

Closed lundman closed 4 years ago

lundman commented 5 years ago

When using lx Ubuntu, any mmap request on ZFS will fail, but works on NTFS. There is something we are missing.

Test code:

mmap-test.c

```
# cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>

main()
{
	int fd;
	struct stat stbf;

	fd = open("e/file.txt", O_RDWR);
	if (fd < 0) exit(1);
	int r;
	r = fstat(fd, &stbf);
	printf("return %d size %ld\n", r, stbf.st_size);
	r = mmap(0, stbf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	printf("mmap said %d errno %d\n", r, errno);
}

Assuming E: is ZFS, and mounted on /mnt/e/

# echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa > e/file.txt
# ls -l e/file.txt
-rw-r--r-- 1 root root 36 Aug 28 23:02 e/file.txt
# ./mmap-test
return 0 size 36
mmap said -1 errno 22
```
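A minimal sketch of the same probe, assuming nothing beyond POSIX. It compares the result of mmap() against MAP_FAILED instead of storing the pointer in an int, so the errno it returns belongs to the call that actually failed. The helper name `try_mmap` is hypothetical and not part of the original test.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test): open `path` read-write,
 * map it read-only and private, and return 0 on success or the errno of
 * the call that failed.
 */
int try_mmap(const char *path)
{
	struct stat st;
	void *p;
	int err;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return errno;
	if (fstat(fd, &st) != 0) {
		err = errno;
		close(fd);
		return err;
	}
	/* mmap() signals failure with MAP_FAILED, not with a negative int. */
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		close(fd);
		return err;
	}
	printf("mapped %ld bytes from %s\n", (long)st.st_size, path);
	munmap(p, (size_t)st.st_size);
	close(fd);
	return 0;
}
```

Calling `try_mmap("e/file.txt")` from a small main() would be expected to return 22 (EINVAL) for the failure shown above.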

Each time mmap-test is run, relatively little ZFS activity is noticed:

debug output

```
FFFF92849F586080: IRP_MJ_CREATE: FileObject FFFF9284A1819D90 related FFFF9284A6987A20 name 'file.txt' flags 0x1 sharing 0x7 options @FILE_OPEN attr 0x0 DesAcc0x20080
FFFF92849F586080: zfs_vnop_lookup: OK with FILE_OPENED
FFFF92849F586080: dispatcher: enter: major 5: minor 0: IRP_MJ_QUERY_INFORMATION: type 0x6
FFFF92849F586080: file_stat_lx_information
FFFF92849F586080: zfs_getwinflags: changing zfs 0x40800000004 to win 0x00000020
FFFF92849F586080: dispatcher: enter: major 5: minor 0: IRP_MJ_QUERY_INFORMATION: type 0x6
FFFF92849F586080: * file_name_information: (normalize 0)
FFFF92849F586080: * file_name_information: name of '\file.txt' struct size 0x8 and FileNameLength 0x12 Usedspace 0x12
FFFF92849F586080: IRP_MJ_CREATE: FileObject FFFF9284A183DB00 related FFFF9284A1819D90 name '(null)' flags 0x1 sharing 0x7 options NonDirectoryFile @FILE_OPEN attr 0x0 DesAcc 0x20183
FFFF92849F586080: relative file open, increment "\file.txt"
FFFF92849F586080: file_stat_information
FFFF92849F586080: zfs_getwinflags: changing zfs 0x40800000004 to win 0x00000020
FFFF92849F586080: zfs_vnop_lookup: OK with FILE_OPENED
FFFF92849F586080: dispatcher: enter: major 5: minor 0: IRP_MJ_QUERY_INFORMATION: type 0x6
FFFF92849F586080: file_stat_lx_information
FFFF92849F586080: zfs_getwinflags: changing zfs 0x40800000004 to win 0x00000020
FFFF92849F586080: dispatcher: enter: major 5: minor 0: IRP_MJ_QUERY_INFORMATION: type 0x6
FFFF92849F586080: file_standard_information
FFFF92849F586080: Returning size 36 and allocsize 4096
```

Followed by CLEANUP, and CLOSE. Return failure.

Specifically, it:

As ZFS returns no failure, and hopefully all valid information in the 4 calls it makes, perhaps there is an external decision made to fail on ZFS. I have verified FsAttributes are returned similar to NTFS.

Unlikely that the POSIX mmap to Windows glue is available as source.

Cause is still unknown.

tamlin-mike commented 5 years ago

Ideas of a few things to test: Provided MapViewOfFile & co. from plain Win32 works as expected; try with different open/mapping flags for the various HANDLES.

Provided that works, try with duplicated HANDLE's (f.ex. like you'd do if you were to give a child process a new stdin/out/err).

Compare and verify ACL's, perhaps especially inherited.

lundman commented 5 years ago

I use sysinternals FileTest and can MapViewOfFile the same on ZFS as NTFS. Not discovered any differences between the two there.

ACLs on the other hand, I don't fully grasp, so I've mostly guessed at it. It looks right in CMDbox, I think :)

tamlin-mike commented 5 years ago

I stumbled across something that perhaps could provide a part of the puzzle. It's a rather long (and not entirely uninteresting) read, but it's possible to skim it to see there could be some tiny difference in flags in the IRP.

The section discussing the sandbox check was what actually got me thinking it could be related to this issue. Possibly the most relevant part to look for, in this context, could be in and around the part about RequestorMode.

https://googleprojectzero.blogspot.com/2019/03/windows-kernel-logic-bug-class-access.html

lundman commented 5 years ago

Quite an interesting and dense read. I'm not entirely sure I'm clear on the case even now, but for us, we are simply the recipient of a call - ie the driver at the very end, and we fail nothing coming in (unless we are supposed to fail something).

tamlin-mike commented 5 years ago

I was thinking it could provide a clue about debug-printing, to see if f.ex. the mentioned sandbox flag is used when the request is coming from WSL, and the driver perhaps needs to fiddle with some extra stuff on/in the IoStack.

Another idea... It could be possible to use a kernel debugger and do manual dumping of the IRP when it comes to NTFS, where the mmap seems to work.

Yet another idea. If everything else fails to spot a difference... What if plugging in a tiny FSFilter above NTFS, that takes a snapshot (of select suspects) of the IRP + stack on entry into ntfs.sys, diff it when it's coming back, and dump any differences?

I know, terribly vague, but with such a vague failure condition perhaps it's time to just do some brainstorming and see what sticks.

lundman commented 5 years ago

The tool FileSpy from internals is a minifilter that captures all IRPs, and I've stared myself blind comparing the log of NTFS and ZFS. They are essentially identical, until it just stops calling ZFS.

But I don't think FileSpy knows about the sandbox flag, so I could at least check. From memory, I don't recall seeing anything weird, but then, I didn't know about ECP either, so....

lundman commented 5 years ago

Hmm I did notice something curious:

shell as root/administrator:

$ git clone https://github.com/openzfsonwindows/ZFSin.git
Cloning into 'ZFSin'...
error: could not write config file E:/ZFSin/.git/config: Invalid argument

but as user, it works fine. In Lx I have to use root, and it fails the same way. Perhaps it has something to do with the ACLs.

tamlin-mike commented 5 years ago

You could test mounting a VHD with NTFS on a dir, and a VHD with ZFS on another, then run icacls.exe on the dir's to see if there's a difference.

Another thing to check could be, when ZFS is mounted as a driveletter, run icacls on both X: and X:\ - they can refer to different levels (PDO vs FDO I think), and as such display different results.

Since I believe the ACL's at this level can be modified by admin using icacls, it should be really quick turnaround time to test different settings.

EDIT: Example of how it can look on a system with NTFS in a "native-boot" VHD, mounted as C:

>icacls C:
C: NT AUTHORITY\SYSTEM:(OI)(CI)(F)
   BUILTIN\Administrators:(OI)(CI)(F)
   MyComputername\MyUsername:(OI)(CI)(F)

>icacls C:\
C:\ BUILTIN\Administrators:(OI)(CI)(F)
    NT AUTHORITY\SYSTEM:(OI)(CI)(F)
    BUILTIN\Users:(OI)(CI)(RX)
    NT AUTHORITY\Authenticated Users:(OI)(CI)(IO)(M)
    NT AUTHORITY\Authenticated Users:(AD)
    Mandatory Label\High Mandatory Level:(OI)(NP)(IO)(NW)

Example of a VHD with NTFS mounted as both driveletter and dir

icacls D:
D: BUILTIN\Administrators:(F)
   BUILTIN\Administrators:(OI)(CI)(IO)(F)
   NT AUTHORITY\SYSTEM:(F)
   NT AUTHORITY\SYSTEM:(OI)(CI)(IO)(F)
   NT AUTHORITY\Authenticated Users:(M)
   NT AUTHORITY\Authenticated Users:(OI)(CI)(IO)(M)
   BUILTIN\Users:(RX)
   BUILTIN\Users:(OI)(CI)(IO)(GR,GE)

>icacls D:\
exactly the same as D:

>icacls dir_mountpoint
same as D:, only each entry prepended by (I)

tamlin-mike commented 5 years ago

error: could not write config file E:/ZFSin/.git/config: Invalid argument

That's an unexpected error. Access denied I could have accepted, but invalid argument? When you repro this, does that IRP even reach ZFS? If it does, check the return status.

lundman commented 5 years ago

You might be right - something is up with permissions. In Windows/Finder, as a user, I can use everything normally, including git clone. But can't git clone as root.

Turns out that in lx-ubuntu, I do all my testing as root. If I try as a user, I do not get far.

Show root can, user can't

```
$ sudo mount -t drvfs E: /mnt/e
$ mount
C:\ on /mnt/c type drvfs (rw,noatime,uid=1000,gid=1000,case=off)
E: on /mnt/e type drvfs (rw,relatime,case=off)
$ ls -l e/
total 0
drwxr-xr-x 2 nobody root 512 Oct 24 17:08 'System Volume Information'
drwxr-xr-x 2 root root 512 Oct 24 17:09 dir
$ cd e/dir
$ id
uid=1000(lundman) gid=1000(lundman)
$ sudo chown lundman .
$ sudo chmod 777 .
$ ls -la
total 0
drwxr-xr-x 2 root root 512 Oct 24 17:18 .
drwxr-xr-x 4 root root 512 Oct 24 17:09 ..
$ mkdir g
mkdir: cannot create directory ‘g’: Permission denied
$ sudo mkdir g
$ ls -la
total 0
drwxr-xr-x 3 root root 512 Oct 24 17:19 .
drwxr-xr-x 4 root root 512 Oct 24 17:09 ..
drwxr-xr-x 2 root root 512 Oct 24 17:19 g
```

icacls E:

```
C:\Windows\system32>icacls E:
E: BUILTIN\Administrators:(F)
   BUILTIN\Administrators:(OI)(CI)(IO)(F)
   NT AUTHORITY\SYSTEM:(F)
   NT AUTHORITY\SYSTEM:(OI)(CI)(IO)(F)
   NT AUTHORITY\Authenticated Users:(M)
   NT AUTHORITY\Authenticated Users:(OI)(CI)(IO)(M)
   BUILTIN\Users:(RX)
   BUILTIN\Users:(OI)(CI)(IO)(GR,GE)

Successfully processed 1 files; Failed processing 0 files
```

icacls E:\

```
C:\Windows\system32>icacls E:\
E:\ BUILTIN\Administrators:(F)
    BUILTIN\Administrators:(OI)(CI)(IO)(F)
    NT AUTHORITY\SYSTEM:(F)
    NT AUTHORITY\SYSTEM:(OI)(CI)(IO)(F)
    NT AUTHORITY\Authenticated Users:(M)
    NT AUTHORITY\Authenticated Users:(OI)(CI)(IO)(M)
    BUILTIN\Users:(RX)
    BUILTIN\Users:(OI)(CI)(IO)(GR,GE)

Successfully processed 1 files; Failed processing 0 files
```

And no, it isn't a failure that ZFS returns when it goes wrong. It probes ZFS a few times, but something "above" ZFS decides it's wrong and returns failure.

they can refer to different levels (PDO vs FDO I think)

I have one of those, didn't know there was a second. All I do is attach a security descriptor, one for root level: https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/zfs_vnops_windows_lib.c#L1491

Which uses the default: https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/zfs_vnops_windows_lib.c#L75

and inherit for each subdirectory: https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/zfs_vnops_windows_lib.c#L1539

I have no knowledge of PDO :)

lundman commented 5 years ago

This was also somewhat amusing:

C:\Windows\system32>icacls E:\ /t /grant Everyone:(OI)(CI)F
processed file: E:\
processed file: E:\dir
processed file: E:\System Volume Information
processed file: E:\dir\g
processed file: E:\System Volume Information\WPSettings.dat
Successfully processed 5 files; Failed processing 0 files

C:\Windows\system32>takeown /r /d y /f E:\

SUCCESS: The file (or folder): "E:\" now owned by user "DESKTOP-B8UVFGL\WDKRemoteUser".

INFO: File ownership cannot be applied to file or folder "E:\dir"; insecure file systems (FAT32) do not support ACLs.

tamlin-mike commented 5 years ago

I have no knowledge of PDO :)

Don't worry about it. I think the problem may be way more shallow.

INFO: File ownership cannot be applied to file or folder "E:\dir"; insecure file systems (FAT32) do not support ACLs.

Now that is a clue if I ever saw one!

This got me curious... https://github.com/openzfsonwindows/ZFSin/blob/530fbb8565803c714f3232cd01b8bad8631dd8c4/ZFSin/zfs/module/zfs/zfs_vnops_windows_lib.c#L1546

Provided my skimming of the code is not totally wrong, that suggests it can't ever change an existing ACL on anything but the root dir, which could account for the takeown error.

lundman commented 5 years ago

Provided my skimming of the code is not totally wrong, that suggest it can't ever change an existing ACL on anything but root dir, and could account for the takeown error.

That does look suspect, so I took it out and ran again - but didn't actually make any difference. In fact, a breakpoint in set_security() is never called with takeown.

tamlin-mike commented 4 years ago

a breakpoint in set_security() is never called with takeown

So no IRP_MJ_SET_SECURITY IRP is ever received then? /me goes deer in headlights

Still, I do believe the course this has taken points in a rather clear direction. Now it's "only" a matter of finding out where it takes a wrong turn. That, and add (unit) tests for it, once fixed! :-)

Taking a step back to

error: could not write config file E:/ZFSin/.git/config: Invalid argument

I did a grep for STATUS_INVALID_PARAMETER, and while there are a bunch of them, I believe it would be within the realms of possible to set breakpoints on them all and re-run that test, to see if it's indeed coming from ZFS.

I think setting the breakpoints might even be possible to do more-or-less automatically, by searching from within VS (find in files), copy the file list + line numbers (possibly slightly mangled), prepend bp to each line, and finally paste (to windbg?).

Just to get absolute certainty about whether or not that specific problem is indeed what's suspected, to narrow it down and hopefully help getting to the bottom of this.

lundman commented 4 years ago

So no IRP_MJ_SET_SECURITY IRP is ever received then?

It appeared to not be called, but I had 3 minutes to run a quick test before home time yesterday, so I will double check.

When it comes to mmap/ubuntu I know it is not ZFS that returns any error: all dispatches go through one handler, so it is easy to see what I return. The FileSpy logs only show that ZFS doesn't return errors. But presumably something else is wrong.

That it says FAT32 made me think maybe I missed a filesystem-capability flag, like FILE_PERSISTENT_ACLS - but I do set that, and last I looked at it, nearly all that NTFS sets.

Those I don't set are: FILE_CASE_SENSITIVE_SEARCH | FILE_FILE_COMPRESSION | FILE_SUPPORTS_ENCRYPTION | FILE_SUPPORTS_TRANSACTIONS | FILE_SUPPORTS_USN_JOURNAL.

There is also FILE_RETURNS_CLEANUP_RESULT_INFO | FILE_SUPPORTS_POSIX_UNLINK_RENAME, which I have experimented with supporting, as they sounded interesting. But I could never figure out what FILE_RETURNS_CLEANUP_RESULT_INFO is supposed to do.

tamlin-mike commented 4 years ago

That it says FAT32 made me think maybe I missed a filesystem-capability flag, like FILE_PERSISTENT_ACLS - but I do set that, and last I looked at it, nearly all that NTFS sets.

Alternative hypothesis: System tried to set ACL on subdir, failed (or less likely, succeeded, but then tried to read the ACL back out to verify and didn't get what was expected), and used a bad fallback, claiming "anything not supporting ACL's must be FAT32". :-) (unless hard-coded to check for different FAT versions, I suspect you'd get the same error for exFAT, FAT-16 or FAT-12 too)

As for the rest of the capabilities flags, I don't see them missing having anything to do with current issue. Suffice it to say FILE_FILE_COMPRESSION should be fairly easy to add, once the rest has stabilized.

lundman commented 4 years ago

claiming "anything not supporting ACL's must be FAT32". :-)

Yes, the message itself is perhaps not so important, but it definitely fails on something trivial that should work. So there is something there to fix.

As for FILE_FILE_COMPRESSION, I wouldn't be so sure, since we can't store any "windows specific" compressed files (or all platforms have to support it) and ZFS doesn't compress individual items, but rather, compression is a layer before writing out blocks.

tamlin-mike commented 4 years ago

but it definitely fails on something trivial that should work

Yes, and to me it seems to point towards not being able to set ACL on subdirectory, or if that succeeds it fails to get the same expected ACL when checking. Possibly limited to subdir that already had another ACL, subdir in root, or combinations of the two. Could even be something more esoteric, like if the entry got created with a "template" entry (a rather esoteric and AFAIK infrequently used feature of the Win32 API CreateFile).

and ZFS doesn't compress individual items

Oh, I had gotten the impression that while it was enabled by pool/vol, files could be individually and selectively compressed. If the little I've read about it has misled me, meaning compression granularity isn't per-file but per-vol (or even pool?), that flag is obviously out of the picture.

But to my mind, there is no "windows specific" compressed files to handle. If the FS can do compression, it's entirely free to handle it however it likes, provided it can handle file I/O as-if the file wasn't compressed.

--- Completely off-topic below --- Here's a possibly useless, and definitely off-topic, history of NTFS, Windows and compression:

Since NT 3.51 NTFS has had native support for compressed files. It's basically LZ on 64KB chunks (16 clusters on default 4K cluster size) it calls "compression units" (IIRC). When you tell NTFS to compress a file, it does so in 64KB chunks (and fragments the hell out of the file in the process :-) ). File access is completely transparent, handled entirely within the NTFS driver, allowing normal random access to the file, reading and writing. The only externally visible indication the file is compressed is the Compressed file attribute flag, that is both a control bit (sets compressed state) and an indicator.

With Windows 10 (or maybe it started with 8?) another layer was added: an FS filter called "Wof" (Windows Overlay Filter) that gets invoked if you use compact /EXE:.... That plugs itself in as a reparse point handler, replacing the normal unnamed $DATA stream with a (by filter compressed) $DATA stream named "WofCompressedData". From NTFS's and Win32's POV such files are not compressed. The compressed attribute isn't set on the file, and the only thing NTFS can do with the file data is handing over the raw data to wof.sys for decompression.

lundman commented 4 years ago

Ah thanks for the insight into compression. Certainly one could reply compressed or not to a query, but generally the compression setting is for the dataset, and everything(-ish) inside is compressed.

If the implementation details are up to the FS, then one could do whatever one wants, although the Wof layer sounds more like I was thinking - as I'm used to Apple HFS's decmpfs. Which compresses to a xattr, then truncates real file. FS then transparently handles the translation in read/write/lseek. This we can't do, since then all platforms would need to have decmpfs support.

But in ZFS, we can probably say "OK" to a request to compress, and do nothing - future problem to look at :)

lundman commented 4 years ago

Right so, confirming takeown run, it is interesting to note that it calls

set_security("\", 0xfffe);

So, for "E:\" it does call set_security() and I stepped through the code, and ZFS appears to go through the motions, setting the z_uid = 0xfffe and writing out to disk.

The set_information() handler is quite simple and does not call zfs_set_security(). But it appears to do what's needed: look up the old SD, try to set the new SD, and free the old SD if that works. Then call ZFS to store the uid change to disk. It does not appear to check that it worked.

Then it starts the recursion. The first element is System Volume Information, which I'll skip - it is a bit of a special case, and I don't think it should be shown at all; it's supposed to be HIDDEN I think.

So it eventually gets to hello directory, it opens it, queries file_standard_link_information (returning 2, 2, false, true) then FileNormalizedNameInformation, returning \hello then file_name_information, returning \hello

Then it tries that 3 more times, then prints the failure on console.

To be honest, I am not sure what normalized actually means when it comes to the name query calls.

lundman commented 4 years ago

You will only get names like 'storage stack device name'[]'path from the root of the owning file system' back from the filter manager for either opened or normalized names. 'storage stack device name' will be something like \Device\HarddiskVolume1 or \Device\LanmanRedirector.

Oh, is that what it means by normalized? Why is it so hard to find examples of the difference in what those two return? When I use FileTest it returns identical strings \src\TestFile.bin for both, so there does not appear to be any difference.

tamlin-mike commented 4 years ago

z_uid = 0xfffe

Is (short)-2 a special uid mapping, for System or something like that? What would that be in errno? (just thinking out loud)

System Volume Information ... its supposed to be HIDDEN I think.

Yes. Win32 FS flags SH (System and Hidden). It's an NTFS-specific directory, that mostly NTFS itself and its filters use. Contains f.ex. volume snapshots, dedup database (in case it's installed) and other stuff the OS itself should be completely oblivious about. In fact, the only ACL entry on it is usually System Volume Information NT AUTHORITY\SYSTEM:(OI)(CI)(F)

FileNormalizedNameInformation, returning \hello

Normalized in this context seems to mean "absolute path from root directory".

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/20bcadba-808c-4880-b757-4af93e41edf6

Example: You have a volume vol mounted at X:\mnt\vol, the normalized name for X:\mnt\vol\dir\file would be \dir\file, and it would be handled by vol FS driver.
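A minimal sketch of that reading of "normalized", assuming it simply means the path with the mount point stripped. The helper `volume_relative_name` is hypothetical, and it compares case-sensitively, which is a simplification since Windows paths normally are not case-sensitive.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a full path and the mount point of the volume
 * that owns it, return a pointer to the volume-relative part of `full`,
 * "\" for the mount point itself, or NULL when `full` is not under `mount`.
 */
const char *volume_relative_name(const char *full, const char *mount)
{
	size_t n = strlen(mount);

	/* Treat "X:\" and "X:" as the same mount point. */
	if (n > 0 && mount[n - 1] == '\\')
		n--;
	if (strncmp(full, mount, n) != 0)
		return NULL;
	if (full[n] == '\0')
		return "\\";
	/* Reject sibling names such as "X:\mnt\volume" for mount "X:\mnt\vol". */
	if (full[n] != '\\')
		return NULL;
	return full + n;
}
```

With the example above, `volume_relative_name("X:\\mnt\\vol\\dir\\file", "X:\\mnt\\vol")` yields `\dir\file`.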

So it eventually gets to hello directory, it opens it, queries file_standard_link_information (returning 2, 2, false, true)

Is 2 the link counts (accessible, total)? If so, why are there two links to that directory?

Another thing that just struck me, where I have seen others make the same mistake, is that they check only the required size of a buffer, missing that sometimes it must be exactly what the OS states, else driver should fail.

Just to be 100% certain that's not the problem, I'd add that check after

https://github.com/openzfsonwindows/ZFSin/blob/0766fda9a8875595c492e64f64273862462672e2/ZFSin/zfs/module/zfs/zfs_vnops_windows_lib.c#L2568-L2571

i.e. if (IrpSp->Parameters.QueryFile.Length != sizeof(FILE_STANDARD_LINK_INFORMATION)) { return STATUS_INFO_LENGTH_MISMATCH; }, together with an appropriate amount of cursing and dprintf. :-)
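Spelled out as a standalone sketch, the exact-size check could look like the following. The typedefs and status values are stand-ins so it builds outside the WDK (in a driver they come from the WDK headers), the helper name `check_standard_link_length` is hypothetical, and whether the caller really insists on an exact size here is only a hunch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for WDK types and status codes, so the sketch is self-contained. */
typedef uint32_t ULONG;
typedef uint8_t BOOLEAN;
typedef int32_t NTSTATUS;

#define STATUS_SUCCESS              ((NTSTATUS)0x00000000)
#define STATUS_INFO_LENGTH_MISMATCH ((NTSTATUS)0xC0000004)

/* Same field order as the WDK structure of this name. */
typedef struct {
	ULONG NumberOfAccessibleLinks;
	ULONG TotalNumberOfLinks;
	BOOLEAN DeletePending;
	BOOLEAN Directory;
} FILE_STANDARD_LINK_INFORMATION;

/*
 * Hypothetical helper: accept the query only when the caller's buffer is
 * exactly the size of the structure, neither smaller nor larger.
 */
NTSTATUS check_standard_link_length(ULONG length)
{
	if (length != sizeof(FILE_STANDARD_LINK_INFORMATION))
		return STATUS_INFO_LENGTH_MISMATCH;
	return STATUS_SUCCESS;
}
```

In the handler this would be called with IrpSp->Parameters.QueryFile.Length before the buffer is filled in.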

-- bonus reading -- Some more food for thought, about performance and... stuff (users could install 3rd party antivirus fsfilters) https://community.osr.com/discussion/226052/normalized-name-in-win8

lundman commented 4 years ago

Is (short)-2 a special uid mapping, for System or something like that? What would that be in errno?

I think it was what RtlGetOwnerSecurityDescriptor() returned, I didn't look too hard, as I was looking for errors.

Normalized in this context seems to mean "absolute path from root directory".

One doc said I should return \Device\Point\src\file.bin - but are you saying that the FS doesn't need to bother with it? Then why are there two IRPs for the same thing?

Is 2 the link counts (accessible, total)? If so, why are there two links to that directory?

Standard Unix, "." and ".." - should it be 0 in windows?

STATUS_INFO_LENGTH_MISMATCH

I did not know that - I will amend. How many are there that check for exact size?

tamlin-mike commented 4 years ago

One doc said I should return \Device\Point\src\file.bin

That sounds odd. If you could point me to that doc I can have a look.

But to put it in context: you (when you have your FS hat on) can be mounted on any number of places. You don't know anything about \\.\PhysicalHarddisk4 or \\.\LogicalVolume42 or S:4 (if you are the fourth VHD mounted in the system). Yes, ofc ZFS knows a lot more about disks than normal filesystems, but look at it from the FS' view. The only canonical path to an entry in/on a filesystem is from the root of the FS. Logic then dictates this is the only canonical path you can provide - the volume-relative absolute path to the entry.

but are you saying that the FS don't need to bother with it?

No. I am saying it makes no sense (to me) to even attempt to. It was quite some time since I was digging deep in NT's object manager and NT filesystems, so I could be wrong.

Then why are there two IRPs for the same thing?

What IRP's are you referring to?

Standard Unix, "." and ".." - should it be 0 in windows?

One (1). Neither "." nor ".." actually exist in Windows, they are implicit. In Unix they are explicit. A newly created entry, be it a dir or file, has a link count of one in Windows. To the best of my knowledge, only by creating a hardlink to an entry does the link count increase.

Could it be that simple; wrong link count? Naaah...

How many are there that check for exact size?

Unknown. I know I've encountered it myself while implementing some stuff in the kernel (ReactOS - possibly a very good thing to test ZFSin in, since it's GPL) where some structs were checked exactly that way by the kernel, and failed if the buffer wasn't exactly the right size. You couldn't sneak in an extra bit before it returned error, much less an extra byte. :-)

lundman commented 4 years ago

On the whole normalized thing, there is IRP IRP_MJ_QUERY_INFORMATION which specifies FileInformationClass (among others) classes FileNameInformation and FileNormalizedNameInformation.

Both FileNameInformation and FileNormalizedNameInformation use the same struct FILE_NAME_INFORMATION, so only the filename inside it will differ, ie, normalized or not. But I am not sure what they mean by normalized. I return identical strings for both.

This documentation (albeit from userland POV): https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/fltkernel/ns-fltkernel-_flt_file_name_information has the description:

A file name is considered normalized if all of the following are true: It contains the full directory path for the file, including the volume name

and likewise, ntfs forum post had:

https://community.osr.com/discussion/comment/73841/#Comment_73841

But, I have just tried to make normalized to return strings like: \Devices\ZFS{xxxx-xxxx-xx-xx-xxx}\hello and it made S-F-A difference. :)

I tweaked link count as well, but no difference.

Checked returning STATUS_INFO_LENGTH_MISMATCH but no dice, and fastfat doesn't contain that returncode at all, so presumably things can't rely on it.

lundman commented 4 years ago

The smallest takeown test I can do is:

2>takeown /F E:\hehe
ERROR: File ownership cannot be applied on insecure file systems;
       there is no support for ACLs.
1   19:32:49.617        takeown.exe IRP_MJ_CREATE   00000884    0000000000000000    \hehe   STATUS_FILE_IS_A_DIRECTORY  FILE_OPEN CreOpts: 00000060 Access: 00100080 Share:  00000003 Attrib: 0
2   19:32:49.617        takeown.exe IRP_MJ_CREATE   00000884    FFFFB706CDE808D8    \hehe   STATUS_SUCCESS  FILE_OPEN CreOpts: 00220020 Access: 00100080 Share:  00000003 Attrib: 00000080 Result: FILE_OPENED
3   19:32:49.617        takeown.exe IRP_MJ_FILE_SYSTEM_CONTROL/IRP_MN_USER_FS_REQUEST   00060870    FFFFB706CDE808D8    \hehe   STATUS_NOT_A_REPARSE_POINT  FSCTL_GET_REPARSE_POINT (000900A8)
4   19:32:49.617        takeown.exe IRP_MJ_CLEANUP  00000404    FFFFB706CDE808D8    \hehe   STATUS_SUCCESS  
5   19:32:49.617        takeown.exe IRP_MJ_CLOSE    00000404    FFFFB706CDE808D8    \hehe   STATUS_SUCCESS  
6   19:32:49.617        takeown.exe IRP_MJ_CREATE   00000884    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  FILE_OPEN CreOpts: 00000020 Access: 00100080 Share:  00000003 Attrib: 0 Result: FILE_OPENED
7   19:32:49.617        takeown.exe IRP_MJ_CLEANUP  00000404    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  
8   19:32:49.617        System  IRP_MJ_CLOSE    00000404    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  
9   19:32:49.617        takeown.exe IRP_MJ_CREATE   00000884    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  FILE_OPEN CreOpts: 00000010 Access: 00100080 Share:  00000003 Attrib: 0 Result: FILE_OPENED
10  19:32:49.617        takeown.exe FASTIO_DEVICE_CONTROL       FFFFB706CDE80618    \hehe   FAILURE IOCTL_MOUNTDEV_QUERY_DEVICE_NAME (004D0008)
11  19:32:49.617        takeown.exe IRP_MJ_DEVICE_CONTROL   00060070    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  IOCTL_MOUNTDEV_QUERY_DEVICE_NAME (004D0008)
12  19:32:49.617        takeown.exe IRP_MJ_CLEANUP  00000404    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  
13  19:32:49.617        takeown.exe IRP_MJ_CLOSE    00000404    FFFFB706CDE80618    \hehe   STATUS_SUCCESS  
14  19:32:49.617        takeown.exe IRP_MJ_CREATE   00000884    FFFFB706CDB61608    \hehe   STATUS_SUCCESS  FILE_OPEN CreOpts: 00000021 Access: 00100000 Share:  00000003 Attrib: 0 Result: FILE_OPENED
15  19:32:49.617        takeown.exe IRP_MJ_QUERY_INFORMATION    00060870    FFFFB706CDB61608    \hehe   STATUS_SUCCESS  FileNameInformation Name: \hehe
16  19:32:49.617        takeown.exe IRP_MJ_CLEANUP  00000404    FFFFB706CDB61608    \hehe   STATUS_SUCCESS  
17  19:32:49.617        takeown.exe IRP_MJ_CLOSE    00000404    FFFFB706CDB61608    \hehe   STATUS_SUCCESS  

Which is practically nothing, and yet, somehow, decides we aren't cool.

lundman commented 4 years ago

Having a go at x64dbg on takeown.exe - and I'm guessing here - but it appears to fail due to ERROR_DIR_NOT_ROOT when calling GetVolumeInformationW() which in turn calls IsThisARootDirectory(). Appears to check name ends with "\" so perhaps it is supposed to fail here - I am running it with "E:\hello" after all.
I don't think I should ever return "hello\" anywhere as a filesystem?

lundman commented 4 years ago

and inside IsThisARootDirectory() we appear to pass the first check of "\" then, it calls ZwOpenSymbolicLinkObject() which fails with STATUS_OBJECT_TYPE_MISMATCH

This makes me think that wherever(!) I return:

FFFFB706C5251080: * file_name_information:  name of '\Volume{413cee64-9f5b-3f10-848f-b387fdedfcc9}\' struct size 0x8 and FileNameLength 0x5c Usedspace 0x5c

must in fact be a symlink.

lundman commented 4 years ago

I have changed it to use the symlink name; the call to ZwOpenSymbolicLinkObject() still fails with STATUS_OBJECT_TYPE_MISMATCH.

Events seen from ZFS are:

FFFFB706C5FC7080: IOCTL_MOUNTDEV_QUERY_DEVICE_NAME
replying with '\DosDevices\Global\Volume{413cee64-9f5b-3f10-848f-b387fdedfcc9}'

 IRP_MJ_CREATE: FileObject FFFFB706D6343370 name '\' 
* file_name_information:  name of '\' 
* query_volume_information: FileFsVolumeInformation
* query_volume_information: FileFsAttributeInformation
* query_volume_information: FileFsFullSizeInformation

and after that, ZwOpenSymbolicLinkObject() returns failure.

lundman commented 4 years ago

Ok, so with takeown

            // If it is a DIR, make sure it ends with "\", except for
            // root, that is just "\"
            if (S_ISDIR(zp->z_mode))
                strlcat(strname, "\\",
                    MAXPATHLEN);

Ie, when calling IRP_MJ_QUERY_INFORMATION(file_name_information) I now add a "\" character at the end of the name if it is a dir;

\hello => \hello\

This makes takeown work:

C:\Windows\system32>takeown /f E:\hello

SUCCESS: The file (or folder): "E:\hello" now owned by user "DESKTOP-B8UVFGL\WDKRemoteUser".

Guess I missed the documentation detailing that requirement. What else expects the same? query directory information?

Alas, it does not fix the ubuntu "user ownership" problem, nor the trashcan.
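For reference, the trailing-backslash rule from the snippet above as a standalone sketch. It assumes a plain NUL-terminated name buffer, and the helper `add_dir_backslash` is hypothetical (the driver code uses strlcat on strname). Checking for an existing trailing backslash also covers the root, which stays "\".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append a trailing backslash to a directory name
 * unless it already ends with one. Files are left untouched.
 * Returns 0 on success, -1 if the buffer of `cap` bytes is too small.
 */
int add_dir_backslash(char *name, size_t cap, int is_dir)
{
	size_t len = strlen(name);

	if (!is_dir)
		return 0;
	/* Root ("\") and names already ending in "\" need no change. */
	if (len > 0 && name[len - 1] == '\\')
		return 0;
	/* Need room for the extra backslash plus the terminating NUL. */
	if (len + 2 > cap)
		return -1;
	name[len] = '\\';
	name[len + 1] = '\0';
	return 0;
}
```

So `\hello` becomes `\hello\` for a directory, while `\` and regular file names are returned unchanged.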

lundman commented 4 years ago

I can also report that lx/ubuntu definitely uses the mode field in file_stat_lx_information, and its bits are quite different from ZFS/posix. If I return 0700, no "user" can do anything, as we've experienced until now. But return 0777 and all is well. I have changed the zfs_create_fs() and zfs_mkdir() default to be 0777 in the Windows build.

The permission bits were found to be:

    if (S_ISDIR(z)) w |= 0x4000; // _S_IFDIR
    if (S_ISREG(z)) w |= 0x8000; // _S_IFREG
    if (S_ISCHR(z)) w |= 0x2000; // _S_IFCHR
    if (S_ISFIFO(z)) w |= 0x1000; // _S_IFIFO
    if ((z&S_IRUSR) == S_IRUSR) w |= 0x0100; // _S_IREAD
    if ((z&S_IWUSR) == S_IWUSR) w |= 0x0080; // _S_IWRITE
    if ((z&S_IXUSR) == S_IXUSR) w |= 0x0040; // _S_IEXEC
    // Couldn't find documentation for the following, but
    // tested in lx/ubuntu to be correct.
    if ((z&S_IRGRP) == S_IRGRP) w |= 0x0020; // 
    if ((z&S_IWGRP) == S_IWGRP) w |= 0x0010; // 
    if ((z&S_IXGRP) == S_IXGRP) w |= 0x0008; // 
    if ((z&S_IROTH) == S_IROTH) w |= 0x0004; // 
    if ((z&S_IWOTH) == S_IWOTH) w |= 0x0002; // 
    if ((z&S_IXOTH) == S_IXOTH) w |= 0x0001; // 

lundman commented 4 years ago

OK, to sum up: the takeown problem was fixed by appending a backslash to directory names. I am not sure this is really required, or whether the same should also be done for directory listings.

The problem of ubuntu non-root users not being able to do anything was fixed by returning the correct lx_mode bits, and defaulting to 0777.

The mmap problem persists, but I have at least removed the issues surrounding it.

I believe ubuntu's mmap fails when calling ZwCreateSection() with error 0xc0000020 (STATUS_INVALID_FILE_FOR_SECTION), even though the handle attribute used is OBJ_KERNEL_HANDLE and the access requested is SECTION_ALL_ACCESS. Perhaps I should find a small sample program that calls it and see if I can replicate the issue.
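
In the meantime, the Linux-side check can be tightened a little. The sketch below is not a ZwCreateSection() caller; it is a variant of the mmap test from the top of this issue that compares the result against `MAP_FAILED` instead of storing the pointer in an `int`, and returns the errno so failures like the EINVAL (22) seen earlier are easy to spot. The function name `try_mmap` is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, map the whole file read-only and
 * private, then unmap it again.
 * Returns 0 on success, or the errno of the first failing call.
 */
int try_mmap(const char *path)
{
    struct stat st;
    void *p;
    int fd;
    int err = 0;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    if (fstat(fd, &st) != 0) {
        err = errno;
        close(fd);
        return err;
    }

    /* mmap() returns MAP_FAILED on error, never a small integer. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        err = errno;
    else
        munmap(p, (size_t)st.st_size);

    close(fd);
    return err;
}
```

Calling `try_mmap("e/file.txt")` on the ZFS mount and on an NTFS mount should then show directly whether the two still behave differently.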

lundman commented 4 years ago

ReactOS sources show that NtCreateSection() can return STATUS_INVALID_FILE_FOR_SECTION when

if (FileObject->SectionObjectPointer == NULL ||

So perhaps assigning SectionObjectPointer only when we need it is not the way to go. I changed it to always assign SectionObjectPointer when assigning vp to FileObject.

This results in a different error:

$ git clone https://github.com/openzfsonwindows/ZFSin.git
Cloning into 'ZFSin'...
error: chmod on /mnt/e/ZFSin/.git/config.lock failed: Operation not permitted

It is also pleasing to note that /mnt/e is automatically mounted now, I no longer need to mount it by hand.

lundman commented 4 years ago

It is possible we actually pass mmap now; I'm not sure. But looking at the chmod problem:

It seems lx only calls file_stat_lx_information then decides chmod should fail. It does not appear to be uid/gid checks, as I can fudge those. It is possible it is EffectiveAccess as I'm not sure what to make of it. I have tried 0, GENERIC_ALL and SPECIFIC_RIGHTS_ALL | ACCESS_SYSTEM_SECURITY in increasing desperation, but nothing appears to affect that.

lundman commented 4 years ago

Ah nope, wrong - it is about uid/gid (on the parent directory). Hardcode everything to 1000 (my ubuntu uid) and we get tantalisingly close:

$ git clone https://github.com/openzfsonwindows/ZFSin.git
Cloning into 'ZFSin'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 15185 (delta 6), reused 3 (delta 1), pack-reused 15169
Receiving objects: 100% (15185/15185), 11.44 MiB | 3.90 MiB/s, done.
Resolving deltas: 100% (10953/10953), done.
fatal: bad object 56eb0995b3c9511f440542375a8bab7a27368583
fatal: remote did not send all necessary objects

So it does at least solve the mystery of mmap - now to fix the uid/gid ownership.

lundman commented 4 years ago

So, removing the static use of uid=1000 and trying to assign the uid on file creation, I expect to get Irp->AssociatedIrp.SystemBuffer to point to EAs, but it is always NULL. Nothing calls set_ea() either, so that is surprising. Why is lx not trying to set the UIDs?

tamlin-mike commented 4 years ago

Great to see you got so many issues solved!

I'm still kinda surprised you had to create the whole path from NT's namespace root. I would have expected Ob to prepend the "device" path since that seems like the only logical thing to me. Live and learn.

FileObject->SectionObjectPointer

Isn't that a function of Mm, creating the section, pinning the file in Cc, and reading the page(s)? I would not expect a filesystem to populate it.

I expect to get Irp->AssociatedIrp.SystemBuffer to point to EA

I thought EA's (Extended Attributes) were only included in NT(FS) for OS/2 compatibility, and never used since. Have they been resurrected to provide lx permissions (on top of "native" Windows permissions)? As should be obvious, I haven't got a clue how Microsoft handles either lx file ownership or POSIX permissions.

I did stumble across the following post from MS. Could it provide a clue? Is what that post calls "metadata" perhaps the EA's? If so, it seems to be host-version dependent. https://devblogs.microsoft.com/commandline/chmod-chown-wsl-improvements/

++luck;

lundman commented 4 years ago

metadata sounds interesting, will try that

tamlin-mike commented 4 years ago

Additionally, \Volume{413cee64-9f5b-3f10-848f-b387fdedfcc9}\ is indeed a symlink in NT's Object Manager, probably created by the mount manager. The "absolute" version is something like `\PhysicalDriveX...`, I believe.

EDIT: For a deeper look at the relationships between the NT names and what they are and/or point to, try WinObj.

The system creates many symlinks for volumes. An authentic example with a mapped H:

\Device\HarddiskVolume11 (type Device, not a symlink)

\Device\Harddisk4\DR4 (type Device, not a symlink)
\Device\Harddisk4\Partition0 -> \Device\Harddisk4\DR4
\Device\Harddisk4\Partition3 -> \Device\HarddiskVolume11

\GLOBAL??\H: -> \Device\HarddiskVolume11
\GLOBAL??\HarddiskVolume11 -> \Device\HarddiskVolume11
\GLOBAL??\Harddisk4Partition3 -> \Device\HarddiskVolume11
\GLOBAL??\PhysicalDrive4 -> \Device\Harddisk4\DR4
\GLOBAL??\Disk{some_guid} -> \Device\Harddisk4\DR4
\GLOBAL??\STORAGE#Volume#{some_guid}#64bit_hexval#{some_guid} -> \Device\HarddiskVolume11
\GLOBAL??\Volume#{some_guid} -> \Device\HarddiskVolume11

Then we have the ARC names (using the style from... Ultrix?) with yet
another indirection.

\ArcName\multi(0)disk(0)rdisk(3) -> \Device\Harddisk4\Partition0
\ArcName\multi(0)disk(0)rdisk(3)partition(3) -> \Device\Harddisk4\Partition3

Crystal clear, isn't it. :-)
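
To make the chain of indirections above a bit more concrete, here is a toy model in plain C: a static table holding some of the links listed above, and a loop that follows them until it reaches a name that is not a symlink. This is only a simulation; the real Object Manager resolves links during path parsing and matches names case-insensitively, and the names `ob_symlink` and `resolve_ob_name` are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One symbolic link: object name and the name it points to. */
struct ob_symlink {
    const char *name;
    const char *target;
};

/* A subset of the links from the listing above. */
static const struct ob_symlink links[] = {
    { "\\Device\\Harddisk4\\Partition0", "\\Device\\Harddisk4\\DR4" },
    { "\\Device\\Harddisk4\\Partition3", "\\Device\\HarddiskVolume11" },
    { "\\GLOBAL??\\H:", "\\Device\\HarddiskVolume11" },
    { "\\GLOBAL??\\PhysicalDrive4", "\\Device\\Harddisk4\\DR4" },
    { "\\ArcName\\multi(0)disk(0)rdisk(3)",
      "\\Device\\Harddisk4\\Partition0" },
    { "\\ArcName\\multi(0)disk(0)rdisk(3)partition(3)",
      "\\Device\\Harddisk4\\Partition3" },
};

/*
 * Follow links until the name is no longer found in the table.
 * Returns the final name, or NULL if the chain is suspiciously long.
 */
const char *resolve_ob_name(const char *name)
{
    int hops;

    assert(name != NULL);

    for (hops = 0; hops < 8; hops++) {
        const char *next = NULL;
        size_t i;

        for (i = 0; i < sizeof(links) / sizeof(links[0]); i++) {
            if (strcmp(links[i].name, name) == 0) {
                next = links[i].target;
                break;
            }
        }
        if (next == NULL)
            return name; /* not a symlink: resolution is done */
        name = next;
    }
    return NULL; /* too many hops, probably a loop */
}
```

In this model the ARC name for the whole disk takes two hops (via `\Device\Harddisk4\Partition0`) before landing on `\Device\Harddisk4\DR4`, while `\GLOBAL??\H:` takes one.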

lundman commented 4 years ago

I can finally confirm that adding -o metadata to mount arguments gets me EA in MJ_CREATE containing $LXUID. So, I'll copy over the support code for that.
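
For reference, a rough user-space sketch of what picking $LXUID out of such an EA buffer could look like. It walks a list laid out like Windows' FILE_FULL_EA_INFORMATION (next-entry offset, flags, name length, value length, then the NUL-terminated name followed by the value). The struct and the function name `find_lx_uid` are local stand-ins rather than the driver's code, and the assumption that the $LXUID value is a 4-byte integer in host byte order is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the FILE_FULL_EA_INFORMATION fixed header. */
struct ea_header {
    uint32_t next_entry_offset; /* 0 on the last entry */
    uint8_t  flags;
    uint8_t  ea_name_length;    /* excludes the terminating NUL */
    uint16_t ea_value_length;
    /* followed by: name bytes, NUL, value bytes */
};

#define EA_HEADER_SIZE 8u

/*
 * Hypothetical helper: scan the EA list in `buf` for "$LXUID".
 * Returns 0 and stores the uid on success, -1 if the attribute is
 * missing or the buffer is malformed.
 */
int find_lx_uid(const uint8_t *buf, size_t len, uint32_t *uid_out)
{
    static const char wanted[] = "$LXUID";
    size_t off = 0;

    assert(buf != NULL && uid_out != NULL);

    for (;;) {
        struct ea_header hdr;
        size_t need;

        if (off > len || len - off < EA_HEADER_SIZE)
            return -1;
        memcpy(&hdr, buf + off, EA_HEADER_SIZE);

        /* Header + name + NUL + value must fit in what is left. */
        need = EA_HEADER_SIZE + hdr.ea_name_length + 1u + hdr.ea_value_length;
        if (need > len - off)
            return -1;

        if (hdr.ea_name_length == sizeof(wanted) - 1 &&
            memcmp(buf + off + EA_HEADER_SIZE, wanted, sizeof(wanted) - 1) == 0 &&
            hdr.ea_value_length == sizeof(uint32_t)) {
            /* Assumption: value is a 4-byte uid in host byte order. */
            memcpy(uid_out,
                   buf + off + EA_HEADER_SIZE + hdr.ea_name_length + 1u,
                   sizeof(uint32_t));
            return 0;
        }

        if (hdr.next_entry_offset == 0)
            return -1; /* end of list, not found */
        off += hdr.next_entry_offset;
    }
}
```

In the driver, the buffer would presumably come from Irp->AssociatedIrp.SystemBuffer in IRP_MJ_CREATE, as described above; the same walk would apply to other attributes carried in the list by changing the name being matched.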

lundman commented 4 years ago

http://www.lundman.net/OpenZFSOnWindows-debug-20191123.exe

That works better, uid/gid created, you can chmod etc. git clone gets tantalisingly close to working :)

[image: ZFSin_ubuntu_uid]

lundman commented 4 years ago

https://github.com/openzfsonwindows/ZFSin/commit/fe071ca52d724e6a6080b97ba0e281e8552c1260

OK, I will close this ticket as the original mmap problem has been solved. There are further problems of course, but I will open a new ticket should it be required. Thanks for the assistance everyone.