haskell / win32

Haskell support for the Win32 API
http://hackage.haskell.org/package/Win32
Other
98 stars 62 forks source link

Add WindwowsString/WindowsFilePath support wrt AFPP #198

Closed hasufell closed 2 years ago

hasufell commented 2 years ago

Depends on: https://github.com/haskell/filepath/pull/103

For testing file operations, you can use https://github.com/hasufell/file-io (also please review the windows module)

Mistuke commented 2 years ago

Thanks for the PR! Can I ask you to split it into two commits? One for the refactoring for the extraction to the internal functions and one for the new functions. They can still be in the same PR, I can just more easily filter away the refactoring changes.

I'm away till Friday so will do a more indepth review then, on first glance it looks great though!

Mistuke commented 2 years ago

Also, WindowsString == CWString and WindowsFilePath ==ByteString if I'm reading this correctly?

Mistuke commented 2 years ago

Also for your create flags here https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L20 you'll want to add FILE_FLAG_OVERLAPPED for WinIO otherwise it'll force the handle to behave synchronously (and so the I/O manager will be forced sequentially while this handle is in use).

See https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L904

You can also conditionally add the flag depending on which I/O manager is in use, see https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0#io-manager-winio-related-changes

Mistuke commented 2 years ago

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

Mistuke commented 2 years ago

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27

WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end).

So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

hasufell commented 2 years ago

Also, WindowsString == CWString and WindowsFilePath ==ByteString if I'm reading this correctly?

newtype WindowsString = WS { unWFP :: BS.ShortByteString }

It's a ShortByteString (unpinned bytearray), which is expected to be in UTF-16 format. That's why there is a specialized module System.AbstractFilePath.Data.ByteString.Short.Word16 for operations such as:

unfoldr :: (a -> Maybe (Word16, a)) -> a -> ShortByteString

...so we don't do any conversion whatsoever.

WindowsFilePath is just a type synonym to WindowsString.

We then use the specialized functions:

useAsCWString :: ShortByteString -> (Ptr Word16 -> IO a) -> IO a

The endgoal then is for the user to use AbstractFilePath (as demonstrated in the file-io package), where:

#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
type PlatformString = WindowsString
#else
type PlatformString = PosixString
#endif

newtype AbstractFilePath = AbstractFilePath PlatformString

which has this nice trick that you can't unpack AbstractFilePath to the wrong platform, because WindowsString and PosixString have different constructors. And still allows you to write platform specific code via those.

hasufell commented 2 years ago

Thanks for the PR! Can I ask you to split it into two commits? One for the refactoring for the extraction to the internal functions and one for the new functions. They can still be in the same PR, I can just more easily filter away the refactoring changes.

I also want to note that another way to do this PR is heavy use of CPP and then include with the appropriate functions imported and maybe ifdefs in function bodies. filepath relies on this heavily, but I wouldn't claim that it makes it more maintainable. Up to you.

hasufell commented 2 years ago

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

Mistuke commented 2 years ago

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

hasufell commented 2 years ago

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

Doesn't that cause problems, because we're disabling filepath interpretation? E.g. C:/foo/bar now means something different. So I'm not really sure what is expected here. My gut feeling is that I don't want to mess with user filepaths at all. That was one of the reasons for AFPP, although more about leaving encoding alone (which is not really a problem on windows, because we know the encoding).

Or put another way: is there safe way to convert arbitrary paths to \\?\ paths, emulating the entire windows filepath interpretation layer?


Edit: \\?\C:foo.txt is also different from C:foo.txt. Resolving it to an absolute path will not preserve semantics?

Mistuke commented 2 years ago

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

Doesn't that cause problems, because we're disabling filepath interpretation? E.g. C:/foo/bar now means something different. So I'm not really sure what is expected here.

No, because when users use C:/foo/bar they expect it to mean the same thing as C:\foo\bar and we take care of that during the conversion. and on the win32 API layer / and \ mean the exact same thing. Should the user want / to not mean folder then they have to explicitly disable path processing in their input by using \\?\ in which case we don't touch the PATH.

My gut feeling is that I don't want to mess with user filepaths at all. That was one of the reasons for AFPP, although more about leaving encoding alone (which is not really a problem on windows, because we know the encoding).

The reason we do this is so that users can give ANY path to the compiler and have it work. The user shouldn't have to do \\?\ when they want a long path because 99% of users don't know what this is. Neither do scripts or other programs that can produce paths GHC consumes.

We in GHC decided to make this work out of the box because of things like cabal new-build that produce very long normalc:\ paths.

This isn't about AFPP or whatever. You're overriding openFile in an external module and not adhering to what the expectations are. If the user has to input \\?\ to get a long path it also means every tool down the pipeline must understand and interpret this correctly. Configure scripts and friends don't for instance. So we made an implicit decision to convert paths internally but not expose them externally unless the user has explicitly used a \\?\ path.

I don't particularly care what the library does or does not do with it's PATHs. I care about the fact that you're affecting the behavior of the I/O system in an external module that presumably will be part of a boot library.

Or put another way: is there safe way to convert arbitrary paths to \\?\ paths, emulating the entire windows filepath interpretation layer?

Edit: \\?\C:foo.txt is also different from C:foo.txt. Resolving it to an absolute path will not preserve semantics.

Correct. This path is almost non-existent in the real world, Very few people know what it does. Supporting this very arcane path does not warrant regressing the compiler's long path support for the vast majority of users. I'm pretty sure the conversion would fail for that path anyway so we'd use it as is.

hasufell commented 2 years ago

Can you point me to the code where the conversion is done?

Mistuke commented 2 years ago

Can you point me to the code where the conversion is done?

https://github.com/ghc/ghc/blob/0619fb0fb14a98f04aac5f031f6566419fd27495/utils/fs/fs.c#L48

hasufell commented 2 years ago

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27

WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end).

So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

I think this is not needed, because we use Win32.cREATE_ALWAYS instead of Win32.oPEN_ALWAYS for WriteMode.

The documentation says:

Creates a new file, always. If the specified file exists and is writable, the function overwrites the file, the function succeeds, and last-error code is set to ERROR_ALREADY_EXISTS (183).

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea

From my tests, truncation seems to work.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

I tried this and it seems to lead to the file pointer moving back to the start of the file, so I don't get append behavior anymore.

Mistuke commented 2 years ago

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27 WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end). So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

I think this is not needed, because we use Win32.cREATE_ALWAYS instead of Win32.oPEN_ALWAYS for WriteMode.

The documentation says:

Creates a new file, always. If the specified file exists and is writable, the function overwrites the file, the function succeeds, and last-error code is set to ERROR_ALREADY_EXISTS (183).

Yes but this means that errno is different in your implementation than what Base returns. It may not matter for most use cases but It's not the expected result value for GHC programs. Anything checking GetLastError () will diverge with your implementation.

I also now notice an additional problem... with mio you get the in-process locking code because hANDLEtoHandle eventually calls mkFD which registers the locks, but for winio the helper was added for console handles not files.

Since you're relying on the very latest win32 I can just add file support to hANDLEtoHandle which should work..

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea

From my tests, truncation seems to work.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

I tried this and it seems to lead to the file pointer moving back to the start of the file, so I don't get append behavior anymore.

Hmmm.. are you sure you tested the right thing? access attributes don't influence the disposition. See [2]. The problem here is that the differing attributes change what happens during subsequent calls to CreateFile.

For AppendMode you use FILE_APPEND_DATA, while GENERIC_WRITE is a combination of ALL write attributes including FILE_WRITE_ATTRIBUTES and FILE_WRITE_DATA.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants [2] https://docs.microsoft.com/en-us/windows/win32/fileio/creating-and-opening-files

hasufell commented 2 years ago

Since you're relying on the very latest win32 I can just add file support to hANDLEtoHandle which should work..

Sounds good.

Hmmm.. are you sure you tested the right thing? access attributes don't influence the disposition. See [2]. The problem here is that the differing attributes change what happens during subsequent calls to CreateFile.

@@ -21,5 +21,5 @@
   accessMode = case iomode of
     ReadMode      -> Win32.gENERIC_READ
     WriteMode     -> Win32.gENERIC_WRITE
-    AppendMode    -> Win32.fILE_APPEND_DATA
+    AppendMode    -> Win32.gENERIC_WRITE .|. Win32.fILE_APPEND_DATA
     ReadWriteMode -> Win32.gENERIC_READ .|. Win32.gENERIC_WRITE

What it now does is write to the file from the start, without truncating. If I revert the patch, appending works again.

*System.File.AbstractFilePath> readFile "lol.txt"
"tzaaakk"
*System.File.AbstractFilePath> appendFile "lol.txt" "123"
*System.File.AbstractFilePath> readFile "lol.txt"
"123aakk"
hasufell commented 2 years ago

So, I think Win32.gENERIC_READ .|. Win32.fILE_APPEND_DATA works and https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilepointer indicates that gENERIC_READ might be enough to set the file pointer.


The alternative is:

--- a/System/File/AbstractFilePath/Windows.hs
+++ b/System/File/AbstractFilePath/Windows.hs
@@ -8,6 +8,7 @@ import System.AbstractFilePath.Windows ( WindowsFilePath )

 import qualified System.Win32 as Win32
 import qualified System.Win32.WindowsString.File as WS
+import Control.Monad (when)
 #if defined(__IO_MANAGER_WINIO__)
 import GHC.IO.SubSystem
 #endif
@@ -31,12 +32,18 @@ openFile fp iomode = bracketOnError
 #endif
       Nothing)
     Win32.closeHandle
-    Win32.hANDLEToHandle
+    toHandle
  where
+  toHandle h = do
+    when (iomode == AppendMode ) $ do
+       r <- Win32.setFilePointerEx  h 0 Win32.fILE_END
+       err <- Win32.getLastError
+       when (r == 4294967295 && err /= 0) $ Win32.failWith "openFile" err
+    Win32.hANDLEToHandle  h
   accessMode = case iomode of
     ReadMode      -> Win32.gENERIC_READ
     WriteMode     -> Win32.gENERIC_WRITE
-    AppendMode    -> Win32.fILE_APPEND_DATA
+    AppendMode    -> Win32.gENERIC_WRITE .|. Win32.fILE_APPEND_DATA
     ReadWriteMode -> Win32.gENERIC_READ .|. Win32.gENERIC_WRITE

   createMode = case iomode of
hasufell commented 2 years ago

ping

Mistuke commented 2 years ago

Ah sorry,

The alternative is:

I prefer the alternative as that's what we currently do, I put up https://github.com/haskell/win32/pull/202 to fix the API so you can clean up the code. setFilePointerEx like the other Win32 calls throws an exception, so you should just catch it and close the handle.

I will merge https://github.com/haskell/win32/pull/202 after work today.

Mistuke commented 2 years ago
+  toHandle h = do
+    when (iomode == AppendMode ) $ do
+       r <- Win32.setFilePointerEx  h 0 Win32.fILE_END
+       err <- Win32.getLastError
+       when (r == 4294967295 && err /= 0) $ Win32.failWith "openFile" err
+    Win32.hANDLEToHandle  h

Just realized that the API was already correct, you don't need this, on failure the API raises an exception. You just need a finally here to always close the handle.

hasufell commented 2 years ago

https://github.com/hasufell/file-io/commit/b33d31c1a8010aeecf297406051a00d9a5c0f376

Mistuke commented 2 years ago

Cheers, did the dependent changes on this get merged get so the CI passes? If so I can merge this one.

hasufell commented 2 years ago

Cheers, did the dependent changes on this get merged get so the CI passes? If so I can merge this one.

I'll add one more test suite to filepath that proves that old and new versions are equivalent. Maybe it's overkill, but I like to sleep well. Then this can be merged.

I don't know which GHC version will be the first to depend on filepath-2.0 (maybe 9.6 or later), so you maybe want to maintain two branches for Win32?

hasufell commented 2 years ago

Merged and published as a package candidate: https://hackage.haskell.org/package/filepath-2.0.0.0/candidate

I added the candidate .tgz manually to the cabal.project file and made sure that appveyor.yml doesn't overwrite it. But I can't see the CI logs.

Mistuke commented 2 years ago

Oh missed the last message, will sort out the CI failure and merge this weekend if you're happy with the changes @hasufell ?

hasufell commented 2 years ago

Oh missed the last message, will sort out the CI failure and merge this weekend if you're happy with the changes @hasufell ?

Yeah... but note that I haven't released the new filepath version yet. There's some minor things I'm still sorting out, so no rush. API won't really change though.

Mistuke commented 2 years ago

cool just let me know :)

hasufell commented 2 years ago

@Mistuke just a heads-up... I want to release the new filepath in 1-2 weeks. If you're planning a release, maybe we should synchronize. I think the idea was that this makes it into GHC-9.6.

Mistuke commented 2 years ago

OK, CI should be fixed now, next commit should trigger the internal and external one.

System\Win32\SymbolicLink\Internal.hsc:13:1: error: [-Wunused-imports, -Werror=unused-imports]
    The import of `Data.Bits' is redundant
      except perhaps to import instances from `Data.Bits'
    To import instances alone, use: import Data.Bits()
   |
13 | import Data.Bits ((.|.))
   | ^^^^^^^^^^^^^^^^^^^^^^^^

System\Win32\SymbolicLink\Internal.hsc:15:1: error: [-Wunused-imports, -Werror=unused-imports]
    The import of `System.Win32.File' is redundant
      except perhaps to import instances from `System.Win32.File'
    To import instances alone, use: import System.Win32.File()
   |
15 | import System.Win32.File ( failIfFalseWithRetry_ )

It's showing some unused imports, could you check the build with -Werror enabled?

Mistuke commented 2 years ago

trying to retrigger CI

Mistuke commented 2 years ago

alright, the external CI now shows the same error. (see Appveyor)

hasufell commented 2 years ago

The CI fails, because both filepath and bytestring have a lower bound on base >=4.9 and that's unlikely to change. ghc-8.0.2 is really the oldest we care about. I can't even install ghc-7.6.3 on linux.

Mistuke commented 2 years ago

The CI fails, because both filepath and bytestring have a lower bound on base >=4.9 and that's unlikely to change. ghc-8.0.2 is really the oldest we care about. I can't even install ghc-7.6.3 on linux.

Well, Win32 doesn't need to run on Linux :) Please put the exports for the FPP stuff behind an

 if impl(ghc >= 8.0)

as I don't want to lose compatibility with older GHCs, we've worked very hard to maintain it.

hasufell commented 2 years ago

The CI fails here, because cabal build also pulls filepath into the resolution, even if it's not needed, because it's listed in cabal.project as source-repository package.

There are two known workarounds:

  1. use cabal pre-release which allos ghc conditionals in cabal.project
  2. use multiple cabal.project files and select them manually in command line by checking ghc --version or so

I'm not sure how to fix the private CI.

Mistuke commented 2 years ago

We already had a dependency on filepath before, could we just add the constrain on the version of filepath needee under the FFp path?

hasufell commented 2 years ago

We already had a dependency on filepath before, could we just add the constrain on the version of filepath needee under the FFp path?

This is not about Win32.cabal. It's about cabal.project. The cabal ticket is here https://github.com/haskell/cabal/issues/5444

Whether you use source-repository-package or packages results in the same issue.

Mistuke commented 2 years ago

Ahh ok. I Suppose we only need that because filepath hasn't been released yet?

So should it go green once it is?

Sent from my Mobile

On Thu, Jul 14, 2022, 10:42 Julian Ospald @.***> wrote:

We already had a dependency on filepath before, could we just add the constrain on the version of filepath needee under the FFp path?

This is not about Win32.cabal. It's about cabal.project. The cabal ticket is here haskell/cabal#5444 https://github.com/haskell/cabal/issues/5444

Whether you use source-repository-package or packages results in the same issue.

— Reply to this email directly, view it on GitHub https://github.com/haskell/win32/pull/198#issuecomment-1184228270, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI7OKIWLMYS4IAPUQU2AI3VT7OHTANCNFSM5RR7C4KA . You are receiving this because you modified the open/close state.Message ID: @.***>

hasufell commented 2 years ago

So should it go green once it is?

That depends on whether this is the only issue with this PR. I can't tell, because I don't know how to install these ancient GHCs.

Mistuke commented 2 years ago

Can't you add --allow-newer to allow cabal to ignore the upper bounds to test on the CI?

Sent from my Mobile

On Thu, Jul 14, 2022, 10:55 Julian Ospald @.***> wrote:

So should it go green once it is?

That depends on whether this is the only issue with this PR. I can't tell, because I don't know how to install these ancient GHCs.

— Reply to this email directly, view it on GitHub https://github.com/haskell/win32/pull/198#issuecomment-1184241786, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI7OKOL36YIJL37GGQ3MVDVT7PY7ANCNFSM5RR7C4KA . You are receiving this because you modified the open/close state.Message ID: @.***>

hasufell commented 2 years ago

Can't you add --allow-newer to allow cabal to ignore the upper bounds to test on the CI?

Doubt so, because base is not reinstallable.

hasufell commented 2 years ago

@Mistuke the private CI fails and I think this is not a sustainable workflow for contributors. I can't see the logs and don't wanna invest more time in blind-folded CI fixing. The public CI succeeds, feel free to adjust this PR.

hasufell commented 2 years ago

@Mistuke is this making a hackage release?

Mistuke commented 2 years ago

@Mistuke is this making a hackage release?

Yes https://hackage.haskell.org/package/Win32 I'll fix the docs later when I feel a bit better.

hasufell commented 2 years ago

Thanks. Did you catch something in the amazonas?

Mistuke commented 2 years ago

Thanks. Did you catch something in the amazonas?

haha, luckily no, just exhausted.