haskell / filepath

Haskell FilePath core library
BSD 3-Clause "New" or "Revised" License

isValid "\\\\?\\UNC\\" #191

Open Bodigrim opened 1 year ago

Bodigrim commented 1 year ago
> System.FilePath.Windows.isValid "\\\\?\\UNC\\"
True
> putStrLn "\\\\?\\UNC\\"
\\?\UNC\

I think this is wrong: \\?\UNC\ is incomplete; it is neither a file nor a folder name.

https://github.com/haskell/filepath/blob/98f8bba9eac8c7183143d290d319be7df76c258b/System/FilePath/Internal.hs#L1065-L1067

If we are in agreement that isValid should return False on this input, there is a harder question ahead. What should be the output of makeValid? Something like \\?\UNC\_\_?
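To make the proposed behaviour concrete, here is a minimal sketch of a completeness check for the \\?\UNC\ prefix. It is not the library's implementation: `hasCompleteUNC`, `stripPrefixCI` and `splitOnBackslash` are hypothetical helpers, and the sketch assumes that a usable path must name a non-empty server and a non-empty share after the prefix, with backslash as the only separator.

```haskell
import Data.Char (toUpper)

-- | Split a string on backslashes, keeping empty components.
splitOnBackslash :: String -> [String]
splitOnBackslash s = case break (== '\\') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnBackslash rest

-- | Strip a prefix, ignoring ASCII case (hypothetical helper).
stripPrefixCI :: String -> String -> Maybe String
stripPrefixCI prefix s
  | map toUpper (take n s) == map toUpper prefix = Just (drop n s)
  | otherwise                                    = Nothing
  where
    n = length prefix

-- | Hypothetical check: True only when the path starts with \\?\UNC\
-- and is followed by a non-empty server and a non-empty share.
hasCompleteUNC :: String -> Bool
hasCompleteUNC path = case stripPrefixCI "\\\\?\\UNC\\" path of
  Nothing   -> False
  Just rest -> case splitOnBackslash rest of
    server : share : _ -> not (null server) && not (null share)
    _                  -> False
```

Under these assumptions `hasCompleteUNC "\\\\?\\UNC\\"` is False, while `hasCompleteUNC "\\\\?\\UNC\\localhost\\c$\\foo"` is True. What makeValid should produce for the rejected input remains the open question.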

hasufell commented 1 year ago

Related: https://github.com/haskell/filepath/issues/92

isValid is a hot mess on windows.

I'm not sure how much improvement we can drive here with ad-hoc bugfixes.

The underlying problem is that we're not parsing windows filepaths, although there are pieces that allow us to put together a proper grammar:

With that we could implement a more meaningful version of isValid.

hasufell commented 1 year ago

https://github.com/haskell/filepath/blob/16e5374620189b27eca1eed09642ec02b2222fc8/System/FilePath/Internal/Parser.hs#L26-L61

Bodigrim commented 1 year ago

> I'm not sure how much improvement we can drive here with ad-hoc bugfixes.

I agree. My bigger concern is that while at least in theory isValid could be made correct, makeValid is fundamentally broken on Windows. It's not like you can meaningfully repair any Windows path at all. Even current behaviour makeValid "test*" == "test_" is a bit of WAAAAT? Maybe mark it as deprecated?..
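The information loss in such a repair can be seen directly. The sketch below calls the real `System.FilePath.Windows.makeValid`; the names `candidates` and `repaired` are made up for illustration. The result for "test*" is the one quoted above. That "test?" is rewritten the same way is an assumption based on `?` also being among the characters the library treats as invalid on Windows.

```haskell
import qualified System.FilePath.Windows as W

-- Two distinct inputs that differ only in a character
-- Windows forbids in file names.
candidates :: [FilePath]
candidates = ["test*", "test?"]

-- makeValid replaces each forbidden character with an underscore,
-- so different inputs can collapse into the same output.
repaired :: [FilePath]
repaired = map W.makeValid candidates
```

If both entries of `repaired` come out as "test_", the original names can no longer be told apart, which is the sense in which the repair is not meaningful.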

hasufell commented 1 year ago

Ok, so things are a little more complicated on windows wrt "\\\\?\\UNC\\".

These are not statically assigned special names afaiu. Instead those are some form of object symlinks that are maintained inside of windows (and can be viewed in the WinObj browser tool). Also see: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#nt-namespaces

There are many more, e.g. look at:

\\?\UNC\localhost\c$\foo\bar                       -> \\localhost\c$\foo\bar
\\?\GLOBALROOT\GLOBAL??\UNC\localhost\c$\foo\bar   -> \\localhost\c$\foo\bar
\\?\HarddiskVolume2\foo\bar                        -> C:\foo\bar (if HarddiskVolume2 is C:)
\\?\GLOBALROOT\GLOBAL??\HarddiskVolume2\foo\bar    -> C:\foo\bar (if HarddiskVolume2 is C:)
\\?\GLOBALROOT\Device\Harddisk0\Partition2\foo\bar -> C:\foo\bar (if Harddisk0\Partition2 is C:)

(all the above are somewhat equal)
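Only the first row of that list can be rewritten without asking the system. A minimal sketch, assuming the prefix is spelled exactly \\?\UNC\ (uppercase, backslashes only); `uncToTraditional` is a hypothetical name, not a filepath function:

```haskell
import Data.List (stripPrefix)

-- | Hypothetical helper: rewrite \\?\UNC\server\share\... to
-- \\server\share\... by purely lexical prefix replacement.
-- Returns Nothing for every other form, because resolving names such as
-- HarddiskVolume2 or GLOBALROOT depends on the object links of the
-- running system and cannot be decided from the string alone.
uncToTraditional :: String -> Maybe String
uncToTraditional path = ("\\\\" ++) <$> stripPrefix "\\\\?\\UNC\\" path
```

For example, `uncToTraditional "\\\\?\\UNC\\localhost\\c$\\foo\\bar"` gives `Just "\\\\localhost\\c$\\foo\\bar"`, while the HarddiskVolume2 and GLOBALROOT forms give Nothing.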

The fact that filepath as a library treats \\\\?\\UNC\\ special is in my opinion more of a wart than a feature. I don't consider \\\\?\\UNC\\ a special case in my grammar. The meaning of those object links can only fully be understood when performing IO. Some of them may be somewhat conventional, but still...

Maybe @Mistuke has another opinion.

Bodigrim commented 1 year ago

AFAIU https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#dos-device-paths, \\?\UNC\ is a special case. Namely, Windows filenames can be:

Now there is a bit of confusion. If you want to format a traditional DOS path as a device path, you can just prepend \\.\ to C:\foo\bar, obtaining \\.\C:\foo\bar. The same does not apply for UNC paths to shared drives, because you end up with \\.\\server\share\file and device paths are not supposed to contain \\ anywhere except the beginning. To overcome this restriction Windows introduces a workaround: instead of \\.\\server\share\file you are supposed to write \\.\UNC\server\share\file. So this is a special syntax.
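The conversion rule described here can be sketched as a small function. This is an illustration under stated assumptions, not library code: `toDevicePath` is a hypothetical name, drive paths are recognised only as an ASCII letter followed by `:\`, and paths that already carry a \\.\ or \\?\ prefix are returned unchanged.

```haskell
import Data.Char (isAlpha, isAscii)
import Data.List (isPrefixOf, stripPrefix)

-- | Hypothetical helper: format a traditional DOS or UNC path as a
-- device path using the \\.\ prefix.
toDevicePath :: String -> Maybe String
toDevicePath path
  -- Already a device path: leave it alone.
  | any (`isPrefixOf` path) ["\\\\.\\", "\\\\?\\"] = Just path
  -- UNC path: the leading \\ is replaced by \\.\UNC\ so that no
  -- double backslash appears after the prefix.
  | Just rest <- stripPrefix "\\\\" path = Just ("\\\\.\\UNC\\" ++ rest)
  -- Drive path such as C:\foo\bar: the prefix is simply put in front.
  | (letter : ':' : '\\' : _) <- path
  , isAscii letter
  , isAlpha letter = Just ("\\\\.\\" ++ path)
  -- Relative paths and anything else are not handled by this sketch.
  | otherwise = Nothing
```

With this, `toDevicePath "C:\\foo\\bar"` gives `Just "\\\\.\\C:\\foo\\bar"` and `toDevicePath "\\\\server\\share\\file"` gives `Just "\\\\.\\UNC\\server\\share\\file"`.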

hasufell commented 1 year ago

> So this is a special syntax.

It's not syntax, those are simply symbolic links. Again, there's also \\?\GLOBALROOT\GLOBAL??\UNC. Why don't we support that form? We can even do \\?\GLOBALROOT\Device\Mup\localhost\c$\foo\bar.


Mistuke commented 1 year ago

> The fact that filepath as a library treats \\?\UNC\ special is in my opinion more of a wart than a feature. I don't consider \\?\UNC\ a special case in my grammar. The meaning of those object links can only fully be understood when performing IO. Some of them may be somewhat conventional, but still...

FWIW I agree. Inside GHC's handling we only really treat \\?\ and \\.\ as special.
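A minimal sketch of what treating only those two prefixes as special could look like. This is not GHC's code: the `Namespace` type, its constructor names and `classify` are all hypothetical, and everything after the prefix is deliberately left uninterpreted.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical classification of a Windows path by its prefix.
data Namespace
  = FileNamespace    -- ^ starts with \\?\
  | DeviceNamespace  -- ^ starts with \\.\
  | Plain            -- ^ anything else
  deriving (Eq, Show)

-- | Split off the namespace prefix, if any, and return the remainder
-- untouched. Names such as UNC or GLOBALROOT in the remainder are not
-- given any special meaning here.
classify :: String -> (Namespace, String)
classify path
  | Just rest <- stripPrefix "\\\\?\\" path = (FileNamespace, rest)
  | Just rest <- stripPrefix "\\\\.\\" path = (DeviceNamespace, rest)
  | otherwise                               = (Plain, path)
```

Under this scheme `classify "\\\\?\\UNC\\"` is `(FileNamespace, "UNC\\")`: the UNC component is just the first path element, with no dedicated rule.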