mirage / ocaml-uri

RFC3986 URI parsing library for OCaml
Other
97 stars 57 forks source link

Consider Including Drive Letter in Uri.path [Windows] #160

Open EmmaJaneBonestell opened 2 years ago

EmmaJaneBonestell commented 2 years ago

While RFC 3986 does not specify how to handle drive letters, the majority of implementations include the drive letter (e.g. C:\ , D:\, ...) on Windows, and so omitting it does not really match up with real-world behavior or applications. Newer RFCs/standards also use this behavior, e.g. RFC 8089 , WhatWG .

Without it, it is impossible to use solely Uri.path to return a path on the non-current drive.

Examples:

Printf.printf "%s \n" (Uri.pct_decode(Uri.path(Uri.of_string("C:\\Program Files (x86)\\"))))
Printf.printf "%s \n" (Uri.pct_decode(Uri.path(Uri.of_string("D:\\Program Files (x86)\\"))))

Both return:

\Program Files (x86)\

which only gives a path on the current drive, while, I feel, a more appropriate result would be:

C:\Program Files (x86)\
D:\Program Files (x86)\

Inconsistently, in these basic examples, using Uri.to_string (instead of Uri.path) will return with the drive letter.

avsm commented 1 year ago

Thanks for the idea @EmmaJaneBonestell. I'm not too familiar with desirable Windows behaviour, and am paging in @dra27 @jonahbeckford to see if they have any opinions on this. Conforming to RFC 8089 seems sensible to me at any rate.

jonahbeckford commented 1 year ago

The Microsoft System.Uri class claims to adhere to RFC 3986 (and RFC 2396, RFC 2732, and RFC 3987): https://learn.microsoft.com/en-us/dotnet/api/system.uri?view=net-7.0

You can see the result in PowerShell:

C:\temp> New-Object System.Uri("C:\Program Files (x86)\")

AbsolutePath   : C:/Program%20Files%20(x86)/
AbsoluteUri    : file:///C:/Program%20Files%20(x86)/
LocalPath      : C:\Program Files (x86)\
Authority      :
HostNameType   : Basic
IsDefaultPort  : True
IsFile         : True
IsLoopback     : True
PathAndQuery   : C:/Program%20Files%20(x86)/
Segments       : {/, C:/, Program%20Files%20(x86)/}
IsUnc          : False
Host           :
Port           : -1
Query          :
Fragment       :
Scheme         : file
OriginalString : C:\Program Files (x86)\
DnsSafeHost    :
IdnHost        :
IsAbsoluteUri  : True
UserEscaped    : False
UserInfo       :

So yes, including C: as part of the PathAndQuery seems the right thing to do.

But I think Microsoft is right in that the path should not have backslashes in it. Is that what Uri.Path is returning today? https://www.ietf.org/rfc/rfc2396.txt Section 3 (Page 10) is clear on (forward) slashes:

URI that are hierarchical in nature use the slash "/" character for separating hierarchical components.

and it labels as "unwise" the use of backslash higher on the same Page 10:

Other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters.

  unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Is it too much to ask for a fixup of forward slashes to get proper Windows URI compliance?

ejgallego commented 11 months ago

Actually I'm seeing a different problem; the applications I'm interating with do send the URI properly encoded file:///c:/Windows/foo.dll for example.

However, Uri.path will return /c:/Windows/foo.dll which is not a valid Windows path, note that c:/Windows/foo.dll is.

Thus I'm unsure how this should work, but for now I have to post-process all my Uri.path results on Windows if they contain a drive letter as to remove the starting slash.

ejgallego commented 11 months ago

VsCode for example does this massaging here:

https://github.com/microsoft/vscode/blob/22ca160cbd4eea4afb0d595aff1cbb072cf3627f/src/vs/base/common/uri.ts#L623