mirage / ocaml-uri

RFC3986 URI parsing library for OCaml

IPv6 addresses in square brackets are parsed incorrectly #163

Closed dmbaturin closed 11 months ago

dmbaturin commented 1 year ago

"Connect over HTTPS to port 8443 on 2001:db8:ff::1" is https://[2001:db8:ff::1]:8443/....

Uri.of_string, however, completely misunderstands it and thinks the URI has no authority part at all.

utop # let u = Uri.of_string "https://[2001:db8:ff::1]:8443/~user/page.html" ;;
val u : Uri.t = https:///%5B2001:db8:ff::1%5D:8443/~user/page.html

utop # Uri.path u ;;
- : string = "%5B2001:db8:ff::1%5D:8443/~user/page.html"

utop # Uri.host u ;;
- : string option = Some ""

edwintorok commented 1 year ago

See also https://datatracker.ietf.org/doc/html/rfc3986, '[]' is meant to be used for IPv6 addresses, not uri-encoding

dmbaturin commented 1 year ago

'[]' is meant to be used for IPv6 addresses, not uri-encoding

Excuse me, I don't understand your point.

Let's look at section 3.2.2 (Host):

A host identified by an Internet Protocol literal address, version 6 [[RFC3513](https://datatracker.ietf.org/doc/html/rfc3513)] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]").

Now, let's look at the README. It says:

Uri -- an RFC3986 URI/URL parsing library

Thus, I'd expect it to parse URLs that comply with the RFC correctly.

https://[2001:db8:ff::1]:8443/~user/page.html is a valid URL. Its host part is an IPv6 address literal, enclosed in square brackets as per the RFC (to deal with the fact that IPv6 addresses use colons to separate two-octet groups, while URLs have historically also used a colon to separate the host part from the port).
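To make the colon ambiguity concrete, here is a minimal sketch in plain OCaml (standard library only) of splitting an authority into host and port. The name `split_authority` is hypothetical and is not part of ocaml-uri; the sketch ignores userinfo and does not validate the address itself. It only illustrates why the bracketed case has to be handled before looking for the port colon.

```ocaml
(* Hypothetical helper, not part of ocaml-uri: split "host[:port]". *)
let split_authority (s : string) : (string * int option) option =
  (* True when [p] is a non-empty run of ASCII digits. *)
  let is_digits p =
    let rec go i =
      i = String.length p || (p.[i] >= '0' && p.[i] <= '9' && go (i + 1))
    in
    p <> "" && go 0
  in
  (* [rest] is whatever follows the host: "" or ":<digits>". *)
  let port_of rest =
    if rest = "" || rest = ":" then Some None
    else if rest.[0] = ':' then
      let p = String.sub rest 1 (String.length rest - 1) in
      if is_digits p then Option.map Option.some (int_of_string_opt p)
      else None
    else None
  in
  let n = String.length s in
  if n > 0 && s.[0] = '[' then
    (* IP literal: the host runs up to the closing ']', colons included. *)
    match String.index_opt s ']' with
    | None -> None
    | Some close ->
      let host = String.sub s 1 (close - 1) in
      let rest = String.sub s (close + 1) (n - close - 1) in
      Option.map (fun port -> (host, port)) (port_of rest)
  else
    (* Registered name or IPv4 address: the first colon starts the port. *)
    match String.index_opt s ':' with
    | None -> Some (s, None)
    | Some i ->
      let host = String.sub s 0 i in
      let rest = String.sub s i (n - i) in
      Option.map (fun port -> (host, port)) (port_of rest)

(* split_authority "[2001:db8:ff::1]:8443" = Some ("2001:db8:ff::1", Some 8443) *)
(* split_authority "example.org:80"        = Some ("example.org", Some 80) *)
```

Without the bracket branch, the first colon inside the IPv6 address would be taken as the start of the port, which is exactly the conflict the RFC's bracket syntax exists to avoid.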

Now, please pay attention to this part of the output of my bug report:

utop # Uri.host u ;;
- : string option = Some ""

The host part of the URL in question is 2001:db8:ff::1. Your library thinks that there is no host part in that URL.

Could you please elaborate on how your comment relates to the issue at hand?

psafont commented 1 year ago

The comment is about the root cause of the issue, which is what makes the library say that the URL doesn't have a host part:

# Uri.make ~host:"[2a01:240:ab08:4:30de:12ff:fe96:ade6]" ~path:"/services" ();;
- : Uri.t =
//%5B2a01%3A240%3Aab08%3A4%3A30de%3A12ff%3Afe96%3Aade6%5D/services

It's because the square brackets are uri-encoded.
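The effect can be reproduced with a rough sketch in plain OCaml. This is not ocaml-uri's actual encoder: `pct_encode` and `encode_host` are hypothetical names, and the "unreserved characters only" rule is a simplification of what RFC 3986 allows in a host. It only shows how blanket percent-encoding turns a bracketed literal into the output above, and what exempting the literal would look like.

```ocaml
(* RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~". *)
let is_unreserved c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' -> true
  | _ -> false

(* Percent-encode every byte that is not an unreserved character. *)
let pct_encode s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if is_unreserved c then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Hypothetical host encoder: leave a bracketed IP literal untouched,
   encode anything else. It does not check that the literal is valid. *)
let encode_host h =
  let n = String.length h in
  if n >= 2 && h.[0] = '[' && h.[n - 1] = ']' then h else pct_encode h

(* pct_encode "[::1]"  = "%5B%3A%3A1%5D" *)
(* encode_host "[::1]" = "[::1]" *)
```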

In any case, there's a PR open to fix this; let's hope it gets merged soon.

dmbaturin commented 1 year ago

@psafont Ah, I see, I just misinterpreted @edwintorok's wording as one of those comments where the commenter completely misses the point of the issue.

I still don't agree with the wording, though. ;)

The root cause is that the parser was missing the square-bracketed IPv6 literal case, as line 950 shows. The rest is consequences.