NorfairKing closed 2 years ago
I'm not qualified to say, but those parsings seem to match what's said here:
[3](https://datatracker.ietf.org/doc/html/rfc3986#section-3). Syntax Components
The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment.
    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

    hier-part   = "//" authority path-abempty
                / path-absolute
                / path-rootless
                / path-empty
The following are two example URIs and their component parts:
      foo://example.com:8042/over/there?name=ferret#nose
      \_/   \______________/\_________/ \_________/ \__/
       |           |            |            |        |
    scheme     authority       path        query   fragment
       |   _____________________|__
      / \ /                        \
      urn:example:animal:ferret:nose
[3.3](https://datatracker.ietf.org/doc/html/rfc3986#section-3.3). Path
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
([Section 3.4](https://datatracker.ietf.org/doc/html/rfc3986#section-3.4)), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
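To make the quoted sections concrete, here is a minimal Python sketch that splits a URI into the five components using the regular expression given in RFC 3986, Appendix B. The function name `split_uri` is made up for illustration and is not part of the library under discussion. The regex only separates components; it does not validate them.

```python
import re

# Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference
# with a Regular Expression"). Every group is optional, so it always matches.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI into its components (hypothetical helper).

    A component that is absent is reported as None; the path is always
    present but may be the empty string.
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for example in (
        "foo://example.com:8042/over/there?name=ferret#nose",
        "urn:example:animal:ferret:nose",
    ):
        print(example, split_uri(example))
```

Since the second example has no "//" after the scheme, the authority group does not match and everything after "urn:" ends up in the path.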
Huh, I expected this:
mailto:John.Doe@example.com
\____/ \______/ \_________/ \___/
scheme userinfo   regname  no path
       \__________________/
            authority
news:comp.infosystems.www.servers.unix
\__/ \_______________________________/ \___/
scheme            regname             no path
Instead of what is currently parsed:
mailto:John.Doe@example.com
\____/ \__________________/
scheme         path
news:comp.infosystems.www.servers.unix
\__/ \_______________________________/
scheme             path
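For comparison, an unrelated parser produces the same split. The sketch below uses `urlsplit` from Python's standard library; the `components` wrapper is a hypothetical helper for this comment only. Note that `urlsplit` reports a missing authority as an empty `netloc` string rather than as an absent value.

```python
from urllib.parse import urlsplit


def components(uri):
    """Return (scheme, authority, path) as seen by urllib.parse.urlsplit.

    Hypothetical helper for illustration. An empty authority string means
    the URI had no "//" authority part.
    """
    parts = urlsplit(uri)
    return (parts.scheme, parts.netloc, parts.path)


if __name__ == "__main__":
    for uri in (
        "mailto:John.Doe@example.com",
        "news:comp.infosystems.www.servers.unix",
    ):
        print(uri, components(uri))
```

In both cases the authority comes back empty and the whole remainder is the path, which matches the second pair of diagrams rather than the first.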
@chreekat is right; these parsings are counterintuitive, but the spec makes them clear. There is a grammar and a bit of text to this effect:
    hier-part   = "//" authority path-abempty
                / path-absolute
                / path-rootless
                / path-empty
When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//").
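The rule above can be sketched as a small classifier that reports which `hier-part` alternative a URI falls under. This is a rough illustration in Python, assuming the input has a scheme; `classify_hier_part` is a made-up name, and the function checks only the leading slashes, not the characters allowed in each component.

```python
import re


def classify_hier_part(uri):
    """Name the hier-part alternative of RFC 3986 that a URI uses.

    Hypothetical helper: it assumes a scheme is present and does not
    validate the characters of any component.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon or not scheme:
        raise ValueError("expected a URI with a scheme")
    # The hier-part ends at the first "?" or "#", or at the end of the URI.
    hier = re.split(r"[?#]", rest, maxsplit=1)[0]
    if hier.startswith("//"):
        # Only a leading "//" introduces an authority component.
        return '"//" authority path-abempty'
    if hier.startswith("/"):
        return "path-absolute"
    if hier:
        return "path-rootless"
    return "path-empty"


if __name__ == "__main__":
    for uri in (
        "foo://example.com:8042/over/there?name=ferret#nose",
        "mailto:John.Doe@example.com",
        "news:comp.infosystems.www.servers.unix",
    ):
        print(uri, "->", classify_hier_part(uri))
```

Neither the `mailto:` nor the `news:` example starts its hier-part with "//", so both land in `path-rootless` and have no authority at all.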
I did some digging to try to justify this choice in the spec, although honestly I'm at a loss. I do think the email address in a mailto: URI is somewhat different in nature from an authority component, even though the two look similar.
All that said, I would like to add some test cases that concretely demonstrate the expected parsings and cite the spec alongside. I'll do that shortly.
OK, a few more test cases have been added demonstrating this behavior. Thanks for prompting it!
The spec has these examples of URIs:
Some of them are parsed incorrectly (I think):