Closed gregw closed 2 weeks ago
@joakime actually this is more complex than I thought. It raises security issues as there are more ways a bad username can confuse the parser. Also we have the problem of URIs like "//foo"! Is that a host of "foo" and an empty path, or a path with an empty segment of "//foo".
We need to carefully consider this, so I have move to the next months release.
Also we have the problem of URIs like "//foo"! Is that a host of "foo" and an empty path, or a path with an empty segment of "//foo".
Eesh. Meanwhile, I'll work on coming up with more unit tests for these kinds of edge cases. I'll check what Java's RFC parser does and other projects that use the RFC parsing rules too, see what they do.
@gregw I added a bunch more tests for HttpURITest
in branch fix/12.0.x/more-uri-testing
First batch can be seen in commit f693bf92fa830da127cb0c3c9dd7c7d4ef328830
@joakime OK, reading the RFC a bit more, I can see:
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
So this clearly says that an absolute path cannot start with "//", so "//foo" is always an authority and never a path.
I will update based on this.
Dang!!!! no so simple. The JVM URI class does this:
jshell> URI uri = new URI("http://localhost;param");
uri ==> http://localhost;param
jshell> uri.getHost()
$2 ==> null
jshell> uri.getPath()
$3 ==> ""
jshell> uri.getRawSchemeSpecificPart()
$4 ==> "//localhost;param"
jshell> uri.getRawPath()
$5 ==> ""
So in this case, the presence of the ; is making it think that "localhost" is not an authority?!?!? This is even more complicated, as I believe that whatwg would allow ; as a character of the hostname, so they would get "localhost;param" as the authority !?!?!?
What a mess!
That's still an absolute (non-opaque) URI. It's an authority still, but not parseable any further, into a userinfo, or host, or port.
jshell> var u = new URI("http://localhost;param");
u ==> http://localhost;param
jshell> u.isOpaque()
$2 ==> false
jshell> u.isAbsolute()
$3 ==> true
jshell> u.getAuthority()
$4 ==> "localhost;param"
jshell> u.getUserInfo()
$5 ==> null
jshell> u.getHost()
$6 ==> null
Interestingly, the WhatWG parser in Firefox shows this differently.
>> new URL("http://localhost;param");
< URL {
href: "http://localhost;param/",
origin: "http://localhost;param/",
protocol: "http:",
username: "",
password: "",
host: "localhost;param",
hostname: "localhost;param",
port: "",
pathname: "/",
search: "" }
And chrome shows ...
> new URL("http://localhost;param")
< URL {origin: 'http://localhost;param', protocol: 'http:', username: '', password: '', host: 'localhost;param', …}
hash: ""
host: "localhost;param"
hostname: "localhost;param"
href: "http://localhost;param/"
origin: "http://localhost;param"
password: ""
pathname: "/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {size: 0}
username: ""
[[Prototype]]: URL
I think that for this particular corner case it doesn't matter, as a host of "localhost;param"
is still in violation of hostname character rules and shouldn't work when trying to resolve it in DNS.
Actually, my last statement about the name would fail hostname requirements isn't right.
The original dns/hostname spec limits the characters to a subset of ASCII ( a-z
or A-Z
or 0-9
or -
or .
)
https://www.rfc-editor.org/rfc/rfc1034
Since the introduction of international host names, far more characters are now allowed in hostnames, most unicode, and escaped octets to show a few examples. The only reserved special character is the .
to separate segments of the hostname.
See
Yeah, so I think we should just follow the RFC URI parsing spec here. it's an authority, but that's about it, we cannot make the call if that segment is a user info or host or port.
Just for completion, the java.net.URI doesn't use the ;
character to start the path, only the /
character.
jshell> var u = new URI("http://localhost;param/path")
u ==> http://localhost;param/path
jshell> u.getAuthority()
$2 ==> "localhost;param"
jshell> u.getHost()
$3 ==> null
jshell> u.getPath()
$4 ==> "/path"
@joakime for now, I think we should just treat "http://localhost;param/path" as a bad authority, as there are different interpretations out there.
Fix #12404 by accepting empty paths in HttpURI