jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty
Other
3.86k stars 1.91k forks source link

Accept empty URI paths #12404 #12425

Closed gregw closed 2 weeks ago

gregw commented 3 weeks ago

Fix #12404 by accepting empty paths in HttpURI

gregw commented 3 weeks ago

@joakime actually this is more complex than I thought. It raises security issues as there are more ways a bad username can confuse the parser. Also we have the problem of URIs like "//foo"! Is that a host of "foo" and an empty path, or a path with an empty segment of "//foo".

We need to carefully consider this, so I have move to the next months release.

joakime commented 3 weeks ago

Also we have the problem of URIs like "//foo"! Is that a host of "foo" and an empty path, or a path with an empty segment of "//foo".

Eesh. Meanwhile, I'll work on coming up with more unit tests for these kinds of edge cases. I'll check what Java's RFC parser does and other projects that use the RFC parsing rules too, see what they do.

joakime commented 3 weeks ago

@gregw I added a bunch more tests for HttpURITest in branch fix/12.0.x/more-uri-testing

First batch can be seen in commit f693bf92fa830da127cb0c3c9dd7c7d4ef328830

gregw commented 3 weeks ago

@joakime OK, reading the RFC a bit more, I can see:

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

So this clearly says that an absolute path cannot start with "//", so "//foo" is always an authority and never a path.

I will update based on this.

gregw commented 3 weeks ago

Dang!!!! no so simple. The JVM URI class does this:

jshell> URI uri = new URI("http://localhost;param");
uri ==> http://localhost;param

jshell> uri.getHost()
$2 ==> null

jshell> uri.getPath()
$3 ==> ""

jshell> uri.getRawSchemeSpecificPart()
$4 ==> "//localhost;param"

jshell> uri.getRawPath()
$5 ==> ""

So in this case, the presence of the ; is making it think that "localhost" is not an authority?!?!? This is even more complicated, as I believe that whatwg would allow ; as a character of the hostname, so they would get "localhost;param" as the authority !?!?!?

What a mess!

joakime commented 3 weeks ago

That's still an absolute (non-opaque) URI. It's an authority still, but not parseable any further, into a userinfo, or host, or port.

jshell> var u = new URI("http://localhost;param");
u ==> http://localhost;param

jshell> u.isOpaque()
$2 ==> false

jshell> u.isAbsolute()
$3 ==> true

jshell> u.getAuthority()
$4 ==> "localhost;param"

jshell> u.getUserInfo()
$5 ==> null

jshell> u.getHost()
$6 ==> null

Interestingly, the WhatWG parser in Firefox shows this differently.

>> new URL("http://localhost;param");
< URL { 
  href: "http://localhost;param/", 
  origin: "http://localhost;param/", 
  protocol: "http:", 
  username: "", 
  password: "", 
  host: "localhost;param", 
  hostname: "localhost;param", 
  port: "", 
  pathname: "/", 
  search: "" }

And chrome shows ...

> new URL("http://localhost;param")
< URL {origin: 'http://localhost;param', protocol: 'http:', username: '', password: '', host: 'localhost;param', …}
hash: ""
host: "localhost;param"
hostname: "localhost;param"
href: "http://localhost;param/"
origin: "http://localhost;param"
password: ""
pathname: "/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {size: 0}
username: ""
[[Prototype]]: URL

I think that for this particular corner case it doesn't matter, as a host of "localhost;param" is still in violation of hostname character rules and shouldn't work when trying to resolve it in DNS.

joakime commented 3 weeks ago

Actually, my last statement about the name would fail hostname requirements isn't right.

The original dns/hostname spec limits the characters to a subset of ASCII ( a-z or A-Z or 0-9 or - or . )

https://www.rfc-editor.org/rfc/rfc1034

Since the introduction of international host names, far more characters are now allowed in hostnames, most unicode, and escaped octets to show a few examples. The only reserved special character is the . to separate segments of the hostname.

See

Yeah, so I think we should just follow the RFC URI parsing spec here. it's an authority, but that's about it, we cannot make the call if that segment is a user info or host or port.

Just for completion, the java.net.URI doesn't use the ; character to start the path, only the / character.

jshell> var u = new URI("http://localhost;param/path")
u ==> http://localhost;param/path

jshell> u.getAuthority()
$2 ==> "localhost;param"

jshell> u.getHost()
$3 ==> null

jshell> u.getPath()
$4 ==> "/path"
gregw commented 3 weeks ago

@joakime for now, I think we should just treat "http://localhost;param/path" as a bad authority, as there are different interpretations out there.