Open spaceone opened 5 years ago
This is related to #89. The rules for urlencoding are complicated and it's not correct to simply urldecode the whole path. For example, the URLs `http://example.com/foo/bar` and `http://example.com/foo%2Fbar` are two distinct URLs (a relative URL `baz.png` would resolve to `http://example.com/foo/baz.png` in the former and `http://example.com/baz.png` in the latter).
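As a quick illustration of that difference, here is a minimal sketch using Python's standard `urllib.parse.urljoin`. It only demonstrates relative resolution; it says nothing about how Tornado itself handles paths.

```python
from urllib.parse import urljoin

# Base URL with a literal "/" between "foo" and "bar":
# "bar" is the last path segment and gets replaced by the relative reference.
literal_slash = urljoin("http://example.com/foo/bar", "baz.png")

# Base URL with an encoded slash: "foo%2Fbar" is one single segment,
# so the relative reference replaces all of it.
encoded_slash = urljoin("http://example.com/foo%2Fbar", "baz.png")

print(literal_slash)  # http://example.com/foo/baz.png
print(encoded_slash)  # http://example.com/baz.png
```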
According to RFC 3987 (IRIs), `=` is a reserved character and should be left in its urlencoded form. The WHATWG URL spec appears to disagree (`=` must be percent-encoded in a userinfo context but not in a path context).
I agree it's annoying to see percent-encoded values in the routing layer (especially for non-ASCII characters), but the fix is not straightforward, so it may be best to avoid using characters like `=` whose encoding is complex or ambiguous. Or you could implement your own variant of `PathMatches` that operates on the decoded form if you don't care about the more esoteric special cases that can arise.
https://datatracker.ietf.org/doc/html/rfc3987#section-5.3.2.3 says that at least the unreserved (`ALPHA / DIGIT / "-" / "." / "_" / "~"`) characters should be decoded when comparing. I think this is not even done by Tornado.
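A minimal sketch of that normalization, assuming a plain `str` path and using only the standard library. The function name `normalize_unreserved` is hypothetical. As an additional choice not taken from the comment above, it also uppercases the hex digits of the escapes it leaves in place, so that `%3d` and `%3D` compare equal.

```python
import re
import string

# The "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
_PERCENT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(path: str) -> str:
    """Decode only escapes of unreserved characters; keep all others encoded."""

    def _replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. "%7E" becomes "~"
        # Leave reserved and other characters encoded, but normalize hex case.
        return "%" + match.group(1).upper()

    return _PERCENT_TRIPLET.sub(_replace, path)


print(normalize_unreserved("foo%7Ebar"))   # foo~bar
print(normalize_unreserved("/cn%3dfoo/"))  # /cn%3Dfoo/
```

Two paths could then be compared by comparing their normalized forms.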
Hm, I think (not sure yet) I would argue that decoding should also be done for the reserved (`":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="`) characters if the URI components are already split (which is the case here, as we look only at the path component?!). But this is not explicitly stated in the RFC.
Otherwise a path `foo%7Ebar` would equal `foo~bar` but `foo!bar` would not equal `foo%21bar`, and the equality of `foo>bar` and `foo%3Ebar` is not even defined.
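To make the "split first, then decode" idea concrete, here is a small sketch, assuming the path has already been separated from the rest of the URL. The helper name `decoded_segments` is made up for illustration. Because each segment is decoded only after splitting on `/`, an encoded slash stays inside its segment.

```python
from urllib.parse import unquote


def decoded_segments(raw_path):
    """Split the raw path on "/" first, then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")]


# Reserved and other characters compare equal after per-segment decoding.
print(decoded_segments("/foo%21bar") == decoded_segments("/foo!bar"))  # True
print(decoded_segments("/foo%3Ebar") == decoded_segments("/foo>bar"))  # True

# "%2F" does not create a new segment, so these two paths stay distinct.
print(decoded_segments("/foo%2Fbar"))  # ['', 'foo/bar']
print(decoded_segments("/foo/bar"))    # ['', 'foo', 'bar']
```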
> says that at least the unreserved (`ALPHA / DIGIT / "-" / "." / "_" / "~"`) characters should be decoded when comparing. I think this is not even done by Tornado.
Correct. But it's never mattered because these characters are never encoded in the first place (with the exception of tilde, which I'm pretty sure was considered reserved in older specs).
> if the URI components are already split (which is the case here, as we look only at the path component?!)
That's an appealing argument, although I think the way that relative URI processing treats "/" and "%2F" differently is a counterexample.
`self.request.path` is the raw urlencoded path from the HTTP start line (i.e. it doesn't even normalize anything and takes whatever the client gives). The router's `PathMatches` matches routes against this raw path.
Having a route definition like:
fails for `/cn%3Dfoo/` and `/cn%3dfoo/` but not for `/cn=foo/`.
A workaround is:
In general I think it's wrong that request.path contains the urlencoded form of the path. It should be decoded. But it's probably an issue for backwards compatibility? At least the routing component could decode the path before applying it to the routes!
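The "decode before matching" idea can be sketched with the standard library alone. The route pattern and the function names below are hypothetical and are not taken from the original report. In Tornado, this logic would presumably live in a custom matcher, which is not shown here.

```python
import re
from urllib.parse import unquote

# Hypothetical route pattern containing a literal "=" (an assumption for
# illustration only).
ROUTE = re.compile(r"/cn=(?P<name>[^/]+)/$")


def match_raw(raw_path):
    """Match against the path exactly as the client sent it."""
    return ROUTE.match(raw_path)


def match_decoded(raw_path):
    """Percent-decode the whole path first, then match."""
    return ROUTE.match(unquote(raw_path))


for path in ("/cn%3Dfoo/", "/cn%3dfoo/", "/cn=foo/"):
    print(path, bool(match_raw(path)), bool(match_decoded(path)))
```

Decoding the whole path also turns `%2F` into `/`, which is the ambiguity raised earlier in this thread, so a variant that decodes per segment or leaves `%2F` alone may be the safer choice.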