mladedav opened 7 months ago
I wonder if it would be possible to instead do partial percent decoding before routing: only translating unreserved characters that were (pointlessly) percent-encoded back to their regular form.
Notably, this avoids any issues with `%` signs that are themselves percent-encoded, because these will only be decoded in the second and last decoding step, as part of path parameter deserialization.
The `percent_encoding` crate does not offer an API that allows partial decoding, but it should be fairly simple to cherry-pick the relevant things.
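A minimal, std-only sketch of that partial decoding, written from scratch rather than cherry-picked from `percent_encoding` (`decode_unreserved` and `is_unreserved` are hypothetical names, not part of that crate, axum, or matchit). It decodes a `%XX` triplet only when the byte is an RFC 3986 unreserved character (ALPHA, DIGIT, `-`, `.`, `_`, `~`) and copies everything else, including `%25`, unchanged:

```rust
/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Partially percent-decodes `path`: a `%XX` triplet is decoded only if it
/// encodes an unreserved character. Every other triplet, including `%25`
/// (an encoded `%`), is kept for the later, full decoding step.
fn decode_unreserved(path: &str) -> String {
    let bytes = path.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                let decoded = (hi * 16 + lo) as u8;
                if is_unreserved(decoded) {
                    out.push(decoded);
                    i += 3;
                    continue;
                }
            }
        }
        // Reserved triplets, a bare `%` and all other bytes are copied unchanged.
        out.push(bytes[i]);
        i += 1;
    }
    // Only ASCII triplets were replaced with ASCII bytes, so this is still UTF-8.
    String::from_utf8(out).expect("input was valid UTF-8")
}
```

With this, `/%61xum` becomes `/axum`, while `/100%25` and `/a%2Fb` come out unchanged, so an encoded `%` is still decoded exactly once, during path parameter extraction.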
W.r.t. what we allow in `.route()` calls, I haven't thought about it much yet, but I think we can be relatively strict there. I definitely want it to be possible to have non-`/` separators in the future. Maybe we should only support reserved characters for that? It would be a little sad not to allow separating by `.`, but not the end of the world... It would be useful to collect real use cases.
We could decode just the unreserved characters, but then I think we should also encode reserved characters (like `?`) in `route()` and similar calls. Otherwise, matching `/*path`, `/path?`, or `/path%` is impossible (unless the user encodes the special characters by hand). And the question of how to handle `{`, `"`, and `}` would still stand, since RFC 3986 requires these to be percent-encoded, but hyper allows them verbatim.
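To illustrate the registration side, here is a hypothetical helper (`encode_route_literal` is not an axum API) that percent-encodes the characters a request path can only ever contain in encoded form. It assumes every `%` in a route is meant literally, and it leaves `*` alone because `*` already introduces catch-all parameters, so matching a literal `/*path` stays an open question:

```rust
/// Hypothetical helper: percent-encodes characters in a route path that can
/// never appear verbatim in a request path, so `.route("/path?", ...)` and
/// `.route("/100%", ...)` become matchable. Every `%` is treated as literal.
fn encode_route_literal(route: &str) -> String {
    let mut out = String::with_capacity(route.len());
    for c in route.chars() {
        match c {
            // `%` -> `%25`, `?` -> `%3F`, `#` -> `%23`
            '%' | '?' | '#' => out.push_str(&format!("%{:02X}", c as u32)),
            _ => out.push(c),
        }
    }
    out
}
```

Because partial decoding never decodes `%3F` or `%25` (neither `?` nor `%` is unreserved), a request for `/path%3F` or `/100%25` reaches the router unchanged and lines up with the encoded route. Hex case (`%3f` vs. `%3F`) is not handled in this sketch.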
Regarding separators inside path segments, I assumed that this would come from matchit. I know there were some issues about matching based on extensions and such, and I believe the change in its grammar was partly meant to support things like that later. I'm not exactly sure how this factors in here, though; it's been some time since I looked at it.
Or, of course, we can just decode the unreserved characters and ignore the rest for now.
Maybe just forbid `*`, `?`, and `%` verbatim in `route` calls? (Though we would need to check for a percent-encoded `%` as a special case.)
Not sure about the rest of your comment; I also don't have the entire context right now.
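One possible reading of the "forbid verbatim" suggestion, as a hypothetical validation step for route paths (`validate_route` is an illustrative name, not an existing API). It rejects `?` anywhere, rejects `%` unless it starts a `%XX` triplet such as `%25` (the percent-encoded `%` special case), and, assuming a `/*name` catch-all syntax, accepts `*` only at the start of a segment:

```rust
/// Hypothetical check for `.route()` paths. Returns an error describing the
/// first character that could never match a request path.
fn validate_route(route: &str) -> Result<(), String> {
    let bytes = route.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            // `?` always starts the query, so it can never be part of a path.
            b'?' => return Err(format!("`?` at byte {i} can never match a request path")),
            // `%` is only allowed as the start of a `%XX` triplet, e.g. `%25`.
            b'%' => {
                let is_triplet = i + 2 < bytes.len()
                    && bytes[i + 1].is_ascii_hexdigit()
                    && bytes[i + 2].is_ascii_hexdigit();
                if !is_triplet {
                    return Err(format!(
                        "bare `%` at byte {i}; write `%25` for a literal percent sign"
                    ));
                }
            }
            // Assumed catch-all syntax: `*` may only begin a segment (`/*rest`).
            b'*' if i == 0 || bytes[i - 1] != b'/' => {
                return Err(format!("`*` at byte {i} is only allowed at the start of a segment"));
            }
            _ => {}
        }
    }
    Ok(())
}
```

Rejecting at registration time turns a route that silently never matches into an immediate error; the trade-off is that users have to write `%25` themselves instead of having it encoded for them.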
Feature Request
Motivation
URIs can be percent-encoded, but that should not change which resource is accessed.
More specifically, RFC 3986 says:
There are also some characters that are sometimes encoded and sometimes not in the real world; e.g. hyper will probably make the handling of `{`, `}`, and `"` in paths configurable.
Proposal
We can unescape unreserved characters inside the path, since these can be decoded at any time. For example, requests to `/axum` and `/%61xum` can be interpreted as the same one. We can also normalize reserved characters by percent-encoding them if they appear in the path (with some exceptions that already have a special meaning, like `%`, `?`, `#`, and so on; we can, however, encode `{`, `"`, etc. to normalize these before routing).
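A sketch of the request-side normalization described here, under a few assumptions: it runs on the raw path before routing, `normalize_path` is a hypothetical name, and, beyond what is proposed above, it uppercases the hex digits of the triplets it keeps (the case normalization RFC 3986 recommends), so that `%7b` and a raw `{` end up identical. Reserved characters and `%25` pass through unchanged:

```rust
/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Hypothetical request-path normalization before routing:
/// - `%XX` encoding an unreserved character is decoded (`%61` -> `a`);
/// - any other `%XX` is kept, with uppercased hex digits (`%3f` -> `%3F`);
/// - raw `{`, `}` and `"` (which hyper may let through) are percent-encoded.
/// Everything else, including `%25` and reserved characters, is unchanged.
fn normalize_path(path: &str) -> String {
    let bytes = path.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%' && i + 2 < bytes.len() {
            let (h1, h2) = (bytes[i + 1], bytes[i + 2]);
            if h1.is_ascii_hexdigit() && h2.is_ascii_hexdigit() {
                let hi = (h1 as char).to_digit(16).unwrap();
                let lo = (h2 as char).to_digit(16).unwrap();
                let decoded = (hi * 16 + lo) as u8;
                if is_unreserved(decoded) {
                    out.push(decoded);
                } else {
                    out.extend_from_slice(&[b'%', h1.to_ascii_uppercase(), h2.to_ascii_uppercase()]);
                }
                i += 3;
                continue;
            }
        }
        match b {
            b'{' => out.extend_from_slice(b"%7B"),
            b'}' => out.extend_from_slice(b"%7D"),
            b'"' => out.extend_from_slice(b"%22"),
            _ => out.push(b),
        }
        i += 1;
    }
    String::from_utf8(out).expect("only ASCII bytes were substituted")
}
```

With this, `/axum` and `/%61xum` normalize to the same string, and both `/"quotes"/etc` and `/%22quotes%22/etc` become `/%22quotes%22/etc`.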
We can also encode special characters when registering a route, for example by internally turning the current `.route("/100%", ...)` into `.route("/100%25", ...)` (`%25` is the percent-encoding of `%`). A special case of this is that a user can currently write `.route("/what?", ...)`, which can never match any request.
With these two changes, users can use special characters in routes like `r#"/"quotes"/etc"#` and match them with both the encoded and the unencoded variants.
Alternatives
Using a middleware before `axum::Router` that decodes (or rather, otherwise normalizes) the percent-encoding. Not all percent-decoded paths are valid URI paths, so it would most likely have to percent-decode unreserved characters and percent-encode reserved characters to normalize.
This can be combined with encoding special characters when routes are registered with `matchit` (e.g. braces, percent, question mark, ...), either by providing something like `route_encoded` or by having the user encode any needed characters themselves.
I mention this primarily in case we want to support people who want control over percent decoding themselves, but I think this can also just be built into `axum` itself.
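To show how the two halves could fit together, here is a toy exact-match router. It has no path parameters, so there is no route syntax to protect, and `ToyRouter`, `route_encoded`, `encode_route`, and `normalize_request_path` are all hypothetical, not axum or matchit APIs. Incoming paths are normalized as in the middleware idea, `route` encodes special characters at registration, and `route_encoded` stores an already-encoded path as-is:

```rust
use std::collections::HashMap;

/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Request side (the middleware's job): decode unreserved `%XX`, uppercase the
/// hex of every other triplet, and encode raw `{`, `}` and `"`.
fn normalize_request_path(path: &str) -> String {
    let bytes = path.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%'
            && i + 2 < bytes.len()
            && bytes[i + 1].is_ascii_hexdigit()
            && bytes[i + 2].is_ascii_hexdigit()
        {
            let hex = std::str::from_utf8(&bytes[i + 1..i + 3]).unwrap();
            let decoded = u8::from_str_radix(hex, 16).unwrap();
            if is_unreserved(decoded) {
                out.push(decoded);
            } else {
                out.push(b'%');
                out.extend(hex.bytes().map(|h| h.to_ascii_uppercase()));
            }
            i += 3;
            continue;
        }
        match b {
            b'{' | b'}' | b'"' => out.extend(format!("%{:02X}", b).bytes()),
            _ => out.push(b),
        }
        i += 1;
    }
    String::from_utf8(out).expect("only ASCII bytes were substituted")
}

/// Registration side: encode characters that can't appear verbatim in a request path.
fn encode_route(route: &str) -> String {
    route
        .chars()
        .map(|c| match c {
            '%' | '?' | '#' | '{' | '}' | '"' => format!("%{:02X}", c as u32),
            _ => c.to_string(),
        })
        .collect()
}

struct ToyRouter {
    routes: HashMap<String, &'static str>,
}

impl ToyRouter {
    fn new() -> Self {
        ToyRouter { routes: HashMap::new() }
    }

    /// Like `.route()`, but special characters are encoded before storing.
    fn route(mut self, path: &str, name: &'static str) -> Self {
        self.routes.insert(encode_route(path), name);
        self
    }

    /// Like a possible `route_encoded`: the caller has already encoded `path`.
    fn route_encoded(mut self, path: &str, name: &'static str) -> Self {
        self.routes.insert(path.to_string(), name);
        self
    }

    /// Normalizes the request path, then looks it up exactly.
    fn lookup(&self, request_path: &str) -> Option<&'static str> {
        self.routes.get(&normalize_request_path(request_path)).copied()
    }
}
```

For example, after `.route("/\"quotes\"/etc", ...)`, both `/"quotes"/etc` and `/%22quotes%22/etc` hit the same route. Non-ASCII route characters are not encoded in this toy, so a real version would need to decide how to handle them.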