Routing percent-encoded paths

mladedav commented 7 months ago

[x] I have looked for existing issues (including closed) about this

Feature Request

Motivation

URIs can be percent-encoded but that should not change what resource is accessed.

More specifically, RFC 3986 says:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.

There are also some characters that are sometimes encoded and sometimes not in the real world, e.g. hyper will probably have handling of {, }, and " in paths configurable.

Proposal

We can unescape unreserved characters inside the path, since these can be decoded at any time. E.g. requests for /axum and /%61xum can be interpreted as the same one. We can also normalize reserved characters in the path by percent-encoding them (with exceptions for characters that already have special meaning, like %, ?, and #; we can, however, encode {, ", etc. to normalize these before routing).
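The encoding half of this proposal could be sketched roughly as follows. This is only an illustration using the standard library: the function names encode_disallowed and is_allowed_in_path are hypothetical, and the allowed set is taken from the pchar rule of RFC 3986 plus / and %, which is an assumption about where the line would be drawn. Existing %XX escapes are passed through untouched.

```rust
/// Returns true for bytes that may appear literally in a URI path:
/// unreserved characters, sub-delims, ':', '@', the '/' separator,
/// and '%' (assumed here to introduce an existing escape).
fn is_allowed_in_path(b: u8) -> bool {
    matches!(
        b,
        b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'.' | b'_' | b'~'
            | b'!' | b'$' | b'&' | b'\'' | b'(' | b')'
            | b'*' | b'+' | b',' | b';' | b'='
            | b':' | b'@' | b'/' | b'%'
    )
}

/// Percent-encodes every byte that is not allowed literally in a path,
/// e.g. `{`, `}` and `"`. Hypothetical helper, not an axum API.
fn encode_disallowed(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for &b in path.as_bytes() {
        if is_allowed_in_path(b) {
            // Allowed bytes are always ASCII, so this cast is lossless.
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}
```

With this, a request path written as /"quotes"/{id} and one already written as /%22quotes%22/%7Bid%7D would both reach the router in the second form.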

We can also encode special characters when registering a route, such as internally turning the current .route("/100%", ...) into .route("/100%25", ...) (%25 is the percent-encoding for %). A special case of this is that a user can currently write .route("/what?", ...), which can never match any request.
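A minimal sketch of what that registration-time encoding might look like, assuming the route string is taken literally (so a % in it is never an existing escape) and that only %, ? and # need this treatment. The name encode_route_literal is hypothetical and not part of axum.

```rust
/// Encodes characters in a route pattern that could never match a request
/// path if left verbatim. Everything else, including `/` and `{param}`
/// syntax, is copied unchanged.
fn encode_route_literal(route: &str) -> String {
    let mut out = String::with_capacity(route.len());
    for c in route.chars() {
        match c {
            '%' => out.push_str("%25"),
            '?' => out.push_str("%3F"),
            '#' => out.push_str("%23"),
            _ => out.push(c),
        }
    }
    out
}
```

Under these assumptions, /100% would be stored as /100%25 and /what? as /what%3F, while a pattern such as /users/{id} is left alone.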

With these two changes, users can use special characters in a route like r#"/"quotes"/etc"# and match it with both encoded and unencoded variants.

Alternatives

Using a middleware before axum::Router that decodes (or rather, otherwise normalizes) the percent-encoding. Not all percent-decoded paths are valid paths in a URI, so it would most likely have to percent-decode unreserved characters and percent-encode reserved characters to normalize.

This can be combined with encoding special characters when they are to be registered to matchit (e.g. braces, percent, question mark,...) either by providing something like route_encoded or by having the user encode any needed characters themselves.

I mention this primarily in case we would want to support people who want to have control over percent decoding themselves, but I think this can be also just built inside axum itself.

jplatte commented 2 months ago

I wonder if it would be possible to instead do partial percent decoding before routing - only translating unreserved (pointlessly) percent-encoded characters to their regular character.

Notably, this avoids any issues with % signs that are themselves percent-encoded, because these will only be decoded in the second and last decoding step, as part of path parameter deserialization.

The percent_encoding crate does not offer an API that allows partial decoding, but it should be fairly simple to cherry-pick the relevant things.
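For illustration, a hand-rolled partial decoder could look something like the sketch below. It uses only the standard library; the names decode_unreserved, is_unreserved and hex_val are hypothetical. It decodes a %XX escape only when the result is an unreserved character (ALPHA, DIGIT, -, ., _, ~ per RFC 3986) and copies everything else through, so %25 and %2F stay encoded.

```rust
/// Unreserved characters per RFC 3986 section 2.3.
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Value of a single hex digit, or None if the byte is not one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes only those escapes that stand for unreserved characters.
/// The input is scanned once, so output is never decoded a second time.
fn decode_unreserved(path: &str) -> String {
    let bytes = path.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                let decoded = hi * 16 + lo;
                if is_unreserved(decoded) {
                    out.push(decoded);
                    i += 3;
                    continue;
                }
            }
        }
        // Not an escape for an unreserved character: copy the byte as is.
        out.push(bytes[i]);
        i += 1;
    }
    // Only ASCII escapes were replaced by ASCII bytes, so this cannot fail.
    String::from_utf8(out).expect("partial decoding preserves UTF-8 validity")
}
```

Because the scan is single-pass, a path such as /%2561 is left unchanged rather than collapsing to /%61 and then /a; the %25 is only decoded later, during path parameter deserialization.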

W.r.t. what we allow in .route() calls, I haven't thought about it much yet, but I think we can be relatively strict there. I definitely want it to be possible to have non-/ separators in the future. Maybe we should only support reserved characters for that? It would be a little sad to not allow separating by ., but not the end of the world... Would be useful to collect real use-cases.

mladedav commented 2 months ago

We could decode just the unreserved characters, but then I think we should also encode reserved characters (like ?) in route() and similar calls. Otherwise, matching /*path or /path? or /path% is impossible (unless the user encodes the special characters by hand). And the question of how to handle {, ", and } would still stand since these should be reserved but hyper allows them.

Regarding separators inside path segments, I assumed that this would be something that would come from matchit. I know there were some issues about matching based on extensions and such, and I believe the change in its grammar was in part to be able to support things like that later. I'm not exactly sure how this is a consideration here, but it's been some time since I've seen this.

Or we can just decode the unreserved characters and ignore the rest for now of course.

jplatte commented 2 months ago

Maybe just forbid *, ? and % verbatim in route calls? (though would need to check for percent-encoded % as a special case)
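One possible reading of this, sketched with the standard library only: reject * and ? anywhere in the literal part of a route, and reject % unless it begins a well-formed %XX escape (which covers %25 for an encoded percent sign). The function name check_route is hypothetical, and the sketch assumes parameters are written in braces, so anything between { and } is skipped.

```rust
/// Checks a route pattern for characters that are forbidden verbatim.
/// On failure, returns the byte offset and the offending character.
fn check_route(route: &str) -> Result<(), (usize, char)> {
    let bytes = route.as_bytes();
    let mut in_param = false;
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b'{' => in_param = true,
            b'}' => in_param = false,
            // Assumption: parameter syntax such as `{*rest}` is not literal text.
            _ if in_param => {}
            b'*' | b'?' => return Err((i, b as char)),
            b'%' => {
                // A `%` is accepted only as the start of a `%XX` escape.
                let is_escape = bytes.get(i + 1).map_or(false, |c| c.is_ascii_hexdigit())
                    && bytes.get(i + 2).map_or(false, |c| c.is_ascii_hexdigit());
                if !is_escape {
                    return Err((i, '%'));
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

Under these assumptions, /100%25 and /files/{*rest} pass, while /100%, /what? and /*path are rejected at registration time instead of silently never matching.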

Not sure about the rest of your comment, I also don't have the entire context rn.
