`+` is reserved character

ring-clojure / ring-codec

Utility library for encoding and decoding data

MIT License

`+` is reserved character #16

Closed mcbuddha closed 7 years ago

mcbuddha commented 7 years ago

https://tools.ietf.org/html/rfc3986#section-2.2

weavejester commented 7 years ago

A reserved character doesn't necessarily mean we encode it. ~~"/" is also a reserved character, but we don't encode that in a URL.~~ Sub-delimiters are only significant in certain segments of the URL. We should actually include all sub-delimiters as exceptions. Possibly also add an option to customize the set of untouched characters, though I'm less certain about whether that's a good idea.
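To make the idea of sub-delimiter exceptions concrete, here is a minimal sketch in Python rather than Clojure. It is not ring-codec's implementation. The helper `encode_keeping` and its `untouched` parameter are hypothetical names invented for this illustration; the sub-delimiter set itself is the one listed in RFC 3986 section 2.2.

```python
from urllib.parse import quote

# sub-delims as defined in RFC 3986, section 2.2
SUB_DELIMS = "!$&'()*+,;="


def encode_keeping(text: str, untouched: str = SUB_DELIMS) -> str:
    """Percent-encode text, leaving the characters in `untouched` as-is.

    Hypothetical helper: `untouched` stands in for the customisable set
    of exceptions floated in the comment above.
    """
    # quote() never encodes letters, digits, "_", ".", "-" or "~";
    # `safe` adds further characters to that list.
    return quote(text, safe=untouched)


print(encode_keeping("a+b&c d"))                # a+b&c%20d
print(encode_keeping("a+b&c d", untouched=""))  # a%2Bb%26c%20d
```

Passing an empty `untouched` set is what a caller would do when they know the `+` is data rather than a delimiter.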

mcbuddha commented 7 years ago

I understand that just because + is reserved, it doesn't mean that it should be encoded. However, it is also absent from the "unreserved" list, so making it explicitly unreserved (and thus unencodable) is in conflict with the RFC.

I guess clients could not call url-encode for the parts they don't want to encode (because they know that those are sub-delimiters). Without the PR it's pretty difficult to encode a + sign (if you know that it is not a sub-delimiter, but a query param or whatever else).

weavejester commented 7 years ago

If it's a query parameter, then `form-encode` should be used. Ring-Codec distinguishes between URL-encoding and encoding data in the "application/x-www-form-urlencoded" format. The two formats are often conflated, but they have different semantics.

In general, if you're encoding a query string or form, then form-encode should be used. If you're encoding something that will sit in the path of the URI, then url-encode should be used.
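The same split exists in other standard libraries, which may help to show the difference. The sketch below uses Python's `urllib.parse` as an analog; it does not call ring-codec. The helper names `encode_path_segment` and `encode_form_value` are hypothetical, and keeping `+` literal in the path helper is an assumption made to mirror the behaviour discussed in this thread.

```python
from urllib.parse import quote, quote_plus, urlencode


def encode_path_segment(segment: str) -> str:
    """Percent-encode a value for a URL path segment.

    Assumption for this sketch: "+" stays literal, and "/" is escaped
    because it is not listed in `safe`.
    """
    return quote(segment, safe="+")


def encode_form_value(value: str) -> str:
    """Encode a value for a query string or form body.

    In this format a space becomes "+", so a literal "+" must be escaped.
    """
    return quote_plus(value)


print(encode_path_segment("foo bar+baz"))  # foo%20bar+baz
print(encode_form_value("foo bar+baz"))    # foo+bar%2Bbaz
print(urlencode({"q": "a+b c"}))           # q=a%2Bb+c
```

The same input yields two different strings, which is why a literal `+` in a query parameter needs the form encoder rather than the path encoder.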

mcbuddha commented 7 years ago

Got it! Thanks

weavejester commented 7 years ago

As an aside, the current functionality stems from a problem someone had with handling a URL like http://example.com/tags/foo+bar. When the last path segment was decoded with Java's URL decoder, it produced "foo bar". That is correct when decoding a query string, but not necessarily for a path segment.
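The decoding mismatch can be reproduced outside Java as well. This is a rough sketch using Python's `urllib.parse`, where `unquote_plus` plays the role of a form-style decoder and `unquote` the role of a path-style decoder. It is an analogy to the behaviour described above, not a reproduction of the original bug report.

```python
from urllib.parse import unquote, unquote_plus, urlsplit

url = "http://example.com/tags/foo+bar"

# Take the last path segment of the URL.
last_segment = urlsplit(url).path.rsplit("/", 1)[-1]

# Form-style decoding treats "+" as an encoded space.
decoded_as_form = unquote_plus(last_segment)

# Path-style decoding leaves "+" alone and only resolves %XX escapes.
decoded_as_path = unquote(last_segment)

print(last_segment)     # foo+bar
print(decoded_as_form)  # foo bar
print(decoded_as_path)  # foo+bar
```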