Closed keeth closed 8 years ago
A reserved character doesn't automatically mean that it should be encoded. "/" is also a reserved character, but clearly we don't generally encode it, unless it is part of a parameter in the query string. If you read section 3.3 and follow the definitions back, you'll see that "+" is a valid path character.
Ring-Codec provides two functions, url-encode
which encodes for URLs, and form-encode
, which encodes for forms and query strings. url-encode
does not encode "+", but form-encode
does.
I think this would warrant a bit of explanation in the documentation. Both .Net's UrlEncode
and Java's URLEncoder
encode "+" and "/" (and as far as I can see, they don't offer any option not to), so a lot of people would have the expectation that url-encode will do the same.
We can certainly add something to the docstring to explain why other implementations get it wrong ;)
I'm not sure if the API documentation is automatically generated from the docstring but if so, I'd vote for that.
My personal experience was that i googled "clojure url-encode", saw it in the API documentation and assumed it would work the same as java. When my test failed, I googled the problem, found this bug report, and only then realised that form-encode existed.
The API documentation is generated from the docstrings.
This tripped us up (caused a bug in our production code). @weavejester is right that something being reserved doesn't necessarily mean it has to be encoded - it depends on the context but I think, for a general-purpose function, it's dangerous to assume that you don't have to encode some character just because it is valid in certain contexts.
What I think is super-confusing is the term URL encoding in general.
Does url-encode
here mean encoding only the path segment?
Btw. slash seems to be a valid character in the query string: https://stackoverflow.com/a/5691069/1184752
Yes, arguably url-encode
should be changed to path-encode
to make it unambiguous as to its use, and url-encode
should be deprecated.
I bumped into the same confusion. Is it still a plan to disambiguate the functions? Would you accept a PR for that?
Yes, I'll accept a PR to deprecate the url-encode
function in favor of path-encode
.
RFC 3986 defines the plus sign as a reserved character.
url-encode should convert it to %2B