ring-clojure / ring-codec

Utility library for encoding and decoding data
MIT License
63 stars 30 forks source link

url-encode should percent-encode + characters #12

Closed keeth closed 8 years ago

keeth commented 8 years ago

RFC 3986 defines the plus sign as a reserved character.

url-encode should convert it to %2B

weavejester commented 8 years ago

A reserved character doesn't automatically mean that it should be encoded. "/" is also a reserved character, but clearly we don't generally encode it, unless it is part of a parameter in the query string. If you read section 3.3 and follow the definitions back, you'll see that "+" is a valid path character.

Ring-Codec provides two functions, url-encode which encodes for URLs, and form-encode, which encodes for forms and query strings. url-encode does not encode "+", but form-encode does.

timrobinson33 commented 3 years ago

I think this would warrant a bit of explanation in the documentation. Both .Net's UrlEncode and Java's URLEncoder encode "+" and "/" (and as far as I can see, they don't offer any option not to), so a lot of people would have the expectation that url-encode will do the same.

weavejester commented 3 years ago

We can certainly add something to the docstring to explain why other implementations get it wrong ;)

timrobinson33 commented 3 years ago

I'm not sure if the API documentation is automatically generated from the docstring but if so, I'd vote for that.

My personal experience was that i googled "clojure url-encode", saw it in the API documentation and assumed it would work the same as java. When my test failed, I googled the problem, found this bug report, and only then realised that form-encode existed.

weavejester commented 3 years ago

The API documentation is generated from the docstrings.

jumarko commented 3 years ago

This tripped us up (caused a bug in our production code). @weavejester is right that something being reserved doesn't necessarily mean it has to be encoded - it depends on the context but I think, for a general-purpose function, it's dangerous to assume that you don't have to encode some character just because it is valid in certain contexts.

What I think is super-confusing is the term URL encoding in general. Does url-encode here mean encoding only the path segment?

jumarko commented 3 years ago

Btw. slash seems to be a valid character in the query string: https://stackoverflow.com/a/5691069/1184752

weavejester commented 3 years ago

Yes, arguably url-encode should be changed to path-encode to make it unambiguous as to its use, and url-encode should be deprecated.

devurandom commented 7 months ago

I bumped into the same confusion. Is it still a plan to disambiguate the functions? Would you accept a PR for that?

weavejester commented 7 months ago

Yes, I'll accept a PR to deprecate the url-encode function in favor of path-encode.