lambdaisland / uri

A pure Clojure/ClojureScript URI library
Mozilla Public License 2.0
243 stars 21 forks source link

Query string construction is broken #54

Closed cch1 closed 4 months ago

cch1 commented 4 months ago

Surprisingly for a library dedicated to URI parsing and construction, query string construction does not properly encode its arguments*:

(map->query-string {:redirect_uri "https://localhost:3000/login"})
=> "redirect_uri=https%3A//localhost%3A3000/login"

The forward slashes aren't encoded. The encoding by ring.util.codec is:

(codec/form-encode {:redirect_uri "https://localhost:3000/login"})
=> "redirect_uri=https%3A%2F%2Flocalhost%3A3000%2Flogin"

This seems like a show-stopper of a bug for URI construction. I suspect there are multiple issues around proper encoding of the components of a URI since #53 hints at such problems for the user:password components.

*Postscript: It seems the requirement to encode forward slashes is ... unclear. But since the ubiquitous ring lib is encoding them, I was expecting this lib to encode them as well. If the lack of encoding of the forward slashes is a principled decision, can it be documented?

plexus commented 4 months ago

I see you've already figured out that your accusatory remarks have no basis. It is always best to find an authoritative source before making wild claims. Pointing at an unrelated issue to add weight to your argument is sloppy argumentation.

The RFC is quite clear

RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, Section 3.4. Query

  query       = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data within the query component. ... it is sometimes better for usability to avoid percent-encoding those characters.

Any character that in a given URI component is not recognized as a delimiter may or may not be percent encoded, but the encoding in such cases is superfluous. Since ring.util.codec deviates from the spec perhaps that behavior should be documented.

The fact is that query-string encoding is more a matter of convention than of black-and-white spec. Which is why the base URI parsing/unparsing behavior does not change any encoding, because given a query string we can't know which characters in the "sub-delims" set are used as delimiters, and which are used as values. We provide query-encode and friends (like map->query-string) separately to construct encoded query strings, which provide the most common convention (key=value pairs separated by &, with + for spaces). However other conventions exist and are equally "correct", like using ; instead of &.

cch1 commented 4 months ago

Thanks for the detailed explanation, @plexus . You are right that my assumption of a bug was wrong.