ring-clojure / ring

Clojure HTTP server abstraction
MIT License

Charset detection doesn't comply with the RFC, which breaks things like query param parsing #402

Closed: vincentjames501 closed this issue 4 years ago

vincentjames501 commented 4 years ago

According to RFC 7231, section 3.1.1 (https://tools.ietf.org/html/rfc7231#section-3.1.1):

A parameter value that matches the token production can be transmitted either as a token or within a quoted-string. The quoted and unquoted values are equivalent. For example, the following examples are all equivalent, but the first is preferred for consistency:

    text/html;charset=utf-8
    text/html;charset=UTF-8
    Text/HTML;Charset="utf-8"
    text/html; charset="utf-8"

For example, if you pass in text/html; charset="utf-8", ring.util.request/character-encoding returns the charset as "\"utf-8\"" (quotes included) instead of just "utf-8". Since the quoted string is not a valid charset name, ring.util.codec downstream then decodes things like query params as nil:

(require '[ring.util.request :refer [character-encoding]]
         '[ring.util.codec :refer [form-decode-str]])

(character-encoding {:headers {"content-type" "text/meh; charset=\"utf-8\""}})
=> "\"utf-8\""
(character-encoding {:headers {"content-type" "text/meh; charset=utf-8"}})
=> "utf-8"
(form-decode-str "blah" "\"utf-8\"")
=> nil
(form-decode-str "blah" "utf-8")
=> "blah"

It seems we could just adjust charset-pattern or re-value so that the captured value omits the surrounding quotes.
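A minimal sketch of that idea follows. It assumes the regex captures a bare token and the contents of a quoted-string in two separate groups, so the quotes never appear in the result. The names token-chars, charset-param-pattern and parse-charset are hypothetical and are not Ring's actual charset-pattern or re-value. The token characters follow the tchar rule in RFC 7230, section 3.2.6. Escaped characters inside the quotes (quoted-pairs) are returned as-is here.

```clojure
;; Hypothetical sketch, not Ring's actual charset-pattern/re-value.

;; Token characters (tchar) as defined in RFC 7230, section 3.2.6.
(def token-chars "[!#$%&'*+\\-.^_`|~0-9A-Za-z]+")

;; Matches a charset parameter whose value is either a bare token
;; (group 1) or the contents of a quoted-string (group 2), so the
;; surrounding double quotes are never part of a captured group.
;; (?i) makes the parameter name case-insensitive, as the RFC allows.
(def charset-param-pattern
  (re-pattern
   (str "(?i);\\s*charset=(?:(" token-chars ")"
        "|\"((?:\\\\.|[^\"\\\\])*)\")"
        "\\s*(?:;|$)")))

(defn parse-charset
  "Return the charset from a Content-Type header value without any
  surrounding quotes, or nil if there is no charset parameter."
  [content-type]
  (when-let [[_ token quoted] (some->> content-type
                                       (re-find charset-param-pattern))]
    (or token quoted)))

;; The four equivalent examples quoted from RFC 7231, section 3.1.1:
(map parse-charset ["text/html;charset=utf-8"
                    "text/html;charset=UTF-8"
                    "Text/HTML;Charset=\"utf-8\""
                    "text/html; charset=\"utf-8\""])
;; => ("utf-8" "UTF-8" "utf-8" "utf-8")
```

The case of the charset name itself is preserved ("UTF-8" stays "UTF-8"). This is harmless because Java's charset lookup is case-insensitive.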

weavejester commented 4 years ago

Thanks for the report. I'll see if I can adjust the regex today. If memory serves, quoted fields in headers should also be able to contain escaped quotes (\").
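On the escaped-quote point: RFC 7230, section 3.2.6 allows a quoted-string to contain quoted-pairs (a backslash followed by one character). A recipient processing the value must treat each pair as the character after the backslash. The sketch below shows a hypothetical unquote-string helper for that step. It assumes the value has already been captured together with its quotes, as character-encoding currently returns it. It does no validation; for instance, it does not reject a value whose closing quote is itself escaped.

```clojure
(require '[clojure.string :as str])

(defn unquote-string
  "Hypothetical helper. If s is wrapped in double quotes, strip them and
  replace each quoted-pair (a backslash followed by one character) with
  that character. Any other value, including nil, is returned unchanged."
  [s]
  (if (and (string? s)
           (>= (count s) 2)
           (str/starts-with? s "\"")
           (str/ends-with? s "\""))
    ;; "$1" in the replacement refers to the character after the backslash.
    (str/replace (subs s 1 (dec (count s))) #"\\(.)" "$1")
    s))

(unquote-string "\"utf-8\"")   ;; => "utf-8"
(unquote-string "\"a\\\"b\"")  ;; => "a\"b"
(unquote-string "utf-8")       ;; => "utf-8"
```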