Closed jphager2 closed 6 years ago
because in order to safely replace invalid UTF8, we need to first split the cookie header
What happens if you don't?
Can you show me the Rack code that parses the cookie?
Basically the issue is that bad UTF8 can get through the sanitizer if it is url encoded in the cookie. The sanitizer won't catch it, and the application might fail if it does something with that input.
Here is where the cookies are parsed in rack: https://github.com/rack/rack/blob/master/lib/rack/utils.rb#L215
The ';,'
corresponds to the matching regexp here: https://github.com/rack/rack/blob/master/lib/rack/query_parser.rb#L6
Sorry, I wasn't clear. What if you don't split the cookie header?
Ah yeah. For that there is two things:
So because of 1), it makes sense to split before decoding/sanitizing and then join back. We could also decode, sanitize and then not reencode, but then you risk breaking the cookie delimitation if there is an encoded delimiter in one of the cookie values.
Thanks for explanation. It's worth putting this in a comment.
Ok cool, will do (probably Monday).
@whitequark Comment added. Does it look good now?
Thanks!
Cookies can be url encoded. Since they are not included in the list of URI_FIELDS, url encoded invalid UTF8 in cookies are not sanitized. However, just adding HTTP_COOKIE to URI_FIELDS won't work, because in order to safely replace invalid UTF8, we need to first split the cookie header. The regular expression used here is the same used by rack when parsing the cookie.
@whitequark, please let me know you thoughts about this addition.