Closed duairc closed 6 years ago
Where is this defined? I can't find any mention of special treating of '+'
in RFC 3986:
https://tools.ietf.org/html/rfc3986#section-3.4
I'm aware of this feature, but I just want to understand if it's a commonly used convention or something imposed by a document. If the first, the transformation should probably be handled by the user of the library, not the library itself.
You're right, it's not defined in RFC 3986 as far as I know, so technically it's not a URI thing. It is defined in RFC 1866 (HTML 2.0), in paragraph 8.2.1. Note that it only affects query strings: + in the path should be interpreted literally.
There was also a draft RFC describing the application/www-form-urlencoded format but it expired and doesn't seem to have been replaced by anything.
For what it's worth though, this isn't just an aesthetic thing. At the moment I'm writing an implementation of SAML in Haskell (that has to talk to another service written in PHP). SAML basically works by passing base64-encoded, deflated XML documents around in the query string. Base64 produces +
characters. I was using modern-uri
to build these URLs with the SAML requests in the query string. The other end (written in PHP) kept choking on the input, because it was interpreting the +
signs in the Base64 data as `, and it took me a long time to figure out what was happening (whitespace doesn't automatically make base64 encoded data invalid, so it was the
inflate` step that was failing on the PHP side).
You're right that I could just do the transformation myself, but there isn't an obvious way of doing this without parsing the whole URL string again myself. Replacing +
with %2B
is fine, but this should only be done after the ?
sign. It seems silly to render the URI
as a string, and then parse that string again looking for a ?
, just to apply a transformation thereafter.
Basically I'm saying, I understand if you don't want to make it the default behaviour, but it really makes sense for this functionality to exist in modern-uri
somewhere, I don't think it should be something users should have to do.
OK, I see. I think the only reasonable solution is to create a record of settings (there may be more of them in the future, who knows) where we'll put a setting controlling whether we should encode/decode space as the plus sign +
as per RFC 1866. We then must pass this record, let's call it SerializationSettings
maybe, to both renders and parsers.
New tests and a changelog update are also necessary... This will change the API, so it must be a new major version, 0.2 (hopefully the library is still new and it won't break that much code).
If you like, you can try to do this, otherwise I'll do it myself pretty soon.
Thanks for opening the issue!
That sounds reasonable. It also sounds like at least a few hours of work though, and I can't see myself getting the time to do that this week or next.
However, I just had a little look at RFC 3986 again — I don't actually see any mention of special handling of &
and =
in query strings either. It doesn't seem to define any concept of "query parameters", it just seems to say that everything after the ?
is the "query string". As far as I can tell, the concept of query parameters comes from the RFC 1866 that I mentioned. GIven that modern-uri
seems to have the concept of query parameters, as far as I can tell, the correct way to handle them is always to use application/x-www-form-urlencoded
(i.e., +
is
) encoding.
Perhaps then the uriQuery
field should actually contain a sum type like:
data Query = QueryString Text | QueryParams [QueryParam]
Parsing would potentially be ambiguous though, so I think you'd still need to pass an option to the parser, but rendering would be unambiguous.
Given that
modern-uri
seems to have the concept of query parameters, as far as I can tell, the correct way to handle them is always to useapplication/x-www-form-urlencoded
(i.e.,+
is) encoding.
Good point. Now I'm inclined to just merge this PR. I need to think about this more first. I'll return to the discussion tomorrow perhaps.
I'm merging this, I'll add some more tests, a changelog note, and other stuff myself. modern-uri-0.1.2.0
with the fix will be on Hackage today.
Thank you for pointing out the problem!
Special handling of + is needed in query strings — in rendering, '+' must be percent-encoded, and any '+' encountered in parsing must be parsed as ' '.
Ideally ' ' would be rendered as '+' as well, but this would need changes in several places, and '%20' isn't wrong, so I haven't attempted to implement that.