Open kirelagin opened 3 years ago
I have to admit that the first part of my text ignores the elephant in the room: RFC 3986 was obsolete by WHATWG URL and the latter very explicitly prohibits [
and ]
in the query component.
But the second part still stands and, I imagine, there are plenty of people who are not aware of the fact that their square brackets are now definitely illegal :/.
One alternative would be to redefine “modern URI” to mean “modern URL according to WHATWG”. Then disallowing the square brackets is the right behaviour indeed, but I am not sure if this library is actually in compliance. E.g. the parsing algorithms is almost certainly not. Here is the basic URL parser specification for reference.
What do other packages do? Let's say network-uri
and uri-bytestring
?
I believe, network-uri
implements the RFC naively (i.e. fails on []
).
uri-bytestring
is kinda weird in that it only parses &
-separated key=value
pairs, BUT it uses a customisable predicate for testing the validity of characters and ships two predefined tests, with laxURIParserOptions
allowing unencoded brackets in queries.
(Here is the commit that implements this: https://github.com/Soostone/uri-bytestring/commit/b89303219680170d51a02bd89c11ddae98562222.)
I realise this issue is a bit cumbersome, since I was doing the research as I was writing it and changed my opinion on the matter a couple of times in the process.
Currently, I think there are two independent actionable items:
[]
.[]
, e.g. a function that returns m (Either Url Url)
so that the users have a way of supporting legacy inputs with unencoded brackets but, at the same time, are forced to explicitly admit that this is happening and maybe warn their users.
This is a follow up to mrkkrp/req#102. I am going to try to convince you that
modern-uri
implements RFC 3986 too naively.I will base my case on the two main arguments:
The specification
The Reserved Characters section lists
[
and]
asgen-delims
, which is part ofreserved
. Then it says:This specification does not explicitly state that it is using RFC 2119 keywords, so it is not entirely clear to me how to interpret this “should”, but, I believe, the best option we have is to interpret it as per RFC 2119, and thus web-apps not escaping square brackets are likely in compliance. This alone should be enough to allow them.
It then goes on:
Now this “no delimiting role is known” part is pretty mysterious. The paragraph right above the one I have been quoting seems to be an attempt at explaining what this means, but I am having a hard time understanding it.
If we go now to the Query section, it reads:
which suggest that anything that is not a
#
is not a delimiter or, at least, the delimiting role of[
and]
is unknown (to me 🙂). This part of text may be non-normative and it contradicts the ABNF right below it, but the specification never explains what is normative and what is informative. The intent seems to be to disallow the square brackets, but it is not that obvious, in my opinion.Given how ambiguous all this is, I am not convinced that implementing exactly the ABNFs from the specification is enough.
The real world
The discussion in the
req
issue was based around the handling of square brackets in real-world browsers. A couple of years ago there was a discussion about this in the Firefox issue tracker and there is a fantastic survey of behaviours of various browsers.They surveyed specifically what the browsers were sending to the server, while the discussion on
req
was about “my browser converts” – I am not sure what exactly you were testing. Note that the actual URL location in that issue was already encoded (likely, by GitHub’s markdown processor): the HTML code of the link was<a href="https://gitlab.com/morley-framework/morley/-/issues/new?issue%5Btitle%5D=Indigo%20website:%20" rel="nofollow">https://gitlab.com/morley-framework/morley/-/issues/new?issue[title]=Indigo%20website:%20</a>
. And even when I follow this encoded link in Firefox, it decodes the brackets. This is a result of that survey I linked – as you can see, Firefox was the only browser forcefully encoding square brackets, so this was considered a bug and was changed.I can’t make a hyperlink with unencoded brackets here on GitHub, but if you paste
https://gitlab.com/morley-framework/morley/-/issues/new?issue[title]=Indigo%20website:%20
into the address bar of your Google Chrome it will (I hope – at least that is what I see testing in Chromium) not encode the brackets and send them as is.Thus, we can see that, regardless of what the specification has to say, in practice the square brackets are thought to be allowed in the query component and not allowing them was fixed as a bug in a major web-browser.