haskell / network-uri

URI manipulation facilities

Is `isUnescapedInURI` written incorrectly? #78

Open mitchellwrosen opened 9 months ago

mitchellwrosen commented 9 months ago

isUnescapedInURI documentation says:

Returns True if the character is allowed unescaped in a URI.

However, its implementation is:

```haskell
isUnescapedInURI c = isReserved c || isUnreserved c
```

where isReserved documentation says:

Returns True if the character is a "reserved" character in a URI. To include a literal instance of one of these characters in a component of a URI, it must be escaped.

So, it seems to me that if isReserved returns True, then isUnescapedInURI ought to return False.
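A quick way to see the overlap is to evaluate both predicates on the same characters. This is a minimal sketch, assuming the `network-uri` package (module `Network.URI`) is installed; the names `printableAscii`, `reservedButUnescaped` and `matchesQuotedDefinition` are hypothetical helpers introduced only for illustration:

```haskell
import Network.URI (isReserved, isUnescapedInURI, isUnreserved)

-- Printable ASCII, used only to enumerate candidate characters.
printableAscii :: [Char]
printableAscii = [' ' .. '~']

-- Hypothetical helper: characters that isReserved flags as needing escaping
-- inside a component, yet isUnescapedInURI still accepts unescaped.
reservedButUnescaped :: [Char]
reservedButUnescaped =
  [c | c <- printableAscii, isReserved c, isUnescapedInURI c]

-- Hypothetical check that the library predicate agrees with the
-- implementation quoted above on every printable ASCII character.
matchesQuotedDefinition :: Bool
matchesQuotedDefinition =
  all (\c -> isUnescapedInURI c == (isReserved c || isUnreserved c)) printableAscii
```

Loaded into GHCi, `reservedButUnescaped` evaluates to `"!#$&'()*+,/:;=?@[]"`, i.e. every reserved character, which is exactly the apparent contradiction described above.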

ezrakilty commented 9 months ago

This is, essentially, a documentation bug.

The isUnescapedInURI function is answering the question, "Can this appear at all in a URI?" The docstring for it gives an example URI containing non-ASCII characters (with umlauts and such), and those absolutely have to be escaped before they can be included.
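Read that way, `isUnescapedInURI` is the predicate for escaping an already-assembled URI string. A minimal sketch, assuming `network-uri` is available; `escapeWholeURI` is a hypothetical name, and the sample URI is invented for illustration rather than taken from the docstring:

```haskell
import Network.URI (escapeURIString, isUnescapedInURI)

-- Hypothetical helper: escape a complete URI string. Characters that can
-- never appear unescaped (non-ASCII letters, spaces) are percent-encoded as
-- UTF-8 bytes, while reserved delimiters such as ':', '/' and '?' are kept
-- because they carry structure in a finished URI.
escapeWholeURI :: String -> String
escapeWholeURI = escapeURIString isUnescapedInURI
```

For example, `escapeWholeURI "http://example.com/über?q=ä b"` gives `"http://example.com/%C3%BCber?q=%C3%A4%20b"`: the umlauts and the space are encoded, the `?` is not.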

Reserved characters such as `?` are allowed to appear in a finished URI, so both `isReserved` and `isUnescapedInURI` return True for them. But the docstring for `isReserved` is trying to put you on the right path: if you're forming a URI out of parts, and one of those parts contains a reserved character, you'd better escape it.
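To see why the part has to be escaped, compare what the parser makes of the glued-together string under each predicate. A sketch assuming `network-uri`; `buildWith` and `pathAndQuery` are hypothetical helpers, and `http://example.com/` is a placeholder base:

```haskell
import Network.URI (URI (..), escapeURIString, isUnescapedInURI,
                    isUnescapedInURIComponent, parseURI)

-- Hypothetical helper: append one path part to a fixed base after escaping
-- it with the given predicate, then parse the resulting string.
buildWith :: (Char -> Bool) -> String -> Maybe URI
buildWith keep part =
  parseURI ("http://example.com/" ++ escapeURIString keep part)

-- Extract (path, query) so the effect of the escaping is easy to inspect.
pathAndQuery :: Maybe URI -> Maybe (String, String)
pathAndQuery = fmap (\u -> (uriPath u, uriQuery u))
```

With `isUnescapedInURI`, the part `"b?ar"` leaks into the query: `pathAndQuery (buildWith isUnescapedInURI "b?ar")` is `Just ("/b", "?ar")`. With `isUnescapedInURIComponent` the `?` is encoded and the result is `Just ("/b%3Far", "")`.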

In fact, the companion function for `isUnescapedInURI`, namely `isUnescapedInURIComponent`, is going to be the more useful one: if you are forming a URI out of parts that include arbitrary strings, you should use it to escape each part. I'm not even sure what you would use `isUnescapedInURI` for.
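A typical case is building a query string from arbitrary data. A minimal sketch, assuming `network-uri`; `buildQuery` is a hypothetical helper, and it percent-encodes spaces as `%20` rather than using form-style `+`:

```haskell
import Data.List (intercalate)
import Network.URI (escapeURIString, isUnescapedInURIComponent)

-- Hypothetical helper: build a query string from key/value pairs, escaping
-- every key and value as a URI component so that '&', '=' or '?' inside
-- the data cannot be mistaken for query delimiters.
buildQuery :: [(String, String)] -> String
buildQuery pairs =
  '?' : intercalate "&" [esc k ++ "=" ++ esc v | (k, v) <- pairs]
  where
    esc = escapeURIString isUnescapedInURIComponent
```

For instance, `buildQuery [("q", "fish & chips"), ("sort", "a=b")]` yields `"?q=fish%20%26%20chips&sort=a%3Db"`.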

I'll have a go at improving the docstrings for the isUnescaped functions.

ezrakilty commented 9 months ago

Well, after playing around with it a bit more, I realized `isUnescapedInURIComponent` is rarely what you want either. It will encode a slash character, for instance, which you usually don't want when forming a path:

```haskell
import Network.URI

uri :: URI
uri = URI
  { uriScheme    = "http:"
  , uriAuthority = Nothing
  , uriPath      = escapeURIString isUnescapedInURIComponent "/foo/b?ar/baz" -- you want the question mark escaped
  , uriQuery     = ""
  , uriFragment  = ""
  }
```

The result escapes the question mark as desired, but it also escapes the slashes (the path becomes `%2Ffoo%2Fb%3Far%2Fbaz`), even though they would not confuse the parser at that point and you would usually keep them unescaped.
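One workaround is to widen the component predicate so that it also keeps `/`. This is a sketch, assuming `network-uri`; `isUnescapedInPath` and `escapePath` are hypothetical names, not functions the library provides:

```haskell
import Network.URI (escapeURIString, isUnescapedInURIComponent)

-- Hypothetical predicate: everything isUnescapedInURIComponent keeps, plus
-- the '/' segment separator, so the path keeps its structure while
-- characters such as '?' and '#' inside a segment are still encoded.
isUnescapedInPath :: Char -> Bool
isUnescapedInPath c = c == '/' || isUnescapedInURIComponent c

-- Hypothetical helper applying that predicate to a whole path.
escapePath :: String -> String
escapePath = escapeURIString isUnescapedInPath
```

Here `escapePath "/foo/b?ar/baz"` gives `"/foo/b%3Far/baz"`. This is conservative: it still encodes characters such as `:` and `@` that RFC 3986 permits inside path segments, and it cannot tell a separator `/` from a literal slash that belongs to a segment's data.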

I will still try to improve the documentation, although I'll be doing some gymnastics to make either of these functions sound useful...