Closed mjepronk closed 6 years ago
This is a good catch, it lacks that functionality.
would you accept a pull request to do this?
Certainly!
OK, I have a question regarding the representation of a parsed URI. It seems that it's impossible right now to tell based on the URI
data if a URL ended with a trailing slash or not. For example:
λ> mkURI "/foo/bar"
URI {uriScheme = Nothing, uriAuthority = Left True, uriPath = ["foo","bar"], uriQuery = [], uriFragment = Nothing}
λ> mkURI "/foo/bar/"
URI {uriScheme = Nothing, uriAuthority = Left True, uriPath = ["foo","bar"], uriQuery = [], uriFragment = Nothing}
This is necessary to correctly handle the merge paths operation as defined in section 5.2.3 of RFC 3986.
My suggestion would be to add a trailing empty string to the uriPath
list if the path ended with a slash. But of course this may give issues with existing code not expecting it. Or do we need to change the URI
data type?
We must represent /foo/bar/
as a URI with uriPath
equal to ["foo","bar",""]
. The current behavior is a consequence of my desire to collapse consecutive slashes automatically, I'd still prefer to retain that behavior except in the case when the empty path piece occurs at the end.
The code of interest is this (the filter should not remove empty path piece if it's the last piece in a path):
https://github.com/mrkkrp/modern-uri/blob/master/Text/URI/Parser/Text.hs#L105
Similarly for ByteString
:
https://github.com/mrkkrp/modern-uri/blob/master/Text/URI/Parser/ByteString.hs#L152
OK, great. I'll make the change.
It's weird, when I change filter
on line 105 of Text.URI.Parser.Text with my function filterButLast
below, I get a parse error on URL's that end with a '/'. Do you have any idea why?
filterButLast :: (a -> Bool) -> [a] -> [a]
filterButLast _ [] = []
filterButLast _ [x] = [x]
filterButLast p (x : xs)
| p x = x : filterButLast p xs
| otherwise = filterButLast p xs
Parse error is because mkPathPiece
does not accept empty string:
λ> mkPathPiece ""
*** Exception: RTextException PathPiece ""
Should we modify this function?
OK, sorry, I have advised you wrong.
The mkPathPiece
smart constructor imposes the invariant that path pieces cannot be empty, thus we automatically avoid consecutive slashes, the behavior of the parser is just a consequence of this (there is no other way for it to function without breaking the invariant, as you have just discovered).
The presence of the trailing slash is obviously must be encoded in the URI
data type. The first things that comes to mind is to add a new field to the record, say, uriPathTrailingSlash :: Bool
, but it's not good, because trailing slash only makes sense when the path is not empty, i.e. it contains at least one path piece. Otherwise the trailing slash will be the same as leading slash that determines whether the path is absolute or relative, and that's another story.
So the proper solution is a breaking change in this spirit:
data URI = URI
{ ...
, uriPath :: Maybe (Bool, NonEmpty PathPiece)
, ...
}
Here, Nothing
would represent the empty path. When path is not empty we have Just
and only in that case we have the chance to specify the Bool
which tells us whether we have a trailing slash or not. I think it's best to keep NonEmpty PathPiece
in the functorial position (of vanilla covariant functor, Functor
), so it's easier to manipulate with fmap
, so the Bool
should go first in the tuple.
Then, once we have this, we should still provide normalized path (as a []
list) with lenses and a new 0-1 traversal which would allow to manipulate the trailing slash component.
Docs, tests, etc. this would take some time. If you feel like you're up to this refactoring, you're welcome to try. Otherwise, I'm busy with other stuff right now, but I will have the time for this in the beginning of January, so after the refactoring you could return to the URI resolution problem.
What do you think?
Thank you for the explanation. I agree with you that this is the best route to take. I'll try to implement this change and see if it works for all cases.
Resolved in 0214160bee93715007356ca649c96bfbfe273a6e.
Does this library implement relative URI resolution as defined in section 5.2 of RFC 3986? If not, would you accept a pull request to do this?