mrkkrp / modern-uri

Modern library for working with URIs
Other
68 stars 18 forks source link

Relative URI resolution #6

Closed mjepronk closed 6 years ago

mjepronk commented 6 years ago

Does this library implement relative URI resolution as defined in section 5.2 of RFC 3986? If not, would you accept a pull request to do this?

mrkkrp commented 6 years ago

This is a good catch, it lacks that functionality.

would you accept a pull request to do this?

Certainly!

mjepronk commented 6 years ago

OK, I have a question regarding the representation of a parsed URI. It seems that it's impossible right now to tell based on the URI data if a URL ended with a trailing slash or not. For example:

λ> mkURI "/foo/bar"
URI {uriScheme = Nothing, uriAuthority = Left True, uriPath = ["foo","bar"], uriQuery = [], uriFragment = Nothing}
λ> mkURI "/foo/bar/"
URI {uriScheme = Nothing, uriAuthority = Left True, uriPath = ["foo","bar"], uriQuery = [], uriFragment = Nothing}

This is necessary to correctly handle the merge paths operation as defined in section 5.2.3 of RFC 3986.

My suggestion would be to add a trailing empty string to the uriPath list if the path ended with a slash. But of course this may give issues with existing code not expecting it. Or do we need to change the URI data type?

mrkkrp commented 6 years ago

We must represent /foo/bar/ as a URI with uriPath equal to ["foo","bar",""]. The current behavior is a consequence of my desire to collapse consecutive slashes automatically, I'd still prefer to retain that behavior except in the case when the empty path piece occurs at the end.

The code of interest is this (the filter should not remove empty path piece if it's the last piece in a path):

https://github.com/mrkkrp/modern-uri/blob/master/Text/URI/Parser/Text.hs#L105

Similarly for ByteString:

https://github.com/mrkkrp/modern-uri/blob/master/Text/URI/Parser/ByteString.hs#L152

mjepronk commented 6 years ago

OK, great. I'll make the change.

mjepronk commented 6 years ago

It's weird, when I change filter on line 105 of Text.URI.Parser.Text with my function filterButLast below, I get a parse error on URL's that end with a '/'. Do you have any idea why?

filterButLast :: (a -> Bool) -> [a] -> [a]
filterButLast _ [] = []
filterButLast _ [x] = [x]
filterButLast p (x : xs)
  | p x       = x : filterButLast p xs
  | otherwise = filterButLast p xs
mjepronk commented 6 years ago

Parse error is because mkPathPiece does not accept empty string:

λ> mkPathPiece ""
*** Exception: RTextException PathPiece ""

Should we modify this function?

mrkkrp commented 6 years ago

OK, sorry, I have advised you wrong.

The mkPathPiece smart constructor imposes the invariant that path pieces cannot be empty, thus we automatically avoid consecutive slashes, the behavior of the parser is just a consequence of this (there is no other way for it to function without breaking the invariant, as you have just discovered).

The presence of the trailing slash is obviously must be encoded in the URI data type. The first things that comes to mind is to add a new field to the record, say, uriPathTrailingSlash :: Bool, but it's not good, because trailing slash only makes sense when the path is not empty, i.e. it contains at least one path piece. Otherwise the trailing slash will be the same as leading slash that determines whether the path is absolute or relative, and that's another story.

So the proper solution is a breaking change in this spirit:

data URI = URI
  { ...
  , uriPath :: Maybe (Bool, NonEmpty PathPiece)
  , ...
  }

Here, Nothing would represent the empty path. When path is not empty we have Just and only in that case we have the chance to specify the Bool which tells us whether we have a trailing slash or not. I think it's best to keep NonEmpty PathPiece in the functorial position (of vanilla covariant functor, Functor), so it's easier to manipulate with fmap, so the Bool should go first in the tuple.

Then, once we have this, we should still provide normalized path (as a [] list) with lenses and a new 0-1 traversal which would allow to manipulate the trailing slash component.

Docs, tests, etc. this would take some time. If you feel like you're up to this refactoring, you're welcome to try. Otherwise, I'm busy with other stuff right now, but I will have the time for this in the beginning of January, so after the refactoring you could return to the URI resolution problem.

What do you think?

mjepronk commented 6 years ago

Thank you for the explanation. I agree with you that this is the best route to take. I'll try to implement this change and see if it works for all cases.

mrkkrp commented 6 years ago

Resolved in 0214160bee93715007356ca649c96bfbfe273a6e.