mirage / ocaml-uri

RFC3986 URI parsing library for OCaml

path depends on whether there is a host or not #72

Closed Drup closed 9 years ago

Drup commented 9 years ago
# Uri.path @@ Uri.make ~host:"bla.com" ~path:"" () ;;
- : bytes = ""
# Uri.path @@ Uri.make ~host:"bla.com" ~path:"/" () ;;
- : bytes = "/"
# Uri.path @@ Uri.make ~host:"bla.com" ~path:"foo/" () ;;
- : bytes = "/foo/"
# Uri.path @@ Uri.make ~path:"" () ;;
- : bytes = ""
# Uri.path @@ Uri.make ~path:"/" () ;;
- : bytes = "/" 
# Uri.path @@ Uri.make ~path:"foo/" () ;;
- : bytes = "foo/"

I guess it's expected, but it's very annoying, since it forces me to reimplement some path handling outside of uri. I suppose it's related to https://github.com/mirage/ocaml-uri/issues/53

dsheets commented 9 years ago

When does this bite you?

Uri.t, for better or worse, represents both "absolute URIs" and "URI references" (relative URIs including absolute paths without hosts and relative paths). This behavior is required to handle both in the same type while also avoiding exceptions and reducing the interface surface.

Perhaps these types of URIs should be separated in the next version.

dsheets commented 9 years ago

There is another option I forgot to mention: store path as relative but absolutize it when serializing with to_string. Perhaps this solves your use case?
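
A minimal sketch of that option, using a toy record rather than the real Uri.t. The type `toy_uri` and both functions are hypothetical and exist only to illustrate the idea: the path is stored exactly as given, and the leading slash is added only when serializing a URI that has a host.

```ocaml
(* Toy model for illustration only; this is not the uri library's Uri.t. *)
type toy_uri = { host : string option; path : string }

(* The stored path is returned untouched, whether or not a host is set. *)
let path u = u.path

(* Only serialization inserts the "/" needed between host and path. *)
let to_string u =
  match u.host with
  | None -> u.path
  | Some h ->
    let needs_slash = u.path <> "" && u.path.[0] <> '/' in
    "//" ^ h ^ (if needs_slash then "/" else "") ^ u.path
```

Under this model, projecting the path gives the same string with or without a host, at the cost that the projected path may differ from the path that appears in the serialized form.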

Drup commented 9 years ago

Well, it forces me to do segmentation by / with some handling of "does it start or end with /? If yes, do this special thing". If uri exported a string list of the path components, I would not have to.

Drup commented 9 years ago

And the string list would respect the property "No string is empty except potentially the last one".

dsheets commented 9 years ago

So, the path segment interface probably has to be richer than that to support relative and absolute paths and paths with multiple consecutive slashes.

Could you be more specific about what you are actually doing with these paths that forces you to "do this special thing"?

Drup commented 9 years ago

> paths with multiple consecutive slashes.

I didn't even know it was allowed.

I was simply trying to map the semantics of routes (or rather, services) as we define them in eliom to uris.

I'm not sure I see why it would need to be more complicated. Relative and absolute paths should only be relevant to the host parameter (present or absent); why should this information leak into the path part?

dsheets commented 9 years ago

http://example.net with_path "foo/bar" = http://example.net/foo/bar, path "/foo/bar"

Using a relative path against a URI-with-host results in a resolution of the path against / implicitly.

Could you please tell me the details of the situation in which this is causing you trouble? I'd really like to improve the path API in the next version but I'm not totally clear on your use and requirements and the reasons for them.

Drup commented 9 years ago

There are no troubles, just boilerplate that should be handled in uri.

The path exposed by uri should not change depending on whether the uri is absolute or not. If some next layer wants to handle relative and absolute uris in different ways, it should look at the host; hence the path should be the same regardless. The next layer handling a uri can then ignore the fact that it's absolute or relative if it wants. What are the reasons for the path to be different?

Maybe (probably) I'm missing some uri corner cases.

dsheets commented 9 years ago

Yes, please, tell me why you have the boilerplate and what it is doing. I don't understand the circumstance that leads to your initial reported issue.

I'm not sure what you mean by "what are the reasons for the path to be different?" If you have a URI that would be absolute and you attempt to give it a path that is relative, or create it with a path that is relative, then the only serialization for this URI is with an absolute path. To project out the path component and have it be relative is a representation that cannot be honored by the serialization of the data type (and a round trip would make it absolute). This is why distinguishing relative and absolute URIs at the type level may be a good idea.

Please elaborate on the specifics of your use case so I can tell if it matches other issues related to path handling or if it is a new issue.

Drup commented 9 years ago

It transforms the path as exported by uri into a string list respecting the property I gave earlier (no empty string, except potentially the last one) and passes it to the routing part. The routing part doesn't care about absolute or relative uris; it just cares about the path components (and should not have to handle the potentially empty string at the beginning).

My goal was to replace this thing by something simpler.

So for now, I test the first char for / and then string split on /, instead of just splitting. I just did not realize it at first and the debugging was ... annoying.
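
A rough sketch of that workaround, using only the OCaml standard library. The name `segments` is hypothetical and not part of uri. It drops at most one leading slash before splitting, so a path gives the same list whether or not uri made it absolute.

```ocaml
(* Hypothetical helper, not part of the uri library.
   Drops one leading '/' if present, then splits on '/'. *)
let segments (path : string) : string list =
  let n = String.length path in
  let rest =
    if n > 0 && path.[0] = '/' then String.sub path 1 (n - 1) else path
  in
  String.split_on_char '/' rest

(* Print a few sample paths next to their segment lists. *)
let () =
  List.iter
    (fun p ->
      let quoted = List.map (Printf.sprintf "%S") (segments p) in
      Printf.printf "%S -> [%s]\n" p (String.concat "; " quoted))
    [ "/foo/"; "foo/"; "/foo/bar"; "/a//b" ]
```

Both "/foo/" and "foo/" yield ["foo"; ""], so only the last element is empty. A path with consecutive slashes such as "/a//b" yields ["a"; ""; "b"], which breaks the "no empty string except the last" property; this sketch does not try to handle that case, nor "." and ".." segments.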

dsheets commented 9 years ago

Where do you get these 'maybe absolute or maybe relative' URIs? HTTP 1.1 says only absolute URIs should be used in the first line of a request. In general, I recommend doing something like Uri.(path (resolve "http" (of_string "/") input_uri)) before further processing to take care of ., .., and the relative vs absolute path issue.

Drup commented 9 years ago

> I recommend doing something like Uri.(path (resolve "http" (of_string "/") input_uri)) before further processing

I was probably missing this bit ...