mirage / ocaml-uri

RFC3986 URI parsing library for OCaml
Other
97 stars 57 forks source link

Uri.path broken ? #167

Open cuihtlauac opened 1 year ago

cuihtlauac commented 1 year ago

I believe the behaviour of Uri.path is wrong when it is passed a Uri beginning with exactly two slashes. In such case, it drops everything before the third slash

#require "uri-sexp";;
utop # "aaa/bbb" |> Uri.of_string |> Uri.path;;
- : string = "aaa/bbb"
utop # "/aaa/bbb" |> Uri.of_string |> Uri.path;;
- : string = "/aaa/bbb"
utop # "//aaa/bbb" |> Uri.of_string |> Uri.path;; (* Boom! *)
- : string = "/bbb"
utop # "///aaa/bbb" |> Uri.of_string |> Uri.path;;
- : string = "/aaa/bbb"
reynir commented 1 year ago

When the URL starts with double slash it's a Protocol-relative url. In your example, aaa would be the authority and /bbb would be the path.

cuihtlauac commented 1 year ago

@reynir : That means Dream.split_target is wrong:

let split_target string =
  let uri = Uri.of_string string in
  let query =
    match Uri.verbatim_query uri with
    | Some query -> query
    | None -> ""
  in
  Uri.path uri, query

It shouldn't call Uri.path

https://github.com/aantron/dream/blob/master/src/pure/formats.ml#L168

reynir commented 1 year ago

I think calling Uri.path is alright, but the issue lies at Uri.of_string string I think. My first idea was to use Uri.make ~path:string (), but string is path and query. I don't find a function to split path and query.

I think I found a bug, though:

# Uri.make ~path:"//aaa/bbb" () |> Uri.to_string;;
- : string = "//aaa/bbb"

(I'm not sure you can actually represent such a URL)

cuihtlauac commented 1 year ago

The doc says:

(** Parse a URI string literal into a URI structure. A bare string will be
    interpreted as a path; a string prefixed with `//` will be interpreted as a
    host.
*)
val of_string : string -> t

Maybe, interpretation as a host should be optional?

aantron commented 1 year ago

The issue is that we know from the context, in Dream, that we are dealing with only a path and query string and there definitely isn't a hostname present.

In other words, it's possible for a URI like //hostname//path/that/begins/with/double/slash?query to be provided, i.e. both a hostname and a path that begins with a double slash is present. We know from the context that the //hostname has already been stripped and we would like to treat //path... as path-and-query and still get useful behavior out of ocaml-uri and rely on the work that has gone into ocaml-uri for handling URIs correctly. However, we don't see a way in the interface of ocaml-uri of doing that. The only way to construct a Uri.t from a string is Uri.of_string, and this function isn't sensitive to the context.

let () =
  let uri = Uri.of_string "//hostname//path/with/double/slashes?query&more" in
  print_endline (Uri.path uri);
  print_endline (Uri.verbatim_query uri |> Option.get)

let () =
  (* Starting from a partial URI where the hostname has already been stripped. *)
  let uri = Uri.of_string "//path/with/double/slashes?query&more" in
  print_endline (Uri.path uri);
  print_endline (Uri.verbatim_query uri |> Option.get)

Output:

//path/with/double/slashes
query&more
/with/double/slashes
query&more
aantron commented 1 year ago

We have a test case in Dream that observes this (see https://github.com/aantron/dream/issues/248#issuecomment-1509547446).