mirage / ocaml-uri

RFC3986 URI parsing library for OCaml

Canonicalization of query strings, or pp_hum prints same string but compare/equal says they are different? #128

Open edwintorok opened 5 years ago

edwintorok commented 5 years ago

I have two URIs in two different files (url1 was created by calling Uri.to_string on url2). When I load them, Uri.compare and Uri.equal consider them distinct, even after running both through Uri.canonicalize. This is confusing because Uri.pp_hum prints identical strings for them. If I rebuild the query parameters of each URI in a my_canonicalize function, they are considered equal.

I found this when trying to put Uri.t values into a Set. Is this expected?

#use "topfind";;
#require "uri";;

let () =
  let url1 = Uri.of_string "https://example.com/?redirect=http://example.com/foobar" in
  let url2 = Uri.of_string "https://example.com/?redirect=http%3A%2F%2Fexample.com%2Ffoobar" in
  Format.printf "url1: %a@,url2: %a@." Uri.pp_hum url1 Uri.pp_hum url2;
  Format.printf "compare: %d@." @@ Uri.compare (Uri.canonicalize url1) (Uri.canonicalize url2);
  let my_canonicalize u = let u = Uri.canonicalize u in Uri.with_query u (Uri.query u) in
  Format.printf "compare with my_canonicalize: %d@." @@ Uri.compare (my_canonicalize url1) (my_canonicalize url2);;
url1: https://example.com/?redirect=http://example.com/foobar
url2: https://example.com/?redirect=http://example.com/foobar
compare: 1
compare with my_canonicalize: 0
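
For the Set use case, the my_canonicalize workaround above could be wrapped in a comparison module. This is only a minimal sketch of that idea, not a fix for the library: NormalizedUri, UriSet, and normalize are hypothetical names introduced here, and the sketch assumes that rebuilding the query with Uri.with_query keeps behaving as in the output shown above.

```ocaml
(* Requires the uri package (for example via `#require "uri"` in the toplevel). *)

(* Same workaround as my_canonicalize above: canonicalize, then rebuild the
   query from its decoded key/value list. *)
let normalize u =
  let u = Uri.canonicalize u in
  Uri.with_query u (Uri.query u)

(* NormalizedUri is a hypothetical helper, not part of the uri library. *)
module NormalizedUri = struct
  type t = Uri.t

  (* Compare the normalized forms rather than the URIs as parsed. *)
  let compare a b = Uri.compare (normalize a) (normalize b)
end

module UriSet = Set.Make (NormalizedUri)

(* The two URIs from the report above. *)
let set =
  UriSet.of_list
    [ Uri.of_string "https://example.com/?redirect=http://example.com/foobar";
      Uri.of_string
        "https://example.com/?redirect=http%3A%2F%2Fexample.com%2Ffoobar" ]

let () = Printf.printf "distinct URIs in set: %d\n" (UriSet.cardinal set)
```

Normalizing inside compare costs extra work on every comparison; normalizing once before insertion would be cheaper, at the price of having to remember to do it at every call site.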

P.S. dune is awesome for debugging issues like this: I just ran opam source uri and added debug print statements in Uri.compare until I drilled down to where it considered them different.

avsm commented 5 years ago

This definitely seems like a bug to me. It possibly comes from how Uri.Query.compare behaves with the Raw values it tracks.
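
To illustrate the kind of behaviour suspected here, the following is a toy model that uses only the standard library. It is an assumption-laden sketch, not the uri library's actual code: the query type, pct_decode, decode, and both compare functions are invented for illustration. It shows how comparing a query as stored (raw text) can disagree with comparing its decoded key/value form.

```ocaml
(* Toy model only: these types are assumptions for illustration, not the
   uri library's real internals. *)
type query =
  | Raw of string                      (* query text exactly as parsed *)
  | KV of (string * string list) list  (* decoded key/value pairs *)

(* Minimal percent-decoder; malformed escapes are copied through unchanged. *)
let pct_decode s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i >= n then Buffer.contents b
    else
      let hex =
        if s.[i] = '%' && i + 2 < n
        then int_of_string_opt ("0x" ^ String.sub s (i + 1) 2)
        else None
      in
      match hex with
      | Some c -> Buffer.add_char b (Char.chr c); go (i + 3)
      | None -> Buffer.add_char b s.[i]; go (i + 1)
  in
  go 0

(* Turn either representation into decoded key/value pairs. *)
let decode = function
  | KV kv -> kv
  | Raw s ->
    String.split_on_char '&' s
    |> List.map (fun pair ->
         match String.index_opt pair '=' with
         | None -> (pct_decode pair, [])
         | Some i ->
           let k = String.sub pair 0 i in
           let v = String.sub pair (i + 1) (String.length pair - i - 1) in
           (pct_decode k, [ pct_decode v ]))

(* Structural comparison of the stored form: raw strings compare as text. *)
let compare_as_stored (a : query) (b : query) = compare a b

(* Comparison after decoding both sides to the same representation. *)
let compare_decoded a b = compare (decode a) (decode b)

let q1 = Raw "redirect=http://example.com/foobar"
let q2 = Raw "redirect=http%3A%2F%2Fexample.com%2Ffoobar"

let () =
  Printf.printf "equal as stored: %b, equal when decoded: %b\n"
    (compare_as_stored q1 q2 = 0)
    (compare_decoded q1 q2 = 0)
(* prints "equal as stored: false, equal when decoded: true" *)
```

In this model, forcing both values through decode before comparing plays the same role as the Uri.with_query u (Uri.query u) round trip in my_canonicalize.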