Closed mjambon closed 10 years ago
It's late and everything but there also seems to be a recent problem with spaces:
# Uri.path (Uri.of_string "/a%20b");;
- : string = "/a b"
I'll follow up tomorrow.
The spaces problem is not a problem. Uri.path
returns a human string representation of the component. The slashes problem is a problem and we will have to maintain path components to solve it.
I agree with @dsheet's triage -- do you really want the space representation preserved as well?
-anil
On 6 Feb 2014, at 10:43, David Sheets notifications@github.com wrote:
The spaces problem is not a problem. Uri.path returns a human string representation of the component. The slashes problem is a problem and we will have to maintain path components to solve it.
— Reply to this email directly or view it on GitHub.
What we need for our application is extract the path as a string list
from a Uri.t
. For instance we want to obtain [""; "foo/bar"; "baz"]
from "/foo%2Fbar/baz"
.
Cohttp client uses Uri.path_and_query
which like Uri.path
produces an incorrect path in the HTTP request. File request.ml
:
let write_header req oc =
let fst_line = Printf.sprintf "%s %s %s\r\n" (Code.string_of_method req.meth)
(Uri.path_and_query req.uri) (Code.string_of_version req.version) in
Interestingly, nginx copes with spaces pretty well except when a space is followed by the letter 'H' (as in HTTP), which causes the request to fail. Our problematic URL was http://localhost/something/Palo%20Alto%20City%20Hall
:-)
In summary, I would prefer to have pairs of functions that read and write standard-compliant URL components by default, plus some extras for pretty-printing if that's something people want. Assuming backward compatibility is not an issue, I'm suggesting an interface along those lines:
type url_compatible = string
(* string that may occur as a substring of a valid URL *)
type plain = string
(* raw sequence of bytes *)
val of_string : url_compatible -> t
val to_string : t -> url_compatible
(* returns a URL, where special characters including slash (`/`)
within path segments are properly escaped (`%2F`).
to_string (of_string "/a%2Fb") = "/a%2Fb"
*)
val segments_of_path : url_compatible -> plain list
val path_of_segments : plain list -> url_compatible
(*
segments_of_path "/a%2Fb/c" = [""; "a/b"; "c"]
path_of_segments [""; "a/b"; "c"] = "/a%2Fb/c"
*)
val path : t -> url_compatible
(* path (of_string "http://example.com/a%2Fb") = "/a%2Fb" *)
val path_segments : t -> plain list
(*
path_segments (of_string "a/b") = ["a"; "b"]
path_segments (of_string "/a/b") = [""; "a"; "b"]
path_segments (of_string "/a%2Fb/c") = [""; "a/b"; "c"]
*)
val path_pretty : t -> string
(*
Either:
path_pretty (of_string "/a%2Fb/The%20Hulk") = "/a%2Fb/The Hulk"
or perhaps the irreversible:
path_pretty (of_string "/a%2Fb/The%20Hulk") = "/a/b/The Hulk"
(this is the current Uri.path implementation)
*)
These are actually 2 related bugs:
path_and_query
was incorrectly implemented and does not properly encode the path component.I'll try to work on a patch for this tomorrow. It shouldn't be too involved. See the failing tests.
And yes, the interface needs massive clean-up, compat breaking changes, and support for a number of other standards including RFC 3987, RFC 6570, and recursive fragment identifiers described by Errata 3330.
Added another bug that was marked with the same known-error as the path delimiter bug this time with colons in userinfo components.
I hope to get a 2.0 release out sometime this spring. Thanks for your patience and your great report!
Result should be:
It seems that some functions treat the path as a whole and encode/decode the whole path instead of each component. As a result, simply marking
/
as unsafe doesn't solve the problem.I will let you know if I start working on a fix.