mirage / ocaml-uri

RFC3986 URI parsing library for OCaml
Other
97 stars 57 forks source link

Uri with github flavored SSH URIs #154

Closed NathanReb closed 3 years ago

NathanReb commented 3 years ago

I don't know much about RFC3986 but from a quick look, it appears to me that github is abusing the spec and uses the port for the repository owner. I.e. according to the specification, ssh URIs are roughly user@host:port/path and Github SSH URIs are as follows: git@github.com:owner/repo.git. Problem is that the port should be composed of digits only and that's not the case of github users and organisations names.

I didn't know about this before and was trying to use uri to parse URIs in dune-release rather than relying on our own parsers and that abuse of the port part is preventing me from doing so. Uri.of_string behaviour is a bit surprising here though: it succeeds but I'm then unable to get that information back from the Uri.t it returned since port is defined as:

val port : t -> int option

and therefore returns None in my case.

Shouldn't Uri.from_string fail in such cases? If so then I'd understand that you would not want to provide a way to parse such URIs as it doesn't follow the RFC even though I'd love it to be able to use some unsafe API in uri to do so!

Julow commented 3 years ago

The user@host:path syntax is not RFC3986: https://git-scm.com/docs/git-clone#_git_urls They call it "alternative scp-like syntax". Git also understands ssh:// followed by a standard URI.

The uri library handles this syntax:

# Uri.path (Uri.of_string "user@host:50/path");;
- : string = "50/path"
# Uri.path (Uri.of_string "ssh://user@host:50/path");;
- : string = "/path"
dinosaure commented 3 years ago

git@github.com is the scheme for Uri.t (however, it should not be when @ is not an authorized character according the RFC). However, such value can be considered as valid when the authority can have an @ character to refer to an userinfo. But, yes, the double-dot should introduce a port (which can only be a suit of digits). However, if we still follow the ABNF, a possible path-nonscheme can exists - and, in this way, the given data is valid (and owner can be valid but not as a port).

According to the RFC, git@github.com:owner/repo.git should be valid but not on your way. More concretely, Uri.t wants to follow the Postel law where you are liberal about inputs and conservative about outputs. Due to the large usage of Uri.t, we should continue to accept anything - even if it's wrong.

The @Julow's way is valid if you want to use SSH with Uri.t (and we used it for a long time on the ocaml-git side - minor meta-information such as the private key).

Julow commented 3 years ago

You are right, uri doesn't like the "scp-like" syntax:

# Uri.scheme (Uri.of_string "user@host:50/path");;
- : string option = Some "user@host"
NathanReb commented 3 years ago

I was aware that Uri doesn't handle the scp-like syntax and I was already appending a scheme when there was none.

The resulting Uri.t, be it with or without the scheme is broken so I'm a bit surprised you're willing to maintain that behaviour but that's your call!

On my end I unfortunately can't use Uri. Feel free to close this issue!

NathanReb commented 3 years ago

One note I'd like to add is that this choice of behaviour can cause those malformed URIs to be interpreted like proper ones. The None returned by port can mean two things: there's no explicitly specified port or there's one but it's malformed.

I understand that there is value in accepting more URIs for of_string but it might be worth exposing new entry points that can return errors, e.g. something like port_res : t -> (int option, _) result so it's possible to distinguish malformed components from missing components. That still won't allow me to use uri but I think that's fine, given those URIs aren't valid I didn't really expect uri to accept them in the first place. An error here would probably have helped me debug this faster though!

Anyway, thanks for your time and answers!

dinosaure commented 3 years ago

I make a pull request to fix the scheme but it seems that you want to do something that you already know that's wrong. I'm closed the issue.

For the port, it's more tricky when path-nonscheme. The missing information is more about the existence of : for the port or for the path-nonscheme. Finally the error is elsewhere, see #155 (which does not really fix your task).