Closed NathanReb closed 3 years ago
The user@host:path
syntax is not RFC3986: https://git-scm.com/docs/git-clone#_git_urls
They call it "alternative scp-like syntax". Git also understands ssh://
followed by a standard URI.
The uri
library handles this syntax:
# Uri.path (Uri.of_string "user@host:50/path");;
- : string = "50/path"
# Uri.path (Uri.of_string "ssh://user@host:50/path");;
- : string = "/path"
git@github.com
is the scheme for Uri.t
(however, it should not be when @
is not an authorized character according the RFC). However, such value can be considered as valid when the authority
can have an @
character to refer to an userinfo
. But, yes, the double-dot should introduce a port (which can only be a suit of digits). However, if we still follow the ABNF, a possible path-nonscheme
can exists - and, in this way, the given data is valid (and owner
can be valid but not as a port).
According to the RFC, git@github.com:owner/repo.git
should be valid but not on your way. More concretely, Uri.t
wants to follow the Postel law where you are liberal about inputs and conservative about outputs. Due to the large usage of Uri.t
, we should continue to accept anything - even if it's wrong.
The @Julow's way is valid if you want to use SSH with Uri.t
(and we used it for a long time on the ocaml-git
side - minor meta-information such as the private key).
You are right, uri
doesn't like the "scp-like" syntax:
# Uri.scheme (Uri.of_string "user@host:50/path");;
- : string option = Some "user@host"
I was aware that Uri doesn't handle the scp-like syntax and I was already appending a scheme when there was none.
The resulting Uri.t
, be it with or without the scheme is broken so I'm a bit surprised you're willing to maintain that behaviour but that's your call!
On my end I unfortunately can't use Uri. Feel free to close this issue!
One note I'd like to add is that this choice of behaviour can cause those malformed URIs to be interpreted like proper ones. The None
returned by port
can mean two things: there's no explicitly specified port or there's one but it's malformed.
I understand that there is value in accepting more URIs for of_string
but it might be worth exposing new entry points that can return errors, e.g. something like port_res : t -> (int option, _) result
so it's possible to distinguish malformed components from missing components.
That still won't allow me to use uri
but I think that's fine, given those URIs aren't valid I didn't really expect uri
to accept them in the first place. An error here would probably have helped me debug this faster though!
Anyway, thanks for your time and answers!
I make a pull request to fix the scheme but it seems that you want to do something that you already know that's wrong. I'm closed the issue.
For the port, it's more tricky when path-nonscheme
. The missing information is more about the existence of :
for the port
or for the path-nonscheme
. Finally the error is elsewhere, see #155 (which does not really fix your task).
I don't know much about RFC3986 but from a quick look, it appears to me that github is abusing the spec and uses the port for the repository owner. I.e. according to the specification, ssh URIs are roughly
user@host:port/path
and Github SSH URIs are as follows:git@github.com:owner/repo.git
. Problem is that the port should be composed of digits only and that's not the case of github users and organisations names.I didn't know about this before and was trying to use
uri
to parse URIs in dune-release rather than relying on our own parsers and that abuse of the port part is preventing me from doing so.Uri.of_string
behaviour is a bit surprising here though: it succeeds but I'm then unable to get that information back from theUri.t
it returned sinceport
is defined as:and therefore returns
None
in my case.Shouldn't
Uri.from_string
fail in such cases? If so then I'd understand that you would not want to provide a way to parse such URIs as it doesn't follow the RFC even though I'd love it to be able to use some unsafe API inuri
to do so!