Open torinnd opened 1 year ago
Introducing a submodule sounds fine, and we can move it out to a toplevel Absolute_uri in the future. I'm not sure that Absolute
is the right prefix though; perhaps Uri_with_host
or Uri.With_host
is more descriptive but more unwieldy. It would be good to check through the rest of RFC9110 and make sure there's nothing else aside from the host that needs attention, while we're doing this.
The deprecation of userinfo also seems important for RFC-compliance (https://httpwg.org/specs/rfc9110.html#rfc.section.4.2.4). A rough version of the submodule approach could be:
# mli
(** Specializations for HTTP and HTTPS schemes as per RFC9110 *)
module Absolute_http : sig
type h
val of_t : t -> (h, [ `Msg of string ]) result
val to_t : h -> t
val of_string : string -> (h, [ `Msg of string ]) result
val to_string : ?pct_encoder:pct_encoder -> h -> string
val make : scheme:[ `Http | `Https ]-> host:string ->
?port:int -> ?path:string -> ?query:(string * string list) list ->
?fragment:string -> unit -> h
end
# ml
module Absolute_http = struct
type h =
{ scheme : [ `Http | `Https ]
; host : Pct.decoded
; port : int option
; path : Path.t
; query : Query.t
; fragment : Pct.decoded option
}
let ( let* ) = Result.bind
let to_t { scheme; host; port; path; query; fragment } =
let scheme =
match scheme with
| `Http -> Pct.cast_decoded "http"
| `Https -> Pct.cast_decoded "https"
in
{ scheme = Some scheme
; userinfo = None
; host = Some host
; port
; path
; query
; fragment
}
;;
let of_t { scheme; userinfo; host; port; path; query; fragment } =
let* scheme =
match scheme with
| None -> Error (`Msg "No scheme present in URI")
| Some scheme ->
(match Pct.uncast_decoded scheme with
| "http" -> Ok `Http
| "https" -> Ok `Https
| unsupported_scheme ->
Error
(`Msg
(Printf.sprintf
"Only http and https URIs are supported. %s is invalid."
unsupported_scheme)))
in
let* host = Option.to_result ~none:(`Msg "host is required for HTTP(S) uris") host in
let* () =
match userinfo with
| None -> Ok ()
| Some (_ : string * string option) ->
Error (`Msg "userinfo is a deprecated field in HTTP(S) URIs as per RFC9110")
in
Ok { scheme; host; port; path; query; fragment }
;;
let of_string s = of_string s |> of_t
let to_string ?pct_encoder h = to_t h |> to_string ?pct_encoder
let normalize h =
{h with
host=Pct.cast_decoded (String.lowercase_ascii (Pct.uncast_decoded h.host))
}
let make ~scheme ~host ?port ?path ?query ?fragment () =
let decode = function
|Some x -> Some (Pct.cast_decoded x) |None -> None in
let path = match path with
|None -> [] | Some p ->
let path = path_of_encoded p in
match path with
| "/"::_ | [] -> path
| _ -> "/"::path
in
let query = match query with
| None -> Query.KV []
| Some p -> Query.KV p
in
normalize
{ scheme;
host=Pct.cast_decoded host; port; path; query; fragment=decode fragment }
end
I see that Absolute_uri is a term from RFC9110, so we could just call this Uri.Absolute to indicate the mandatory presence of a non-empty path.
I think it might be worth separating the errors out for a deprecated user_info (which clients could validly choose to allow to pass through, after stripping it out) from the missing host error.
I see that Absolute_uri is a term from RFC9110, so we could just call this Uri.Absolute to indicate the mandatory presence of a non-empty path.
Because the specificity seems important, I want to call out that an Absolute URI implies the presence of a hier-part, not necessarily an authority section (which is where a host identifier can exist):
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
I don't think this would satisfy the requirements in RFC9110, so it's probably not specific enough to describe this as a [Uri.Absolute.t]:
A sender MUST NOT generate an "http" URI with an empty host identifier. A recipient that processes such a URI reference MUST reject it as invalid.
I'll also offer mild pushback on the userinfo section, based on the language of the RFC:
A sender MUST NOT generate the userinfo subcomponent (and its "@" delimiter) when an "http" or "https" URI reference is generated within a message as a target URI or field value.
Before making use of an "http" or "https" URI reference received from an untrusted source, a recipient SHOULD parse for userinfo and treat its presence as an error
I suppose there's an argument to be made that the client-side SHOULD isn't strong enough to forbid this, but it seems to me that erroring when attempting to parse a Uri.t as a RFC9110-complaint HTTP(S) URI is aligned with the spirit of the RFC, if not the exact language.
An error doesn't mean that the connection should be aborted; it means that the URI shouldn't be propagated. From that RFC text, it seems valid for me for an HTTP proxy to spot a user-info field, remove it (perhaps authenticating out of band), and to make a downstream request without it. We should make sure our URI interfaces support that. It might be that it happens within Uri
and not in Absolute_uri
though.
I've cleaned up the proposed changes above in #164
I find myself reaching for a more strongly typed URI when the scheme is one of "http" or "https". As per https://httpwg.org/specs/rfc9110.html#rfc.section.4.2 the following semantics need to be enforced for absolute URIs with one of these two schemes:
One way this could be addressed is by leveraging a GADT like:
This would be a non-trivial API change for URI, of course. A smaller change could be to introduce a submodule into the MLI for an [Absolute_http.h] with appropriate conversion functions:
... and a subset of the [Uri] MLI adapted to the constraints enforced by the scheme (e.g. [make] would require [host], omit [userinfo], and accept a more constrained set of [scheme]'s).
I have a fairly narrow understanding of other use cases for URI (for example, I see there is an old Issue #158 for Websocket support, which feels related), but imagine there are additional considerations besides what I've raised here. The big win for me personally would be a calling pattern where I could be confident that a [host] is present.