:scheme - GitHub Issues

martinthomson commented 3 years ago

RFC 7540 requires that CONNECT requests omit the :scheme pseudo-header:

The ":scheme" and ":path" pseudo-header fields MUST be omitted.

Why does this draft need to define a URI scheme and then include that in :scheme?

There are already two different definitions for udp: URI scheme. Both seem - at least on face value - fairly sensible, but they both come with considerable baggage.

The question I'd like to ask is whether this protocol even needs a URI scheme at all. My first inclination is very much that it does not.
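For reference, the RFC 7540 rule quoted above can be checked mechanically. This is a minimal sketch, not taken from any real HTTP/2 stack; the function name `check_connect_pseudo_headers` and the dict representation of the header block are assumptions made for illustration.

```python
def check_connect_pseudo_headers(headers: dict) -> list:
    """Return the problems found in an HTTP/2 CONNECT request header block.

    Hypothetical helper: `headers` maps pseudo-header names to values.
    """
    problems = []
    if headers.get(":method") != "CONNECT":
        problems.append(":method must be CONNECT")
    # RFC 7540: the ":scheme" and ":path" pseudo-header fields MUST be omitted.
    for name in (":scheme", ":path"):
        if name in headers:
            problems.append(f"{name} MUST be omitted")
    # The target host and port travel in :authority instead.
    if ":authority" not in headers:
        problems.append(":authority (host:port) is required")
    return problems
```

A plain CONNECT with only `:method` and `:authority` yields an empty list; adding `:scheme` yields one problem, which is the conflict this issue is about.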

DavidSchinazi commented 3 years ago

I added the scheme because I thought it was required (and folks in the room at IETF 109 seemed to think it was required too), but maybe it isn't.

Unfortunately, CONNECT-UDP is distinct from CONNECT, so it cannot benefit from the exception that CONNECT has. This is pretty explicit in draft-ietf-httpbis-semantics s6.1:

the request target is a URI reference

For CONNECT, the request target is the host name and port number

These forms MUST NOT be used with other methods.

This means that, unlike CONNECT, CONNECT-UDP needs a URI reference.

However, according to RFC 3986:

URI-reference = URI / relative-ref
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty / path-absolute / path-noscheme / path-empty
path-abempty = *( "/" segment )
authority = [ userinfo "@" ] host [ ":" port ]

So, if we go the route of URI-reference = relative-ref = relative-part = "//" authority path-abempty, I think we can legally say that CONNECT-UDP request targets are of the form "//" host ":" port, and require that the :scheme and :path pseudo-headers not be sent. This would allow us to use URIs without schemes for CONNECT-UDP.
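As a rough illustration of the scheme-less form described above, a generic RFC 3986 parser already accepts a network-path reference. This sketch uses Python's standard `urllib.parse`; the host name `target.example.com` is a made-up example, and nothing here says whether HTTP itself permits such a request target.

```python
from urllib.parse import urlsplit

# A relative reference of the form "//" host ":" port (no scheme, empty path).
parts = urlsplit("//target.example.com:443")

print(repr(parts.scheme))          # the scheme component is empty
print(parts.hostname, parts.port)  # host and port come from the authority
print(repr(parts.path))            # path-abempty is empty here
```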

Can someone who's more of an HTTP guru comment on this please? @martinthomson @mnot @MikeBishop

MikeBishop commented 3 years ago

It's an intriguing approach, but I'm not sure it works.

SEMANTICS 4.1 says:

Each protocol element in HTTP that allows a URI reference will indicate in its ABNF production whether the element allows any form of reference (URI-reference), only a URI in absolute form (absolute-URI), only the path and optional query components, or some combination of the above. Unless otherwise indicated, URI references are parsed relative to the target URI (Section 7.1).

I don't see anywhere in SEMANTICS it specifies that a method's target URI can't be relative. However, for HTTP/1.1 at least:

When making a request to a proxy, other than a CONNECT or server-wide OPTIONS request (as detailed below), a client must send the target URI in absolute-form as the request-target.

absolute-form does not permit the use of relative URIs.

Since support for this method is negotiated in SETTINGS for HTTP/2 and HTTP/3, it's possible for it to depart from the general requirements for which pseudo-headers to send and specify something that looks more like CONNECT (omit :scheme and :path, for example). However, I'm not convinced that can be generalized.

@RoyFielding and @reschke, thoughts?

DavidSchinazi commented 3 years ago

@MikeBishop support for CONNECT-UDP is not gated by a SETTINGS parameter. Only the datagram mode in h3 has a setting.

MikeBishop commented 3 years ago

Ah, then it would need to include :scheme and :path pseudo-headers; I'm not sure off-hand if it's valid for those to be empty.

martinthomson commented 3 years ago

I think that I would prefer a setting. It's probably true that this doesn't require special handling, but there are intermediaries that do crazy things like buffer request bodies. A setting would avoid any potential for misunderstanding. It makes this unusable in HTTP/1.1, but I might see that as more of a feature than a bug.

DavidSchinazi commented 3 years ago

I would strongly prefer to avoid a setting, as it breaks the MASQUE Obfuscation use-case.

mnot commented 3 years ago

I tend to think that defining a URI scheme (perhaps not udp:) is the path of least resistance here, and I don't see any immediate harm in doing so. As a thought experiment, consider how the Web would work if CONNECT had a URI scheme; that would give a distinct way to convey proxy configuration, for example.

Based on the above I also think we need to clarify in Semantics that a target-uri is always absolute, even if it is transmitted as a relative uri.

MikeBishop commented 3 years ago

For your obfuscation goals, the client needs to initiate any feature detection and do so after or in parallel with authentication. So you're right, probably that precludes a setting unless you're using client certs to authenticate (which would itself be visible on a probe, though not clear evidence of Masque). Then I agree with @mnot, that your path of least resistance is to mint a scheme for a UDP connection.

The fact that udp:// has been used to describe other services available over UDP is regrettable, since those services actually intended to convey certain protocol endpoints. In this instance, you are literally describing a UDP connection with no protocol-specific properties; I'd consider asking @dthaler if he'd be willing to have the provisional registration (which he did as an experiment) replaced by a permanent one that simply designates udp:// as describing a UDP endpoint without conveying application-layer protocol expectations.

DavidSchinazi commented 3 years ago

Thinking about this some more, could we use the "https" scheme? I really like @mnot's idea to use a URI for proxy configuration, as that could allow future extensibility. Chrome already uses URIs for proxy config and the "https" scheme there already maps TCP to the CONNECT method. I think it would make sense to have that also map UDP to CONNECT-UDP. And with that mindset, having the scheme sent over the wire as "https" seems natural, and bypasses the need to define a new URI scheme. Thoughts?

LPardue commented 3 years ago

Can you explain a bit more please David? I'm vaguely familiar with Chrome's URI format of proxy configuration.

I note that the link you provide also defines the "quic" scheme, and I presume that it would be nice to ditch that for something more consistent. IIRC it is possible for a proxy identified by "http" or "https" schemes to offer H3 as an Alt-Svc. It is then up to the client to decide how to tunnel requests through such a proxy that might be capable of talking CONNECT and CONNECT-UDP.

If this is too far off on a tangent (because of discovery), we don't need to explore it here.

DavidSchinazi commented 3 years ago

@LPardue I totally agree with you, Chrome's "quic" scheme is historical and I think it should now be deprecated in favor of "https" (and rely on Alt-Svc to use h3). But yes this is somewhat of a tangent that we may want to discuss offline. I was simply trying to make the case that there was some sort of precedent to consider proxying using CONNECT as "https".

Back to the scheme discussion, since the scheme carries no useful information on CONNECT-UDP requests, I'm proposing to simply use "https".

martinthomson commented 3 years ago

That all works for me. It doesn't adhere to any sort of architectural purity, but CONNECT has no hope of achieving that; hoping to include some notion of purity along other design constraints would only make things worse.

I like the idea that you can just use "https" and work out the rest using the protocol negotiation tools we have (Alt-Svc, HTTPS, settings).

gloinul commented 3 years ago

I think there has been some discussion in this thread, but likely not a full one, of what the intended semantics of the URI scheme should be. There are two basic options that I can think of, and the first one is:

A) Use a MASQUE proxy to send packets to "target address"

However, I would think that outside of the MASQUE protocol a more useful URI semantics would be:

B) Use this MASQUE proxy to send traffic to this target address.

I also think we have to take a step back and ask ourselves whether the MASQUE URI scheme implies UDP, or whether it should use parameters to encode which protocol should be proxied over the MASQUE proxy.

The other aspect of what I think are the A) semantics in the -03 connect-udp draft is that it doesn't discuss the identity of the proxy. For security, the MASQUE client needs to know the identity of the proxy so that it can verify the initial H/3 connection to the MASQUE proxy.

tfpauly commented 3 years ago

Can we simply recommend using https as the scheme, but allow clients to specify a different scheme if appropriate?

mnot commented 3 years ago

After discussion at the meeting, my understanding is that the target resource of CONNECT-UDP might be an origin that speaks CONNECT-UDP, or it might be a 'raw' UDP endpoint.

In the former case, a HTTPS schemed target URI makes sense; in the latter, something new is necessary (e.g., udp://, if we can make that interoperable).

martinthomson commented 3 years ago

Given that the authority component of an https:// URI would not need to be aware of CONNECT-UDP, I am not sure that this reasoning for https:// works for me. IOW, it implies that a new scheme is good.

I think that udp:// is doubly burned already, unfortunately.

mnot commented 3 years ago

@martinthomson if we required a new scheme for every new method introduced, we'd have some problems. Why is that necessary?

martinthomson commented 3 years ago

I wasn't arguing for anything in particular, just looking for you (or someone) to address the problem somehow.

tfpauly commented 3 years ago

Given that we don't care about the scheme for now, and want to make sure people ignore it on receipt, how about we do https:// by default, but say that clients SHOULD grease the scheme? 😁

kazuho commented 3 years ago

FWIW, I tend to think that we should omit scheme, path, and use SETTINGS for HTTP/2 and HTTP/3.

As @martinthomson points out, we have to make sure that servers unaware of CONNECT-UDP do not mishandle CONNECT-UDP requests.

For HTTP/2 and /3, we can either send SETTINGS, or send :scheme set to something other than https or http along with a fake :path.

But in HTTP/1, there's no way of sending a scheme; therefore, we'd have to use a connect-style request line (i.e. `CONNECT-UDP host:port HTTP/1.1`). Otherwise, servers unaware of the method would handle it as an ordinary request.

The latter point means that there's practical advantage in following the style of CONNECT method. Then, why not use a SETTINGS on H2 and H3 to minimize the difference between H1 and H2/H3?

MikeBishop commented 3 years ago

It's actually legal to send a fully-qualified URI in HTTP/1.1; that's just not the common case when you're talking to an origin server. I don't think it's a problem to be considering a full URI for CONNECT-UDP in HTTP/1.1. Of course, if we know there's a lack of support for that form, that's a different issue.

kazuho commented 3 years ago

@MikeBishop

It's actually legal to send a fully-qualified URI in HTTP/1.1; that's just not the common case when you're talking to an origin server. I don't think it's a problem to be considering a full URI for CONNECT-UDP in HTTP/1.1. Of course, if we know there's a lack of support for that form, that's a different issue.

That's a good point. Regarding the concern that you point out, I'm not sure if that would actually be an issue because a request in absolute-URI form will be considered malformed and rejected by a server that only supports HTTP/1.0-style requests.

However, there's the other failure mode, and that is servers ignoring the scheme. IIUC, most origin servers do not have the incentive to consult and reject requests based on the scheme.

When I send GET masque://server/ HTTP/1.1 to some host, I do see 200 responses, from servers hosted by large IaaS operators / multiple CDNs (see note 1). When sending GET server:port HTTP/1.1 to those servers, I do see 4xx errors.

Therefore, if we are to care about how existing HTTP/1.1 servers would respond to masque, I think that using the form of CONNECT-UDP host:port is going to provide us better results.

note 1: I'd take part of the blame, because h2o simply drops the scheme when receiving an HTTP/1.1 request. I'm glad that we are not alone!
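The request-line shapes being compared here (absolute URI versus `host:port`) can be told apart with a small classifier. This is a simplified sketch: the function name `request_target_form` is hypothetical, the regular expressions only approximate the HTTP/1.1 grammar, and real servers apply further checks.

```python
import re

def request_target_form(target: str) -> str:
    """Roughly classify an HTTP/1.1 request-target by its syntactic form."""
    if target == "*":
        return "asterisk-form"      # server-wide OPTIONS
    if target.startswith("/"):
        return "origin-form"        # the usual form sent to origin servers
    if re.match(r"^[A-Za-z][A-Za-z0-9+.-]*://", target):
        return "absolute-form"      # full URI, e.g. masque://server/
    if re.match(r"^[^/\s]+:\d+$", target):
        return "authority-form"     # host:port, as used by CONNECT
    return "unknown"

for t in ("masque://server/", "server:443", "/index.html", "*"):
    print(t, "->", request_target_form(t))
```

A server that drops the scheme from an absolute-form target ends up treating `masque://server/` like `/`, which would be consistent with the 200 responses described above.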

DavidSchinazi commented 3 years ago

The more I'm thinking about this, the more I'm leaning towards the simplest solution: no SETTING, and simply use a scheme of https. That works perfectly for everything we need, and carries the least complexity.

kazuho commented 3 years ago

@DavidSchinazi I'm not sure if we'd be fine with the failure mode associated with that approach?

HTTP servers (or applications) often buffer the entire request before they start processing it. Even the method is checked only after the entire request body has been received.

When a client connects to such a server, sending a CONNECT-UDP request with an https URI, the server will hang forever, as the client never closes the request stream. Eventually, a timeout would kick in, but I do not think it's preferable to keep the client waiting until a timeout?

Am I missing something, or is it the case that you are fine with relying on a timeout?

DavidSchinazi commented 3 years ago

@kazuho thanks for clarifying - that's a good argument for a SETTING (at least in the server-to-client direction). However, I really dislike the idea that every new HTTP method needs its own SETTING... But apart from switching to extended CONNECT I'm not seeing a lot of alternate solutions. 🤔

royfielding commented 3 years ago

None of this makes any sense. If you can't send a valid request, then this protocol does not belong anywhere near an HTTP stream. Figure out how to send a valid request or build your own protocol.

DavidSchinazi commented 3 years ago

@royfielding how do you define a valid request? The issue here is that a CONNECT (or CONNECT-UDP) request is valid as far as our understanding of the HTTP specs, but that doesn't mean that servers handle them correctly - the important difference is that these requests are not followed by a FIN on the stream, and some servers don't handle that well, even if it's valid.

royfielding commented 3 years ago

I have never met a server that didn't handle an HTTP request "well", for its own definition of well. A server that doesn't know the method will use the standard parsing algorithm and respond at the end of the header section with a 405 or 501, depending on taste. If you want that to be faster, send fewer header fields. This is not a concern in practice.

There is no option to send host:port with CONNECT-UDP. That is not HTTP. Stop bringing it up.

Obviously, sending a GET request to a server is not the same as sending CONNECT-UDP to a server. The two have entirely different backwards-compatibility algorithms. Likewise, sending an absolute URI within a CONNECT-UDP request works just fine for every server chain that might be expected to support such a request, and the response is going to be 501, 405, or 400 for those that don't.

If you find a deployed server that evidences any other form of behavior, send the developers a bug report. The IETF is not the place to work around every possible fool's implementation of bad parsing. By the time you finish writing the spec, there will be at least three more ways to fail to implement that spec correctly. Focus instead on making it work for servers that implement HTTP correctly.
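The fallback behavior described in this comment (reject an unsupported method once the header section has been parsed) can be sketched as follows. The set of implemented methods and the function name `status_for_method` are assumptions for illustration, not any particular server's logic.

```python
# Methods this hypothetical server implements at all.
IMPLEMENTED = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "CONNECT"}

def status_for_method(method: str, allowed_for_resource: set) -> int:
    """Pick a status code as soon as the header section is complete."""
    if method not in IMPLEMENTED:
        return 501  # Not Implemented: the server does not know the method
    if method not in allowed_for_resource:
        return 405  # Method Not Allowed: known, but not for this resource
    return 200

print(status_for_method("CONNECT-UDP", {"GET", "HEAD"}))
```

Under these assumptions a server unaware of CONNECT-UDP answers 501 without waiting for any request content.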

royfielding commented 3 years ago

No HTTP requests are followed by a FIN on the stream. FIN is part of TCP.

Clients do not half-close their sockets in HTTP/1.1 -- the end of a request is indicated by either the end of the header section or by the end of the request Content-Length's amount of data, and the server doesn't read past that until it looks for the next request (on a persistent connection). A server sends FIN after closing their end of the TCP connection.

For HTTP/2 and HTTP/3, both requests and responses occur within protocol frames. A FIN only occurs when the underlying connection breaks.

DavidSchinazi commented 3 years ago

@royfielding my comment about the FIN was specific to HTTP/2+. With those versions, the client sends a FIN on the request stream when it is done sending the body (or the headers when there is no body) for non-CONNECT requests. Some servers have unfortunately come to expect that of all request types, leading to timeouts for CONNECT.

kazuho commented 3 years ago

@royfielding I get your argument that sending CONNECT-style host:port is an act of breaking HTTP. Though, I would argue that if we are to stay within HTTP, we have to use a hop-by-hop opt-in mechanism for CONNECT-UDP, because it breaks existing assumption within HTTP that servers can wait for a complete request before processing it.

To that end, I tend to believe that we have to do the following:

For HTTP/2 and /3, use SETTINGS, like we did in RFC 8441.
If we need support for HTTP/1.1, use upgrade, like we did in RFC 6455, RFC 7540.

@DavidSchinazi

However, I really dislike the idea that every new HTTP method needs its own SETTING... But apart from switching to extended CONNECT I'm not seeing a lot of alternate solutions.

We do not have to add a new SETTING parameter for every new type of tunnel. Assuming that we are fine with using SETTINGS, I tend to wonder if we could reuse RFC 8441. With that opt-in, UDP tunnels can be created by using a CONNECT request (rather than a new method), with a scheme used for identifying that the payload is UDP datagrams?

royfielding commented 3 years ago

it breaks existing assumption within HTTP that servers can wait for a complete request before processing it.

There is no such assumption in HTTP. The request is complete when the header section is complete.

royfielding commented 3 years ago

IOW, if a server doesn't start processing a request message as soon as the header section is complete, it cannot correctly implement Expect and it probably doesn't support request content delimited by Transfer-Encoding or Content-Length. HTTP is designed for infinite length content streams, so there are no cases where we assume a recipient will wait until the end of the message body before starting to process the request. There would be no distinction between that and a denial-of-service vulnerability.

kazuho commented 3 years ago

@royfielding Sorry, my phrasing was incorrect. What I meant to say was that HTTP does not assume that servers start sending a final response before receiving the end of the stream ~~entire request body, unless the method is CONNECT~~.

EDIT. After initially posting this comment, I realized that for HTTP/1.1, lack of CL or TE indicates that the end of request headers is the end of the request. But for HTTP/2 and HTTP/3, the method being CONNECT is the only indicator that changes where the request ends, isn't it?
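The HTTP/1.1 rule mentioned in the edit (no Content-Length and no Transfer-Encoding means the request ends with the header section) can be written down as a small decision function. This is a simplified sketch: the name `request_body_framing` is hypothetical, and it assumes that any Transfer-Encoding present ends in chunked.

```python
def request_body_framing(headers: dict) -> str:
    """Decide how an HTTP/1.1 request body is delimited (simplified)."""
    names = {k.lower(): v for k, v in headers.items()}
    # Transfer-Encoding takes precedence over Content-Length.
    if "transfer-encoding" in names:
        return "chunked"  # assumption: the final coding is chunked
    if "content-length" in names:
        return f"content-length:{int(names['content-length'])}"
    # Neither field present: the request ends at the end of the header section.
    return "no-body"

print(request_body_framing({"Host": "example.com"}))
```

HTTP/2 and HTTP/3 have no equivalent header-based signal, since the end of a request is marked by closing the sending side of the stream, which is the difference this comment asks about.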

reschke commented 3 years ago

FWIW, if all you need is something that is a valid URI (and the content does not matter), a URN might work perfectly well:

urn:ietf:rfc:xxxx

(see https://tools.ietf.org/html/rfc2648).

(I think Ted Hardie mentioned this in the IETF meeting chat)

martinthomson commented 3 years ago

@enygren mentioned elsewhere that a locator that identifies the server providing the connect tunnel would be much better than the current design, which effectively puts the target host as an authority. I'm not yet sure what I think of that, but it's an interesting thought worth considering.

DavidSchinazi commented 3 years ago

Here is the example from @enygren's email to the list:

:method = CONNECT-UDP
:authority = masque-proxy.example.net
:path = /www.example.com?port=3568
:scheme = https

martinthomson commented 3 years ago

That specific example makes my RFC 8820 sense itch, but it's an OK illustration of the idea.

DavidSchinazi commented 3 years ago

@martinthomson do you have an alternate encoding that would appease your itch?

martinthomson commented 3 years ago

With the https scheme, we might be able to use a URI template (which might lead to that exact example if people so chose). Or we could define a new scheme...

enygren commented 3 years ago

@tfpauly pointed out that draft-pauly-dprive-oblivious-doh does something similar. It may make sense to have some consistency. Maybe:

:method = CONNECT-UDP
:authority = masque-proxy.example.net
:path = /.well-known/connect-udp?targethost=www.example.com&targetport=3568
:scheme = https

(perhaps adjusting the path somehow, or using a URI template?)
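A client could assemble the pseudo-headers from the example above along these lines. This is only a sketch: the helper name `connect_udp_headers` is hypothetical, and the path and the `targethost`/`targetport` parameter names are copied from the example in this comment, not from any adopted text.

```python
from urllib.parse import urlencode

def connect_udp_headers(proxy: str, target_host: str, target_port: int) -> list:
    """Build the pseudo-header list for the proxy-located design sketched above."""
    # urlencode percent-encodes values, so unusual host strings stay well-formed.
    query = urlencode({"targethost": target_host, "targetport": target_port})
    return [
        (":method", "CONNECT-UDP"),
        (":scheme", "https"),
        (":authority", proxy),
        (":path", f"/.well-known/connect-udp?{query}"),
    ]

for name, value in connect_udp_headers("masque-proxy.example.net", "www.example.com", 3568):
    print(name, "=", value)
```

In this shape, :authority names the proxy and the target moves into the path, so the request looks like an ordinary https request to the proxy.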

DavidSchinazi commented 3 years ago

I like the idea of keeping the https scheme and using the path to convey the target

martinthomson commented 3 years ago

No need to abuse .well-known when we can use a URI template.

reschke commented 3 years ago

...as in something using curly brackets...?

martinthomson commented 3 years ago

https://datatracker.ietf.org/doc/html/rfc6570

DavidSchinazi commented 3 years ago

So something like :path = /{target_host}/{target_port}/ ?

martinthomson commented 3 years ago

That would be an easy way of doing it. Or /mask?h={host}&p={port}.

reschke commented 3 years ago

But that would be an invalid path.

martinthomson commented 3 years ago

Would it be invalid after RFC 6570 substitution?

reschke commented 3 years ago

After substitution is of course ok.

So where do you want to put the template?
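To make the "after substitution" point concrete, here is a minimal sketch of simple string expansion in the style of RFC 6570 Level 1, applied to the two templates floated above. It is not a full RFC 6570 implementation (no operators, lists, or modifiers), and the function name `expand` is an assumption.

```python
import re
from urllib.parse import quote

def expand(template: str, variables: dict) -> str:
    """Replace each {name} with its percent-encoded value (Level 1 style)."""
    def substitute(match):
        # safe="" leaves only unreserved characters unencoded.
        return quote(str(variables[match.group(1)]), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)

print(expand("/{target_host}/{target_port}/",
             {"target_host": "www.example.com", "target_port": 3568}))
print(expand("/mask?h={host}&p={port}", {"host": "2001:db8::1", "port": 443}))
```

The curly brackets only appear in the template; the expanded result contains nothing but characters that are valid in a path or query, including for an IPv6 literal whose colons get percent-encoded.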

ietf-wg-masque / draft-ietf-masque-connect-udp

:scheme #23