socketry / async-http

MIT License
298 stars 45 forks source link

Add auto-detection of HTTP1 vs HTTP2 for inbound http:// connections #128

Closed zarqman closed 3 weeks ago

zarqman commented 10 months ago

This PR adds Async::HTTP::Protocol::HTTP which auto-detects HTTP1 vs HTTP2 for inbound (server) connections. It is comparable to Async::HTTP::Protocol::HTTPS, just for http:// instead.

It uses HTTP/2 Connection Preface detection, outlined in the HTTP/2 RFC.

This is only for server connections. Protocol::HTTP's client always uses HTTP1 as there is no present facility in the RFC for auto-detection for clients using http://. (There was some language around Upgrade: h2c in earlier RFC versions, but it has since been removed.)

Note on HTTP/1.0

~~HTTP/1.0 requests have the potential to be too short to be properly detected. This is expected behavior, and is treated no differently than any other incomplete inbound HTTP/1.x request.~~ ~~The risk is limited to HTTP/1.0 requests without any headers, including no Host header, such as:~~ ``` GET / HTTP/1.0 ``` ~~It's 6 bytes short, and it's only because the Host header is optional in HTTP/1.0. If Host, any other header (ie: User-Agent), a longer path, or a query parameter is present, then it's good to go.~~ ~~The only place this might show up is using an ultra-basic monitoring script for health checks. Even most of those will add Host and/or User-Agent, but if not, adding a dummy header or query param will solve it (or using https://).~~ ~~Any Server needing assured HTTP/1.0 request support should continue to use Protocol::HTTP1 and not switch to this new Protocol::HTTP.~~

Outstanding considerations

1) Auto-detection uses peek() and requires converting the IO Socket into an Async::IO::Stream one layer earlier. This PR presently modifies the other Protocol::HTTPxx modules to handle this. It's a rather limited change.

However, I could see changing `IO::Stream.new(...)` in each of these to a new `IO::Stream.wrap(...)` method and then moving the `is_a?` detection into there.

Let me know if you'd prefer that and I'll change it here and submit a parallel PR on async-io for that.

2) I'm unsure whether it's appropriate to change the default value in Async::HTTP::Endpoint#protocol from HTTP1 to HTTP. I have not included that change at this time, but let me know if you'd like it added.

As always, questions and feedback welcomed.

Types of Changes

Contribution

ioquatix commented 9 months ago

Apologies, but I wanted to update the test suite to sus before considering any further PRs. Now that's done, do you mind rebasing?

Also, I have a question, what is the use case? Since we can't use HTTP/2 plain text in general, ALPN is a requirement which is sufficient.

Regarding the special token, isn't it sufficient just to detect PRI * HTTP/2.0? Surely this could never be a valid HTTP/1.x request after that point.

zarqman commented 9 months ago

Rebased.

h2 over http:// is usable, it just requires prior knowledge by the client. While a browser isn't going to use h2 over http://, internal (or otherwise controlled) infrastructure might. For example, a public-facing load balancer performing TLS offload might use http:// to contact the next layer running async-http (could be falcon or otherwise). Using h2 gains all the h2 benefits like reduced connection counts and better stream isolation. At the same time, it's not unusual for healthcheck monitoring to support only HTTP/1.1 (or even HTTP/1.0). This allows both h1 and h2 to work in parallel.

Another use case is testing (manual or automated) where configuring https is unwanted. Instead of spinning up h1 and h2 separately, both are viable on the same port. I've personally found it more efficient to just add/remove a single flag to curl (--http2-prior-knowledge) to switch back and forth from h1 to h2 rather than have to configure and toggle between separate ports or use TLS.

As for the preface token, great question! The preface I've used comes exactly from the RFC which is quite specific that this is the string to use. I haven't researched the background for why they made it longer than the more simplified version you mention, but I'll assume it was done with reason.

ioquatix commented 9 months ago

I went to check this for myself.

https://datatracker.ietf.org/doc/html/rfc9113#section-3.4

I found at least one implementation that, for HTTP/1 only checks 'PRI ' as the indicator for the upgrade. Maybe if you have time, can you check some other implementations?

The actual connection preface must be specified during HTTP/2 connection negotiation too: https://github.com/socketry/protocol-http2/blob/d2f68154755dc678360763c9f3c178ad5f304b56/lib/protocol/http2/framer.rb#L36

I think if it's convenient, it's okay to peek PRI * HTTP/2.0\r\n and that's a strong indicator of the upgrade. We can later validate the full string. There is no world in which the above string is a valid HTTP/1 request, in practice.

zarqman commented 9 months ago

Nginx and Apache each appear to look for the full string.

Traefik and Caddy both use go's net/http, which seems to parse the initial request line as if it were HTTP/1. Then, as part of early h1 request handling, it looks for 'PRI', '*', and 'HTTP/2.0' in their respective fields. If found, only then does it hand off to the h2 handler.

Thinking about it more, the recommended preface string was possibly selected because it's not actually a compliant HTTP/1.x request in that it has the appearance of having a body, but would be missing Content-Length.

I have a thought on a potential short circuit for requests with too few bytes, assuming that's the issue that's giving you pause. Let me explore that.

zarqman commented 9 months ago

Okay, that new idea worked out quite well. Now, if the peeked bytes cannot match, it stops looking immediately without waiting for enough bytes to satisfy the entire preface. This solves the HTTP/1.0 short request issue.

zarqman commented 8 months ago

Updated protocol default and **options as requested. Hopefully good to go!

ioquatix commented 2 months ago

I've rebased this PR, but I think the external tests are failing.

In addition, peek(n) now exists, so we can be more precise about the actual amount of data we want to check. Should we take advantage of it?

ioquatix commented 3 weeks ago

@zarqman thanks for your contribution.