socketry / async-http

MIT License
319 stars 46 forks source link

[HTTP1.1] `Host` header sent twice if also specified in request headers #177

Open cocoahero opened 2 months ago

cocoahero commented 2 months ago

While attempting to use async-http (0.71.0) with against a reverse proxy where we need to specify a Host header, the library sends the header twice: one with the URL's hostname value, and one with the value specified in the request. For most web servers, this results in a 400 Bad Request.

url = URI.parse("https://my.reverse.proxy.com/path/to/resource")

headers = {
  "Host" => "api.example.com"
}

body = JSON.dump({ foo: "bar" })

Sync do
  Async::HTTP::Internet.post(url, headers, body) do |resp|
    puts resp.inspect
  end
end

It looks like this is coming from protocol-http1 where the host header is always written, irrespective if it also exists in headers. https://github.com/socketry/protocol-http1/blob/main/lib/protocol/http1/connection.rb#L133-L138

The reverse proxy use case is one, but it could also be for any situation where the TCP/TLS socket connection should be made to a different authority than what the HTTP Host header dictates. For example, this should also be completely valid:

url = URI.parse("http://localhost:9292/path/to/resource")

headers = {
  "Host" => "example.multi.tenant.app.com" # used to lookup SaaS tenant, etc
}

body = JSON.dump({ foo: "bar" })

Sync do
  Async::HTTP::Internet.post(url, headers, body) do |resp|
    puts resp.inspect
  end
end
ioquatix commented 2 months ago

What you are trying to do is possible, but it's probably not achievable with Async::HTTP::Internet which is a canonical/opinionated interface for accessing internet resources following standard conventions.

More specifically, the host header is not part of HTTP/2+, and therefore the library has a specific field for dealing with host identification, referred to as authority. In HTTP/1, authority maps directly to the host header, and in HTTP/2+ it maps to the :authority pseudo header.

Note that this can be different from the address you connect to (in your case the reverse proxy), and even different from the hostname used for TLS negotiation (can be used to select certificates for validating the connection is secure).

According to your initial problem, something like this will work for you:

server_endpoint = ::IO::Endpoint.tcp("my.reverse.proxy.com", 443)
host_endpoint = ::Async::HTTP::Endpoint.parse("https://api.example.com/path/to/resource", server_endpoint)
client = ::Async::HTTP::Client.new(host_endpoint)

client.post(...)

This will configure the authority and TLS as if it was talking to api.example.com.

cocoahero commented 2 months ago

Correct me if I am wrong, but creating a client in this way will prevent connection reuse/persistent connections since we will need a Client instance per combination of TCP endpoint and host/authority. In reality, we want to reuse the connection to the TCP endpoint, but just change the Host header per request.

ioquatix commented 2 months ago

It will depend on your TLS setup, if you are expecting SNI (which is at the connection level, usually for virtual hosts) to work, you will need the setup I suggested, and yes, you will want to cache the client instances and yes, you are correct, it will be one connection at least per SNI hostname. If you don't need SNI, and only need to change the authority, you can do this instead:

def get(client, url, headers = Protocol::HTTP::Headers.new)
  client.call(
    Protocol::HTTP::Request.new(nil, url.hostname, "GET", url.path, headers, body)
  )
end

The 2nd argument to the Request.new is the authority (host) to use.

If this was more commonly requested, we could consider adding a keyword argument to the method helpers, e.g. internet.get(..., authority: ...) - however I don't know if your issue encompasses SNI or not - can you give me more details about that part of your setup?

epk commented 2 months ago

We are essentially trying to do the equivalent of the following Go code with async-http, where we want to keep a pool of persistent connections to the reverse proxy.

The reverse proxy only has a TLS certificate for my.tld.com but has millions of virtual hosts.

We use Cloudflare SaaS where they connect to the reverse proxy/loadbalancer with TLS/SNI set to my.tld.com and the Host/:authority header set to the tenant's domain.

https://developers.cloudflare.com/cloudflare-for-platforms/cloudflare-for-saas/reference/connection-details/

Show Code ```go package main import ( "crypto/tls" "fmt" "io" "net" "net/http" "net/http/httptrace" "time" ) func main() { // Create a custom dialer dialer := &net.Dialer{ Timeout: 30 * time.Second, KeepAlive: 30 * time.Second, } // Create a custom TLS config tlsConfig := &tls.Config{ ServerName: "nuc.adi.run", } // Create a custom transport transport := &http.Transport{ DialContext: dialer.DialContext, TLSClientConfig: tlsConfig, ForceAttemptHTTP2: true, MaxIdleConns: 100, IdleConnTimeout: 90 * time.Second, TLSHandshakeTimeout: 10 * time.Second, } // for trace events clientTrace := &httptrace.ClientTrace{ GetConn: func(hostPort string) { fmt.Println("starting to create conn ", hostPort) }, GotConn: func(info httptrace.GotConnInfo) { fmt.Printf("connection established %+v\n", info) }, TLSHandshakeDone: func(info tls.ConnectionState, err error) { fmt.Printf("TLS handshake done %+v\n", info) }, WroteHeaderField: func(key string, value []string) { if key == ":authority" { fmt.Println("Wrote header field ", key, value) } }, ConnectStart: func(network, addr string) { fmt.Println("starting tcp connection", network, addr) }, ConnectDone: func(network, addr string, err error) { fmt.Println("tcp connection created", network, addr, err) }, } // Create a client with the custom transport client := &http.Client{ Transport: transport, } // Create a new request req, err := http.NewRequest("GET", "https://nuc.adi.run", nil) if err != nil { fmt.Println("Error creating request:", err) return } doRequest(client, clientTrace, req, "code.aditya.me") fmt.Println() req, err = http.NewRequest("GET", "https://nuc.adi.run", nil) if err != nil { fmt.Println("Error creating request:", err) return } doRequest(client, clientTrace, req, "jellyfin.aditya.me") } func doRequest(client *http.Client, clientTrace *httptrace.ClientTrace, req *http.Request, overrideHostName string) { clientTraceCtx := httptrace.WithClientTrace(req.Context(), clientTrace) req = req.WithContext(clientTraceCtx) // override Host header req.Host = overrideHostName resp, err := client.Do(req) if err != nil { fmt.Println("Error performing request:", err) return } defer resp.Body.Close() io.Copy(io.Discard, resp.Body) fmt.Println("Response status:", resp.Status) } ```
go run main.go 
starting to create conn  nuc.adi.run:443
starting tcp connection tcp 10.0.0.2:443
tcp connection created tcp 10.0.0.2:443 <nil>
TLS handshake done {Version:772 HandshakeComplete:true DidResume:false CipherSuite:4865 NegotiatedProtocol:h2 NegotiatedProtocolIsMutual:true ServerName:nuc.adi.run PeerCertificates:[0x140001d8008 0x140001d8588] VerifiedChains:[[0x140001d8b08 0x140001d9088 0x140001d9608]] SignedCertificateTimestamps:[] OCSPResponse:[] TLSUnique:[] ECHAccepted:false ekm:0x104d76c80 testingOnlyDidHRR:false testingOnlyCurveID:29}
connection established {Conn:0x14000192008 Reused:false WasIdle:false IdleTime:0s}
Wrote header field  :authority [code.aditya.me]
starting to create conn  nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:53.75µs}
Wrote header field  :authority [code.aditya.me]
Response status: 200 OK

starting to create conn  nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:82.5µs}
Wrote header field  :authority [jellyfin.aditya.me]
starting to create conn  nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:30.375µs}
Wrote header field  :authority [jellyfin.aditya.me]
Response status: 200 OK
ioquatix commented 2 months ago

Okay, I understand, SNI is not an issue for you.

Similar request has come up in the past, so I've introduced support for keyword arguments to Async::HTTP::Internet, you can specify authority: in your case to set the host header.

See https://socketry.github.io/async-http/releases/index.html#async::http::internet-accepts-keyword-arguments for more details.

cocoahero commented 1 month ago

Thanks @ioquatix, this unblocks us for now. As an aside, do you think it would also be beneficial to add a sanity check or other defensive measure to ensure the host header is not sent twice? It was not immediately intuitive when we received 400s on the reason why, and we were only able to figure it out because we had access to nginx logs.

cocoahero commented 1 month ago

Unfortunately, it appears that this change does not play well with WebMock. They use the request.authority when building the stub signature, which results in looking for the wrong host name. https://github.com/bblimke/webmock/blob/master/lib/webmock/http_lib_adapters/async_http_client_adapter.rb#L97-L106

ioquatix commented 1 month ago

Thanks, I'll take a look.