gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0
16.94k stars 1.7k forks source link

Populate tls.Config.NextProtos for teleport specific ALPNs #43473

Open rosstimothy opened 3 days ago

rosstimothy commented 3 days ago

Any alpnproxy.HandlerDesc that forwards TLS or provides its own tls.Config is responsible for setting the tls.Config.NextProtos so that the negotiated ALPN is returned to clients. In all other cases the proxy web tls.Config is used which populates the next protos with all supported ALPN protocols. All of the existing places that the default tls.Config was being overridden were supplying the NextProtos. This updates them to populate the NextProtos with the correct values and updates the commentary on alpnproxy.HandlerDecs to indicate that NextProtos should be set if not using the default tls.Config.

Fixes #41500.

Changelog: Ensure the negotiated ALPN is returned correctly for connections to Teleport.

rosstimothy commented 3 days ago

Can we add a check for ConnectionState().NegotiatedProtocol in tests?

@espadolini added a test in f605f8c that validates the negotiated protocols.

nklaassen commented 2 days ago

What is the problem with not returning the custom protocol? Just trying to understand if the way we use teleport-proxy-grpc is problematic too, it relies on the client requesting the protocol for routing, but IIRC we actually have to negotiate to h2 for grpc to work

espadolini commented 2 days ago

What is the problem with not returning the custom protocol?

It breaks the ALPN contract; the entire point of ALPN is that the client presents a list of protocols and the server chooses one that's supported and makes the client aware, without adding roundtrips to the communication.

If we're choosing to do teleport-proxy-grpc we should select that and have both sides aware of it. If that wouldn't work with clients in the wild today then we have to rethink the approach a bit I guess. Maybe a separate teleport-proxy-grpc-but-this-time-we-actually-read-the-rfc nextproto that both servers and clients can ask and use?

nklaassen commented 2 days ago

What is the problem with not returning the custom protocol?

It breaks the ALPN contract; the entire point of ALPN is that the client presents a list of protocols and the server chooses one that's supported and makes the client aware, without adding roundtrips to the communication.

If we're choosing to do teleport-proxy-grpc we should select that and have both sides aware of it. If that wouldn't work with clients in the wild today then we have to rethink the approach a bit I guess. Maybe a separate teleport-proxy-grpc-but-this-time-we-actually-read-the-rfc nextproto that both servers and clients can ask and use?

I meant like, is there an actual problem/bug/situation where this breaks, that we need to fix or else things aren't going to work?

Not sure I'd go so far as to say it breaks the ALPN contract for the client to advertise support for [teleport-proxy-grpc, h2], the server selects h2, and the actual protocol used between the two is h2. teleport-proxy-grpc is not a real protocol that's supported by any server, so I guess our client is lying a bit, but it is a very useful way to pass that routing information

espadolini commented 1 day ago

I meant like, is there an actual problem/bug/situation where this breaks, that we need to fix or else things aren't going to work?

The unfortunate situation of our ALPN misuse results in the inability to understand if what's negotiated is actually the thing that we intended to reach or, for example, it's just a TLS terminator in front of the proxy, which means that anything involving a connection to the proxy needs either stored state (which might be out of date), prior knowledge of the cleanliness of the connection (which might be wrong), or a second roundtrip (which is slow). If we actually used the protocol as it's specced to do, we could've avoided the tls_routing_conn_upgrade_required thing, as any connection to the proxy public address with the appropriate ALPN could've just transparently used the connection upgrade fallback.

This also makes every new protocol incredibly flaky, as it's hard to actually know if the proxy understands the protocol we intended to use - unless one picks just the one protocol, and on failure a new connection needs to happen (which is half a tls handshake and a full tcp and tls handshake slower).