rancher / dynamiclistener

Apache License 2.0
17 stars 61 forks source link

Skip closing an initializing connection #42

Closed cmurphy closed 3 years ago

cmurphy commented 3 years ago

Without this change, if a cert is updated (e.g. to add CNs) while the listener is in the middle of Accept()ing a new connection, the connection gets dropped, we'll see a message like this in the server logs:

http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection

and the client (like a browser) won't necessarily reconnect. This change modifies the GetCertificate routine in the listener's tls.Config to keep track of the state of the incoming connections and only close connections that have completed GetCertificate and therefore are finished with their TLS handshake, so that only old established connections are closed.

Issue: https://github.com/rancher/rancher/issues/34038

Related context:

cmurphy commented 3 years ago

@ibuildthecloud and @brandond could you review this? (I can't add reviewers)

I've tried to verify that this change doesn't cause a regression to the three related issues mentioned above and seems to be working fine, but I could be missing something.

brandond commented 3 years ago

See the comment at https://github.com/rancher/dynamiclistener/pull/28/files#r468192991 as to why I moved it.

If you can confirm that this doesn't regress the issue that moving it fixed then I'm good.

brandond commented 3 years ago

Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns until after the handshake is complete, so that it doesn't get closed when reloading the cert? We haven't even completed the TLS handshake yet so we don't need to close it in order to get the right cert loaded.

brandond commented 3 years ago

You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?

cmurphy commented 3 years ago

Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns until after the handshake is complete, so that it doesn't get closed when reloading the cert?

That's what seems to be happening, so moving https://github.com/rancher/dynamiclistener/blob/94e22490cfa89c4c33be4783533d920e3f511102/listener.go#L254-L256 somewhere else would fix it if I can find the right place to put it.

You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?

Based on https://github.com/rancher/rancher/pull/25386 closing existing connections is intentional and necessary for websockets to work but I'm not sure if it's needed otherwise.

cmurphy commented 3 years ago

There's nowhere to move the dynamicListener.wrap to that would help because the problem occurs during the dynamiclistener.Listener.Accept routine which is mostly a black box apart from GetCertificate, so I made the change there to look at the client hello data and skip closing the connection that was in the middle of its handshake.

cmurphy commented 3 years ago

This unfortunately doesn't help if there are multiple requests made at once and the cert is updated for any of them, since all the connections are wrapped and then all except one are closed :thinking:

brandond commented 3 years ago

Ah yeah, that doesn't help the other connections does it. Is there some way to add a flag to the wrapped connections to indicate whether or not we've finished the handshake (or at least returned a cert to be used for the handshake) yet? We only want to close connections that were completed before the cert was updated, right?

cmurphy commented 3 years ago

Alright now the wrapper tracks whether the connection has ever gotten a cert and uses that to know whether it's safe to close