Closed cmurphy closed 3 years ago
@ibuildthecloud and @brandond could you review this? (I can't add reviewers)
I've tried to verify that this change doesn't cause a regression to the three related issues mentioned above and seems to be working fine, but I could be missing something.
See the comment at https://github.com/rancher/dynamiclistener/pull/28/files#r468192991 as to why I moved it.
If you can confirm that this doesn't regress the issue that moving it fixed then I'm good.
Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection
error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns
until after the handshake is complete, so that it doesn't get closed when reloading the cert? We haven't even completed the TLS handshake yet so we don't need to close it in order to get the right cert loaded.
You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?
Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns until after the handshake is complete, so that it doesn't get closed when reloading the cert?
That's what seems to be happening, so moving https://github.com/rancher/dynamiclistener/blob/94e22490cfa89c4c33be4783533d920e3f511102/listener.go#L254-L256 somewhere else would fix it if I can find the right place to put it.
You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?
Based on https://github.com/rancher/rancher/pull/25386 closing existing connections is intentional and necessary for websockets to work but I'm not sure if it's needed otherwise.
There's nowhere to move the dynamicListener.wrap to that would help because the problem occurs during the dynamiclistener.Listener.Accept routine which is mostly a black box apart from GetCertificate, so I made the change there to look at the client hello data and skip closing the connection that was in the middle of its handshake.
This unfortunately doesn't help if there are multiple requests made at once and the cert is updated for any of them, since all the connections are wrapped and then all except one are closed :thinking:
Ah yeah, that doesn't help the other connections does it. Is there some way to add a flag to the wrapped connections to indicate whether or not we've finished the handshake (or at least returned a cert to be used for the handshake) yet? We only want to close connections that were completed before the cert was updated, right?
Alright now the wrapper tracks whether the connection has ever gotten a cert and uses that to know whether it's safe to close
Without this change, if a cert is updated (e.g. to add CNs) while the listener is in the middle of Accept()ing a new connection, the connection gets dropped, we'll see a message like this in the server logs:
http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection
and the client (like a browser) won't necessarily reconnect. This change modifies the GetCertificate routine in the listener's tls.Config to keep track of the state of the incoming connections and only close connections that have completed GetCertificate and therefore are finished with their TLS handshake, so that only old established connections are closed.
Issue: https://github.com/rancher/rancher/issues/34038
Related context: