Closed Keksoj closed 9 months ago
This looks like something that should be fixed in the control plane, not in sozu
Maybe the comment wasn't clear enough. If Sozu has a certificate A for domain name N, and we add a new certificate B also valid for N and A is deemed "weaker" than B (all its domains are covered by B and expire before B) then A is removed in favor of B. Which I think is the intended behavior of Sozu regardless of our control plane.
However, in this situation, the bug was that the fingerprint of A remained as the value for N in the TrieNode
(because insert
does nothing if the value already exists), so any TLS handshake for N would fail because domains
and certificates
were desynchronized.
I don't think it should be possible to put Sozu in an inconsistent state.
Or do you suggest that add certificate should never try to replace a certificate?
In production, Sōzu would fail when handling the AddCertificate command, in case a certificate with a concurrent domain name was present. The candidate certificate did not overwrite the outdated certificate in the TrieNode structure that maps domain names to certificate fingerprints.
This PR operates an vast rewrite of the CertificateResolver, its types and methods, and fixes the issue. On top of that, a rare and highly unlikely bug have been prevented, in case one domain name has two valid certificates bound to it.
todo: