strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

CA Certificate renewal docs missing step to add old CA cert to secret #6811

Closed tmitchell10 closed 2 years ago

tmitchell10 commented 2 years ago

Suggestion / Problem

I have been testing the CA certificate renewal process using a custom CA certificate, and found that the documentation appears to be missing a step. The "Renewing your own CA certificates" section, https://strimzi.io/docs/operators/latest/configuring.html#renewing-your-own-ca-certificates-str, describes the steps to follow.

When I did this I found that the zookeeper and kafka pods failed to roll, saying that they couldn't find a valid CA Certificate.

I found that I needed to have the old CA certificate defined in the CA cert secret as well as the new certificate. This is documented in the Replacing Private Keys section of the documentation.

Documentation Link

I believe that this section of the documentation, https://strimzi.io/docs/operators/latest/configuring.html#renewing-your-own-ca-certificates-str, needs a step prior to the existing step 3, that says something similar to step 2b of https://strimzi.io/docs/operators/latest/configuring.html#proc-replacing-your-own-private-keys-str

scholzj commented 2 years ago

Are you sure you are doing just a renewal of the public certificate? I.e. no change to the private key?

scholzj commented 2 years ago

CC @ppatierno, you were working on this.

tmitchell10 commented 2 years ago

Hi, yes this was just renewing the latest certificate. What I did was:

1) Install a cluster with my own certs, setting the cert and key annotations to be 0.
2) Update the <release>-cluster-ca-cert secret, replacing the ca.crt element with the new certificate base64 string, and incrementing the cert-generation annotation to 1.
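
For illustration only, step 2 above could be sketched as a small helper that edits a Secret manifest held as a Python dict. This is a minimal sketch, not the documented procedure: the annotation key `strimzi.io/ca-cert-generation` is an assumption to verify against the Secret in your own cluster, and `renew_ca_cert` and the Secret name `my-cluster-cluster-ca-cert` are hypothetical names introduced here.

```python
import base64
import copy

# Assumed annotation key; check it against the Secret in your own cluster.
GENERATION_ANNOTATION = "strimzi.io/ca-cert-generation"


def renew_ca_cert(secret: dict, new_cert_pem: bytes) -> dict:
    """Return a copy of the Secret with ca.crt replaced and the generation bumped by 1."""
    updated = copy.deepcopy(secret)
    # Values under `data` in a Kubernetes Secret are base64-encoded strings.
    updated.setdefault("data", {})["ca.crt"] = base64.b64encode(new_cert_pem).decode("ascii")
    annotations = updated.setdefault("metadata", {}).setdefault("annotations", {})
    current = int(annotations.get(GENERATION_ANNOTATION, "0"))
    # Annotation values must be strings.
    annotations[GENERATION_ANNOTATION] = str(current + 1)
    return updated


if __name__ == "__main__":
    # Hypothetical Secret contents, for demonstration only.
    secret = {
        "metadata": {
            "name": "my-cluster-cluster-ca-cert",
            "annotations": {GENERATION_ANNOTATION: "0"},
        },
        "data": {"ca.crt": base64.b64encode(b"old-pem").decode("ascii")},
    }
    renewed = renew_ca_cert(secret, b"new-pem")
    print(renewed["metadata"]["annotations"][GENERATION_ANNOTATION])
```

The helper only prepares the manifest; applying it to the cluster is left to whatever tooling you already use.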

Then when the first zookeeper pod rolls, it goes into CrashLoopBackOff with the following error in the logs:

Detected Zookeeper ID 3
Preparing truststore
Adding /opt/kafka/cluster-ca-certs/ca.crt to truststore /tmp/zookeeper/cluster.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore is complete
Looking for the right CA
No CA found. Thus exiting.

scholzj commented 2 years ago

And what version of Strimzi are you using?

tmitchell10 commented 2 years ago

I was using the 0.28 version of Strimzi

scholzj commented 2 years ago

@ppatierno Can you have a look at this?

ppatierno commented 2 years ago

Yes I am looking into it. I will be back asap ;-)

ppatierno commented 2 years ago

So ... I came back to this, did some investigation, and was able to reproduce your case, but I am not sure the cause is the same.

In general, when you are NOT changing the private key used to sign the new cluster CA certificate, keeping the old certificate in the Secret (and therefore in the truststore) is not needed: during the rolling update, the server certificates are still signed with the same private key, which is shared by the old and the new cluster CA. Keeping the old cluster CA certificate matters when you change the private key. During the first rolling update, which adds the new cluster CA certificate, the nodes still have to trust each other while using the old key and certificate, because their server certificates are signed with the old private key.

The test I did highlighted the same error you are having but in a specific case.

I generated a self-signed cluster CA with a private key and it was like this.

Issuer: C = IT, L = Italy, O = "Strimzi, Inc.", CN = RootCA v1
        Validity
            Not Before: May 19 13:52:00 2022 GMT
            Not After : May 18 13:52:00 2027 GMT
        Subject: C = IT, L = Italy, O = "Strimzi, Inc.", CN = RootCA v1

I renewed by changing the CN and the new one had:

Issuer: C = IT, L = Italy, O = "Strimzi, Inc.", CN = RootCA v1
        Validity
            Not Before: May 19 14:00:00 2022 GMT
            Not After : May 16 14:00:00 2032 GMT
        Subject: C = IT, L = Italy, O = "Strimzi, Inc.", CN = RootCA v2

The "No CA found. Thus exiting." error you are seeing (and that I saw) should be related to the fact that the Issuer is no longer in the truststore. If I renew the certificate making sure that the CN doesn't change, everything works fine for me without the extra step of keeping the old one. Or, if I want to change the CN (which is what the Strimzi operator does when you choose its cluster CA rather than your custom one), I have to make sure that I generate a new self-signed certificate (so Issuer and Subject are the same) using the same private key. In this case it also works well.

Do you have any further information that would help us understand whether the problem is the same? Maybe the renewal process is not happening the right way?

scholzj commented 2 years ago

@ppatierno I think this is expected, because it is not a simple renewal. The first certificate is basically a self-signed root CA. The second certificate is not a self-signed root CA. It is signed by the old CA. So it is a bit weird.

But in general I think this is not expected to work as a public key renewal. So I guess the question is what exactly your certificates look like, @tmitchell10.

ppatierno commented 2 years ago

Tbh it's not clear to me from the description whether the use case was a self-signed CA or a more complex chain with an intermediate in the middle. In the second case I would expect the intermediate to be renewed with the same key, but the important thing is having the entire chain in the Secret. I think that one of the CA certs in the chain is missing, hence the error. So I agree we need more details about the use case and what the certificates look like.

tmitchell10 commented 2 years ago

@ppatierno Thanks for checking this out. You are absolutely right. I had created a custom CA cert with a private key, and then generated a new CA cert with the same key but a different CN, and the change in CN was causing the verification to fail. I've since regenerated the certs with the same CN and everything works as expected. Apologies for not spotting that.

I don't know whether generating a CA cert with a different CN using the same private key is a valid use case, but I found that if I paused the reconciliation, added the old CA cert plus the new cert with a different CN to the cluster-ca-cert secret, incremented the ca-cert-generation annotation, and then re-enabled the reconciliation, everything did roll and all the broker/zk certs were regenerated correctly.
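
As a rough illustration of the order of operations in that workaround, the sketch below builds the three merge patches as plain Python dicts. It is written under assumptions: the annotation keys `strimzi.io/pause-reconciliation` and `strimzi.io/ca-cert-generation` should be checked against your Strimzi version, and the key name `ca-old.crt` for the retained certificate and the function `plan_patches` are hypothetical.

```python
import base64

# Assumed annotation keys; verify them against your Strimzi version.
PAUSE_ANNOTATION = "strimzi.io/pause-reconciliation"
GENERATION_ANNOTATION = "strimzi.io/ca-cert-generation"


def _b64(pem: bytes) -> str:
    """Base64-encode PEM bytes the way Secret `data` values expect."""
    return base64.b64encode(pem).decode("ascii")


def plan_patches(old_cert_pem: bytes, new_cert_pem: bytes,
                 current_generation: int, old_cert_key: str = "ca-old.crt") -> list:
    """Return (resource kind, merge patch) pairs in the order they would be applied."""
    return [
        # 1. Pause reconciliation on the Kafka resource.
        ("Kafka", {"metadata": {"annotations": {PAUSE_ANNOTATION: "true"}}}),
        # 2. Keep the old cert under a separate (hypothetical) key, add the new one,
        #    and bump the generation annotation.
        ("Secret", {
            "metadata": {"annotations": {GENERATION_ANNOTATION: str(current_generation + 1)}},
            "data": {old_cert_key: _b64(old_cert_pem), "ca.crt": _b64(new_cert_pem)},
        }),
        # 3. Resume reconciliation.
        ("Kafka", {"metadata": {"annotations": {PAUSE_ANNOTATION: "false"}}}),
    ]


if __name__ == "__main__":
    for kind, patch in plan_patches(b"old-pem", b"new-pem", current_generation=0):
        print(kind, patch)
```

As the rest of the thread explains, this only works while the old self-signed certificate remains trusted, so it is not a substitute for a consistent renewal.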

Thanks again for the help with this.

ppatierno commented 2 years ago

@tmitchell10 glad to know that it worked for you!

Regarding keeping the old certificate, the rolling update worked for just one reason: the servers (ZooKeeper nodes and Kafka brokers) were still using the old CA for trust, and it was self-signed, with issuer and subject having the same CN. The other CA was not used at all. As soon as you removed the old CA cert (because it had expired, for example), the cluster would start raising the error, because your new CA had a different CN in its issuer and subject.

I don't think that renewing a CA with a different CN makes much sense. Renewing is about extending the expiration date, so why change the CN?

ppatierno commented 2 years ago

@tmitchell10 if there are no further comments, do you think we can close this?

scholzj commented 2 years ago

@ppatierno Does it still need some clarification in the docs? To make it more clear when it can be used?

ppatierno commented 2 years ago

When renewing and replacing the private key, we have something like this:

Before going through the following steps, make sure that the CN (Common Name) of the new CA certificate is different from the current one. For example, when the Cluster Operator renews certificates automatically it adds a v suffix to identify a version. Do the same with your own CA certificate by adding a different suffix on each renewal. By using a different key to generate a new CA certificate, you retain the current CA certificate stored in the Secret.

Maybe in the section related to renewal we could add a sentence explaining that, by contrast, the CN should not change, because renewing with the same private key is only about extending the certificate's validity and nothing more. I can open a PR for it. Or do you have anything different in mind?

ppatierno commented 2 years ago

@tmitchell10 just to be precise: in my case, the issuer differing from the subject in my renewed self-signed CA certificate was due to my own mistake of using the old CSR to generate the renewed certificate. That resulted in the old issuer (the old self-signed certificate) but a new subject. In general, changing the CN is not a problem (this is what the Strimzi Cluster Operator does when it handles the CA certificate itself), but it's important that the chain is consistent and, in my case of a self-signed cert, the issuer and subject have to be the same. We would like to improve the docs from this point of view.

tmitchell10 commented 2 years ago

@ppatierno I am happy for this issue to be closed. Thanks again for the help with this one.

ppatierno commented 2 years ago

Thanks @tmitchell10. I am going to close this. Our plan is to dig into different use cases about CA renewal and make the documentation clearer for the users.