redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0
rotation of `ca.crt` #951

Open marcustut opened 10 months ago

marcustut commented 10 months ago

What happened?

I deployed redpanda onto my Kubernetes cluster on "2023-10-01T17:18:42Z" and these certificates and secrets were created:

Note that I used a SelfSigned Issuer when deploying.

$ kubectl get cert
NAME                                 READY   SECRET                               AGE
redpanda-default-cert                True    redpanda-default-cert                83d
redpanda-default-root-certificate    True    redpanda-default-root-certificate    83d
redpanda-external-cert               True    redpanda-external-cert               83d
redpanda-external-root-certificate   True    redpanda-external-root-certificate   83d
$ kubectl get secret
NAME                                 TYPE                       DATA   AGE
redpanda-default-cert                kubernetes.io/tls          3      83d
redpanda-default-root-certificate    kubernetes.io/tls          3      83d
redpanda-external-cert               kubernetes.io/tls          3      83d
redpanda-external-root-certificate   kubernetes.io/tls          3      83d

Since I need to connect to redpanda with TLS, I use the contents in redpanda-default-cert secret for my clients where it has:

However, while tls.crt expires in 2028 (5 years), the ca.crt expires on December 30th, 2023 (3 months), yet the redpanda-default-cert's description is as follows:

kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: redpanda
    meta.helm.sh/release-namespace: cybotrade-redpanda
  creationTimestamp: "2023-10-01T17:18:36Z"
  generation: 1
  labels:
    app.kubernetes.io/component: redpanda
    app.kubernetes.io/instance: redpanda
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redpanda
    helm.sh/chart: redpanda-5.6.17
  name: redpanda-default-cert
  namespace: cybotrade-redpanda
  resourceVersion: "16951563"
  uid: 88ba3450-a3b4-4818-a236-da48e6ac4fb0
spec:
  dnsNames:
  - redpanda-cluster.redpanda.cybotrade-redpanda.svc.cluster.local
  - redpanda-cluster.redpanda.cybotrade-redpanda.svc
  - redpanda-cluster.redpanda.cybotrade-redpanda
  - '*.redpanda-cluster.redpanda.cybotrade-redpanda.svc.cluster.local'
  - '*.redpanda-cluster.redpanda.cybotrade-redpanda.svc'
  - '*.redpanda-cluster.redpanda.cybotrade-redpanda'
  - redpanda.cybotrade-redpanda.svc.cluster.local
  - redpanda.cybotrade-redpanda.svc
  - redpanda.cybotrade-redpanda
  - '*.redpanda.cybotrade-redpanda.svc.cluster.local'
  - '*.redpanda.cybotrade-redpanda.svc'
  - '*.redpanda.cybotrade-redpanda'
  duration: 43800h0m0s
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: redpanda-default-root-issuer
  privateKey:
    algorithm: ECDSA
    size: 256
  secretName: redpanda-default-cert
status:
  conditions:
  - lastTransitionTime: "2023-10-01T17:18:42Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2028-09-29T17:18:42Z"
  notBefore: "2023-10-01T17:18:42Z"
  renewalTime: "2027-01-30T09:18:42Z"
  revision: 1

meaning that it will only be renewed in 2027, but by then the ca.crt will have long since expired.
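The `renewalTime` in the status above is consistent with cert-manager's default behavior of renewing once two-thirds of the certificate's lifetime has elapsed. The following is a minimal Python sketch of that arithmetic, assuming no `renewBefore` is set on the Certificate; the helper names `parse_k8s_time` and `default_renewal_time` are hypothetical and not part of any cert-manager API.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as printed in Kubernetes status fields."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def default_renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Assumed default: renew once 2/3 of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


# Values copied from the redpanda-default-cert status above.
not_before = parse_k8s_time("2023-10-01T17:18:42Z")
not_after = parse_k8s_time("2028-09-29T17:18:42Z")

print(default_renewal_time(not_before, not_after).isoformat())
```

With the values from the status block, this reproduces the reported `renewalTime` of 2027-01-30T09:18:42Z. That renewal is scheduled from the leaf certificate's own lifetime and does not take the shorter-lived `ca.crt` into account.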

My question is how do I handle this? Do I need to restart my clients every 3 months to use the renewed ca.crt?

What did you expect to happen?

Since redpanda is using cert-manager, I expect it to renew the certs automatically and I shouldn't need to frequently restart my services every time the cert expires.

How can we reproduce it (as minimally and precisely as possible)?. Please include values file.

```yaml
statefulset:
  replicas: 3
storage:
  persistentVolume:
    enabled: true
    size: 40Gi
```

Anything else we need to know?

No response

Which are the affected charts?

No response

Chart Version(s)

```console
$ helm -n list
NAME      NAMESPACE           REVISION  UPDATED                               STATUS  CHART            APP VERSION
redpanda  cybotrade-redpanda  5         2023-11-15 10:53:19.741789 +0000 UTC  failed  redpanda-5.6.17  v23.2.12
```

Cloud provider

I am using AWS EKS

JIRA Link: K8S-88

ArturKokoszka commented 10 months ago

I can confirm this issue; I see the same behavior in my environment. Every 3 months the redpanda pod becomes unhealthy (readiness probe failed) because of an invalid cert. I have to manually recreate both redpanda-default-root-certificate and redpanda-default-cert to make it work.

alejandroEsc commented 10 months ago

Can you send me a quick snippet of the durations on your redpanda-default-root-certificate? There may be something we are not taking into account.

alejandroEsc commented 10 months ago

What version of the redpanda chart are you using? I have checked the CAs, both from the root and the certs, and all are set by default to expire after 5 years.

chrisseto commented 9 months ago

This appears to be an unfortunate interaction of cert manager's "CA Bootstrapping" suggestion. See https://github.com/cert-manager/cert-manager/issues/5851 for more details.

You can force cert-manager to update the ca.crt by deleting the secrets it creates. Be warned, there are some security caveats to this approach that are documented in the linked cert-manager issue.

As of helm chart 5.6.5, the CA created for the issuer will be valid for 5 years.

ArturKokoszka commented 8 months ago

This problem still occurs. Is there anything that can be done on Redpanda's side, or is it a cert-manager-only issue?

chrisseto commented 7 months ago

@ArturKokoszka Updating to the newest version of the helm chart (and then forcing cert-manager to update the ca.crt) will increase the lifetime to 5 years.

The inability to perform an automatic rotation is a limitation of the "CA Bootstrapping" method within cert-manager, but we did opt to deploy it that way.

Changing the default to something friendlier may have some unfortunate ramifications for existing installations so we'll need to think that through thoroughly. I will say, it's difficult to have a default TLS solution that "just works". Every option comes with some degree of caveats. To you and the users that 👍 'd your comment, what behavior are you most interested in? Do you need TLS to work and be secure by default or do you just need TLS to exist and never break in the default installation?

carlreid commented 7 months ago

@chrisseto In our case, we're running into renewal issues on certificates where there seems to be some date misalignment. I've not done any thorough investigation and just thought that this issue may be somewhat related.

To add some context, we seem to get this odd situation. `kubectl describe certificate kafka-default-cert` outputs:

Not After:               2024-05-13T14:31:04Z
Not Before:              2024-02-13T14:31:04Z
Renewal Time:            2024-04-13T14:31:04Z

So that looks fine, but inspecting the actual certificate with `kubectl get secret kafka-default-cert -o jsonpath={.data."ca\.crt"} | base64 --decode | openssl x509 -text -noout` outputs:

Validity
    Not Before: Jan 14 09:19:15 2024 GMT
    Not After : Apr 13 09:19:15 2024 GMT

As shown above, the Renewal Time is after the certificate's Not After, which means it can't be renewed in time. We currently need to run `cmctl renew kafka-default-cert` to fix this when it happens.
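To make the misalignment concrete, here is a small Python sketch that compares the Renewal Time reported by `kubectl describe` with the `Not After` that `openssl` prints for `ca.crt`. It assumes cert-manager's default of renewing at two-thirds of the lifetime (which matches the values above); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as printed by kubectl."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_openssl_time(value: str) -> datetime:
    """Parse a validity date as printed by `openssl x509 -text` (always GMT)."""
    return datetime.strptime(value, "%b %d %H:%M:%S %Y GMT").replace(tzinfo=timezone.utc)


def default_renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Assumed default: renew once 2/3 of the certificate lifetime has elapsed."""
    return not_before + (not_after - not_before) * 2 / 3


# Leaf certificate dates from `kubectl describe certificate kafka-default-cert`.
leaf_renewal = default_renewal_time(
    parse_k8s_time("2024-02-13T14:31:04Z"),
    parse_k8s_time("2024-05-13T14:31:04Z"),
)

# CA expiry from the openssl output for ca.crt.
ca_not_after = parse_openssl_time("Apr 13 09:19:15 2024 GMT")

# A positive gap means the CA expires before the leaf renewal is scheduled.
gap = leaf_renewal - ca_not_after
print(f"renewal at {leaf_renewal.isoformat()}, CA expires {gap} earlier")
```

For these values the CA expires a little over five hours before the scheduled renewal, so clients validating against `ca.crt` would start failing before cert-manager acts on the leaf certificate.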

This could be a misconfiguration on our side, but I haven't done much digging into this issue yet. Anyhow, here are the Redpanda values / settings we have in relation to certs:

tls:
  enabled: true
  certs:
    default:
      caEnabled: true
      duration: 2160h

ArturKokoszka commented 7 months ago

@chrisseto I'm still facing the same issue on both of my environments. I just had to delete both secrets and it's working again. Replying to your questions, we definitely need a TLS solution that is stable and never breaks.