knative / operator

Combined operator for Knative.
Apache License 2.0

`kn-routing` certificate gets stuck in a loop when using `net-certmanager` #1685

Closed wSedlacek closed 4 months ago

wSedlacek commented 8 months ago

Describe the bug

When I apply https://github.com/knative/net-certmanager/releases/download/knative-v1.13.0/release.yaml, net-certmanager tries to create a (self-signed) certificate for kn-routing but gets stuck in a loop.

{"severity":"INFO","timestamp":"2024-01-30T15:45:26.750709521Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"699031f6-fc3a-48da-981a-53e65cb13bff","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:26.750803246Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready False 2024-01-30 15:45:26 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"699031f6-fc3a-48da-981a-53e65cb13bff","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:26.75087679Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"699031f6-fc3a-48da-981a-53e65cb13bff","knative.dev/key":"knative-serving/routing-serving-certs","duration":"204.27µs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.329864665Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"b88fe0bc-7c10-49c8-b378-b572fbc93bb5","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.330293699Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready False 2024-01-30 15:45:26 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"b88fe0bc-7c10-49c8-b378-b572fbc93bb5","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.330488473Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"b88fe0bc-7c10-49c8-b378-b572fbc93bb5","knative.dev/key":"knative-serving/routing-serving-certs","duration":"680.72µs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.478490655Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"a3bccab7-6cf8-45ab-988d-991ef191c9c7","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.478636258Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready True 2024-01-30 15:45:27 +0000 UTC Ready Certificate is up to date and has not expired 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"a3bccab7-6cf8-45ab-988d-991ef191c9c7","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.495933495Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"a3bccab7-6cf8-45ab-988d-991ef191c9c7","knative.dev/key":"knative-serving/routing-serving-certs","duration":"17.493734ms"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.49603057Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"3b2cda54-308c-4c18-9284-64bb8199b6db","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.496087213Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready True 2024-01-30 15:45:27 +0000 UTC Ready Certificate is up to date and has not expired 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"3b2cda54-308c-4c18-9284-64bb8199b6db","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.496119918Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"3b2cda54-308c-4c18-9284-64bb8199b6db","knative.dev/key":"knative-serving/routing-serving-certs","duration":"122.507µs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.735175098Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"387bfa75-95d9-4598-b772-06278907dbe0","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.735511007Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready True 2024-01-30 15:45:27 +0000 UTC Ready Certificate is up to date and has not expired 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"387bfa75-95d9-4598-b772-06278907dbe0","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.735672454Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"387bfa75-95d9-4598-b772-06278907dbe0","knative.dev/key":"knative-serving/routing-serving-certs","duration":"547.577µs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.930926489Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"52ac6074-1988-4b4d-8f4c-de4821dc8dd7","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.931008317Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready False 2024-01-30 15:45:27 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"52ac6074-1988-4b4d-8f4c-de4821dc8dd7","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.951502254Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"52ac6074-1988-4b4d-8f4c-de4821dc8dd7","knative.dev/key":"knative-serving/routing-serving-certs","duration":"20.635514ms"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.951610696Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:101","message":"Reconciling Cert-Manager certificate for Knative cert.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"dc85d0e3-7a32-42ed-a501-661cc8ec08d5","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.951675575Z","logger":"net-certmanager-controller.certificate-controller","caller":"certificate/certificate.go:121","message":"cm cert condition &{Ready False 2024-01-30 15:45:27 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}.","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"dc85d0e3-7a32-42ed-a501-661cc8ec08d5","knative.dev/key":"knative-serving/routing-serving-certs"}
{"severity":"INFO","timestamp":"2024-01-30T15:45:27.951717457Z","logger":"net-certmanager-controller.certificate-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"7397714-dirty","knative.dev/controller":"certificate-controller","knative.dev/controller":"knative.dev.net-certmanager.pkg.reconciler.certificate.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Certificate","knative.dev/traceid":"dc85d0e3-7a32-42ed-a501-661cc8ec08d5","knative.dev/key":"knative-serving/routing-serving-certs","duration":"146.065µs"}

Specifically, the certificate condition keeps returning to SecretMismatch: "Secret contains a private key that does not match the current CertificateRequest".
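To see the flapping at a glance rather than reading the raw records, the controller log can be reduced to a sequence of Ready states. The following is a minimal sketch, not part of net-certmanager: the function names are made up for illustration, and it assumes each log line is a JSON record whose `message` field carries the `cm cert condition &{Ready ...}` dump shown above.

```python
import json
import re

# Matches the condition dump logged by the controller, e.g.
# "cm cert condition &{Ready False 2024-01-30 15:45:26 +0000 UTC SecretMismatch ...}."
CONDITION_RE = re.compile(r"cm cert condition &\{Ready (True|False) ")


def ready_states(log_lines):
    """Return (timestamp, ready) pairs for every logged cert-manager condition."""
    states = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if not isinstance(entry, dict):
            continue
        match = CONDITION_RE.search(entry.get("message", ""))
        if match:
            states.append((entry.get("timestamp", ""), match.group(1) == "True"))
    return states


def count_flips(states):
    """Count how often Ready changes between consecutive observations."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev[1] != cur[1])


# Abbreviated records in the same shape as the log above (most fields dropped).
sample = [
    '{"timestamp":"2024-01-30T15:45:26.750803246Z","message":"cm cert condition &{Ready False 2024-01-30 15:45:26 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}."}',
    '{"timestamp":"2024-01-30T15:45:26.75087679Z","message":"Reconcile succeeded"}',
    '{"timestamp":"2024-01-30T15:45:27.478636258Z","message":"cm cert condition &{Ready True 2024-01-30 15:45:27 +0000 UTC Ready Certificate is up to date and has not expired 5}."}',
    '{"timestamp":"2024-01-30T15:45:27.931008317Z","message":"cm cert condition &{Ready False 2024-01-30 15:45:27 +0000 UTC SecretMismatch Secret contains a private key that does not match the current CertificateRequest 5}."}',
]

flips = count_flips(ready_states(sample))
print(f"Ready changed {flips} time(s)")
```

A certificate that settles would show at most one change (False to True); repeated changes within a second or two, as in the log above, indicate the loop.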

Expected behavior

I am not entirely sure why it is getting stuck in a loop, or whether the fix belongs in net-certmanager. However, kn-routing seems to have been introduced in net-istio in 1.13, so this seemed like the place to start.

Knative release version 1.13.0

ReToCode commented 8 months ago

I'm assuming you have enabled some of the experimental encryption features? In 1.13 certificates are created by cert-manager (via net-certmanager) instead of an internal reconciler. Can you try to delete the existing secret: knative-serving/routing-serving-certs so it will be re-populated by a Secret from cert-manager?

The one from net-istio should be fine as it is newly introduced.

wSedlacek commented 8 months ago

I'm assuming you have enabled some of the experimental encryption features?

As far as I know I haven't done this.

To be thorough about the complete setup, I have a Pulumi build script that:

  1. Installs prometheus from the kube-prometheus-stack helm chart
  2. Installs istio from the base, istiod, gateway helm charts
  3. Applies the istio addon for prometheus service monitors from github yaml
  4. Applies istio addons for jaeger and kiali (adjusting the config map to point at prometheus-operated) from github yaml
  5. Applies envoy filters for web grpc and https redirect (has a special case for acme challenge so that it doesn't redirect) from local yaml
  6. Applies the CloudNativePG and Redis Operators from github yaml
  7. Applies the Knative operator from GitHub yaml (uses transforms to place in knative-system namespace)
  8. Installs cert-manager from helm with prometheus service monitor
  9. Applies Google Trust Services issuer for http acme challenges from local yaml
  10. Applies KnativeServing and KnativeEventing operators from local yaml
  11. Applies net-certmanager from github yaml with a transform to set the issuerRef to the Google Trust Services ClusterIssuer I applied previously and the systemInternalIssuerRef to the knative-selfsigned-issuer as shown in the _example
  12. Applies HNC Controller from github yaml

I am doing this on a GKE standard cluster that was brand new for testing this build script. I have torn it down and reapplied it several times, and the results are consistent: on Knative 1.12.0 the certificate does not get stuck in a loop, but on Knative 1.13.0 it does.

Here are my configurations for KnativeServing and KnativeEventing

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  version: {{.Values.version}}
  source:
    redis:
      enabled: true
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  version: {{.Values.version}}
  config:
    network:
      external-domain-tls: {{.Values.https | ternary "Enabled" "Disabled"}}
    domain:
      'example.com': ''

Can you try to delete the existing secret: knative-serving/routing-serving-certs so it will be re-populated by a Secret from cert-manager?

I do see this secret already existing; however, it is listed as controlled by the KnativeServing operator. I can manually delete it, and it does not seem to be immediately recreated by the operator. Deleting it does allow the certificate to be issued, which of course recreates the secret, this time controlled by the Certificate. I suspect, though, that a change to the KnativeServing resource, such as updating the version, would make the operator reapply this secret and bring the issue back.

So I think the root of the problem is that both the KnativeServing operator and the cert-manager Certificate are trying to control the same secret.

For my build script this is also problematic: neither the Secret controlled by KnativeServing nor the Certificate is created directly by my build script, so I would need to wait for that Secret to exist before I can manually delete it and let the Certificate be issued.
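The wait-then-delete workaround described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `delete_secret_once_present` and its parameters are hypothetical names, `kubectl` is assumed to be on the PATH and pointed at the right cluster, and it relies on `kubectl get secret` exiting non-zero while the Secret does not exist. It does not address the concern that the operator might re-create the Secret later.

```python
import subprocess
import time


def kubectl(*args):
    """Run kubectl with the given arguments and return its exit code (0 = success)."""
    return subprocess.run(["kubectl", *args], capture_output=True).returncode


def delete_secret_once_present(name, namespace, run=kubectl,
                               timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll until the Secret exists, then delete it.

    Returns True if the Secret was found and deleted, False if the timeout
    elapsed first or the delete failed. `run` and `sleep` are injectable so the
    helper can be exercised without a cluster.
    """
    deadline = time.monotonic() + timeout
    while True:
        if run("get", "secret", name, "-n", namespace) == 0:
            return run("delete", "secret", name, "-n", namespace) == 0
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Example call against a real cluster (left commented out on purpose):
# delete_secret_once_present("routing-serving-certs", "knative-serving")
```

Passing `run` as a parameter keeps the polling logic separate from the kubectl call, so the same function can be wired into a build script or tested with a stub.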

Is there a way to make the KnativeServing operator not create routing-serving-certs when using net-certmanager? Or is there a way to let the Secret created by the KnativeServing operator be overwritten by the one created by cert-manager without getting stuck in a loop?

ReToCode commented 8 months ago

I'm still not sure why you are facing this. Let me point to the important parts in code:

In release-1.13 you should have:

So something is off here, as your error says:

Reconciling Cert-Manager certificate for Knative cert.
knative.dev/kind networking.internal.knative.dev.Certificate
knative.dev/key: knative-serving/routing-serving-certs

So I think the root of the problem is that both the KnativeServing operator and the cert-manager Certificate are trying to control the same secret.

In 1.14 we will have a KnativeCertificate to replace the Secret, but not in 1.13. So there should not be a conflict. Can you post the following outputs?

kubectl get kcert -A
kubectl get certificate -A

kubectl get secret -n knative-serving
kubectl get secret -n istio-system (or your istio system namespace if you override it)

Also, do you use the default istio-system namespace or did you change it?

wSedlacek commented 8 months ago

Also, do you use the default istio-system namespace or did you change it?

Yes, I use istio-system as the namespace for both istio and the istio-ingressgateway.

Can you post the following outputs?

kubectl get kcert -A

NAMESPACE         NAME                    READY   REASON
knative-serving   routing-serving-certs   False   SecretMismatch

Note: routing-serving-certs cycles between Ready True and False

kubectl get certificate -A

NAMESPACE         NAME                    READY   SECRET                  AGE
cert-manager      knative-selfsigned-ca   True    knative-selfsigned-ca   98s
knative-serving   routing-serving-certs   False   routing-serving-certs   92s

Note: routing-serving-certs cycles between Ready True and False

kubectl get secret -n knative-serving

NAME                            TYPE     DATA   AGE
knative-serving-certs           Opaque   6      2m46s
net-certmanager-webhook-certs   Opaque   3      2m27s
net-istio-webhook-certs         Opaque   3      2m39s
routing-serving-certs           Opaque   6      2m46s
routing-serving-certs-dwgxb     Opaque   1      0s
serving-certs-ctrl-ca           Opaque   4      2m47s
webhook-certs                   Opaque   3      2m41

kubectl get secret -n istio-system

NAME              TYPE               DATA   AGE
istio-ca-secret   istio.io/ca-root   5      22m

ReToCode commented 8 months ago

As above, you should not have a kcert in the knative-serving namespace, only a Secret (see our release manifests). You should, however, have one in istio-system, which you do not. So something is off with your installation YAML. Can you check the manifest that your installation produces and see where that comes from?

It looks like this file is getting placed in knative-serving somehow.
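One rough way to check a rendered manifest for misplaced Knative Certificates is a plain text scan, sketched below. The helper names and the sample manifest are made up for illustration, and the scan is deliberately naive: it assumes documents are separated by `---` lines, that `kind` and `apiVersion` are unindented, and that `name` and `namespace` sit two spaces under `metadata:`. A real YAML parser would be more robust.

```python
import re

# API group of Knative's internal Certificate (the "kcert" short name above).
KCERT_GROUP = "networking.internal.knative.dev/"


def top_level(doc, key):
    """Return the value of an unindented `key: value` line, or None."""
    match = re.search(rf"^{key}:\s*(\S+)", doc, re.MULTILINE)
    return match.group(1) if match else None


def metadata_field(doc, key):
    """Return a direct child of the top-level `metadata:` mapping, or None."""
    in_metadata = False
    for line in doc.splitlines():
        if re.match(r"^metadata:\s*$", line):
            in_metadata = True
            continue
        if in_metadata:
            if line and not line[0].isspace():
                break  # reached the next top-level key
            match = re.match(rf"^  {key}:\s*(\S+)", line)
            if match:
                return match.group(1)
    return None


def find_kcerts(manifest):
    """List (namespace, name) for every Knative Certificate in a multi-document manifest."""
    found = []
    for doc in re.split(r"^---\s*$", manifest, flags=re.MULTILINE):
        api_version = top_level(doc, "apiVersion") or ""
        if top_level(doc, "kind") == "Certificate" and api_version.startswith(KCERT_GROUP):
            found.append((metadata_field(doc, "namespace"), metadata_field(doc, "name")))
    return found


# Made-up sample: a Secret, a cert-manager Certificate, and a Knative Certificate.
sample_manifest = """\
apiVersion: v1
kind: Secret
metadata:
  name: routing-serving-certs
  namespace: knative-serving
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: knative-selfsigned-ca
  namespace: cert-manager
---
apiVersion: networking.internal.knative.dev/v1alpha1
kind: Certificate
metadata:
  name: routing-serving-certs
  namespace: knative-serving
"""

for namespace, name in find_kcerts(sample_manifest):
    print(f"kcert {name} in namespace {namespace}")
```

Any entry reported in knative-serving on a 1.13 install would be the unexpected resource discussed above; cert-manager's own Certificate objects are skipped because their API group differs.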

wSedlacek commented 7 months ago

I have been a bit busy with other projects but I am finally getting back to this. Could it be that this function is being applied incorrectly when the operator is used to install knative? https://github.com/knative/operator/blob/d6ee177ea432bead00717544aedde5ead10680a8/pkg/reconciler/knativeserving/knativeserving.go#L150-L154

It looks like the extra argument adds special handling for the ingress. Does it need a new entry to handle the kcert so that it gets put in the right namespace?

wSedlacek commented 4 months ago

This bug does not occur in 1.14.0 as expected, and the issue with net-certmanager is completely removed with 1.14.1 because it is now integrated into serving. All good now! Thank you!