knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0

Eventing webhook fails to start / stuck in crash loop #7885

Closed maylukas closed 2 months ago

maylukas commented 5 months ago

Describe the bug

A clean installation using the operator fails. The eventing-webhook is stuck in a crash loop.

Logs of the eventing webhook:

2024/05/02 10:43:54 Registering 5 informer factories
2024/05/02 10:43:54 Registering 7 informers
2024/05/02 10:43:54 Registering 7 controllers
{"level":"info","ts":"2024-05-02T10:43:55.083Z","logger":"eventing-webhook","caller":"profiling/server.go:65","msg":"Profiling enabled: false","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.112Z","logger":"eventing-webhook","caller":"leaderelection/context.go:47","msg":"Running with Standard leader election","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.139Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:194","msg":"Starting global resync of SinkBindings every 30m0s","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.176Z","logger":"eventing-webhook","caller":"sharedmain/main.go:283","msg":"Starting configuration manager...","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.264Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:89","msg":"feature config changed. name: config-features, value: map[authentication-oidc:Disabled cross-namespace-event-links:Disabled delivery-retryafter:Disabled delivery-timeout:Enabled eventtype-auto-create:Disabled kreference-group:Disabled kreference-mapping:Disabled new-trigger-filters:Enabled transport-encryption:Disabled]","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":1714646635.2782583,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
{"level":"warn","ts":"2024-05-02T10:43:55.778Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:55.778Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:39398: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/pkg@v0.0.0-20240416145024-0f34a8815650/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:55.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:55.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:39410: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/pkg@v0.0.0-20240416145024-0f34a8815650/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:56.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:56.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:47560: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/pkg@v0.0.0-20240416145024-0f34a8815650/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:57.840Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}

Knative Eventing Resource

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  source:
    rabbitmq:
      enabled: true
  version: 1.14.0

Expected behavior

Installation of Knative Eventing should be successful.

To Reproduce

1. Installation of cert-manager (1.14.5)
2. Installation of trust-manager (0.7.1)
3. Installation of istio (1.21.2)
4. Installation of Knative operator (1.14.0)
5. Installation of Knative Serving (1.14.0)
6. Installation of Knative Eventing (1.14.0)

Knative release version

1.14.0


maylukas commented 5 months ago

We're also seeing issues with the "routing-serving-certs" issuance:

Failed to wait for order resource "routing-serving-certs-1-422265175" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "kn-routing": Domain name needs at least one dot

Cali0707 commented 4 months ago

cc @pierDipi

pierDipi commented 4 months ago

I think these comments are relevant here: https://github.com/knative/pkg/issues/2560#issuecomment-1195840564 and https://github.com/knative/pkg/issues/2560#issuecomment-1195842825. In particular, these parts:

I'm curious what cert is the webhook presenting and see what's defined in your CA bundle of the configured webhook (ie. ValidatingWebhookConfiguration and MutatingWebhookConfiguration)

and

The typical misconfiguration we see is if the liveness probe timeout of the webhook deployment is too low - it never gets a chance to become the leader and create the certificate. This is because K8s kills the container.
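
If the liveness probe timeout turns out to be the culprit, one way to relax it with an operator-based install is a workload override on the KnativeEventing resource. This is only a sketch: it assumes the operator's `spec.workloads[].livenessProbes` override is available in the installed version and that both the deployment and its container are named `eventing-webhook`. The probe values are illustrative and not taken from this issue.

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  version: 1.14.0
  workloads:
    # Assumption: the deployment and its container are both named eventing-webhook.
    - name: eventing-webhook
      livenessProbes:
        - container: eventing-webhook
          # Illustrative values only; tune them for the cluster.
          initialDelaySeconds: 120
          timeoutSeconds: 10
          periodSeconds: 10
          failureThreshold: 6
```

After applying it, the restart count of the eventing-webhook pod should show whether the container now survives long enough to become leader and create its certificate.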

maylukas commented 2 months ago

We were able to solve this issue by increasing the memory limits and requests.
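
For anyone hitting the same problem, a resource override of this kind could look roughly like the following when installing through the operator. This is a sketch, not the exact configuration used here: it assumes the operator's `spec.workloads[].resources` override and a deployment and container both named `eventing-webhook`, and the memory and CPU figures are placeholders.

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  source:
    rabbitmq:
      enabled: true
  version: 1.14.0
  workloads:
    # Assumption: the deployment and its container are both named eventing-webhook.
    - name: eventing-webhook
      resources:
        - container: eventing-webhook
          # Placeholder values; the thread does not state the ones that worked.
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            memory: 512Mi
```

Whether memory is actually the limiting factor can be checked beforehand by looking for an OOMKilled termination reason in the pod's last container state.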