knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

net-istio-controller crashes with error "Failed to start informers" - "failed to wait for cache at index 8 to sync" #15441

Open maylukas opened 3 months ago

maylukas commented 3 months ago

In what area(s)?

/area networking

What version of Knative?

Knative Serving: 1.15.0 Istio: 1.22.3

Expected Behavior

net-istio-controller should start

Actual Behavior

net-istio-controller fails to start and emits the following console output (I have replaced the actual service names with service1, service2, and service3):

2024/08/02 07:33:20 Registering 3 clients
2024/08/02 07:33:20 Registering 5 informer factories
2024/08/02 07:33:20 Registering 8 informers
2024/08/02 07:33:20 Registering 2 controllers
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:20.993254057Z","caller":"logging/config.go:116","message":"Successfully created the logger."}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:20.993411819Z","caller":"logging/config.go:117","message":"Logging level set to: debug"}
{"severity":"INFO","timestamp":"2024-08-02T07:33:20.993728295Z","logger":"net-istio-controller","caller":"profiling/server.go:65","message":"Profiling enabled: false","commit":"a8bc624-dirty"}
{"severity":"INFO","timestamp":"2024-08-02T07:33:21.006166486Z","logger":"net-istio-controller","caller":"leaderelection/context.go:47","message":"Running with Standard leader election","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.018980398Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.019300231Z","logger":"net-istio-controller","caller":"metrics/prometheus_exporter.go:51","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}. Start the server for Prometheus exporter.","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.019353212Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.019659948Z","logger":"net-istio-controller.istio-ingress-controller","caller":"ingress/controller.go:153","message":"Creating event broadcaster","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.020328198Z","logger":"net-istio-controller","caller":"serverlessservice/controller.go:149","message":"Creating event broadcaster","commit":"a8bc624-dirty"}
{"severity":"INFO","timestamp":"2024-08-02T07:33:21.036907198Z","logger":"net-istio-controller","caller":"sharedmain/main.go:282","message":"Starting configuration manager...","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.127445355Z","logger":"net-istio-controller.istio-ingress-controller.config-store","caller":"configmap/store.go:155","message":"ingress config \"config-istio\" config was added or updated: &config.Istio{IngressGateways:[]config.Gateway{config.Gateway{Namespace:\"knative-serving\", Name:\"knative-ingress-gateway\", ServiceURL:\"istio-ingressgateway.istio-system.svc.cluster.local\", LabelSelector:(*v1.LabelSelector)(nil)}}, LocalGateways:[]config.Gateway{config.Gateway{Namespace:\"knative-serving\", Name:\"knative-local-gateway\", ServiceURL:\"knative-local-gateway.istio-system.svc.cluster.local\", LabelSelector:(*v1.LabelSelector)(nil)}}}","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.127627405Z","logger":"net-istio-controller.config-store","caller":"configmap/store.go:155","message":"ingress config \"config-istio\" config was added or updated: &config.Istio{IngressGateways:[]config.Gateway{config.Gateway{Namespace:\"knative-serving\", Name:\"knative-ingress-gateway\", ServiceURL:\"istio-ingressgateway.istio-system.svc.cluster.local\", LabelSelector:(*v1.LabelSelector)(nil)}}, LocalGateways:[]config.Gateway{config.Gateway{Namespace:\"knative-serving\", Name:\"knative-local-gateway\", ServiceURL:\"knative-local-gateway.istio-system.svc.cluster.local\", LabelSelector:(*v1.LabelSelector)(nil)}}}","commit":"a8bc624-dirty"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.128349559Z","logger":"net-istio-controller.istio-ingress-controller.config-store","caller":"configmap/store.go:155","message":"ingress config \"config-network\" config was added or updated: &config.Config{DefaultIngressClass:\"istio.ingress.networking.knative.dev\", DomainTemplate:\"{{.Name}}.{{.Domain}}\", TagTemplate:\"{{.Tag}}-{{.Name}}\", AutoTLS:true, ExternalDomainTLS:true, HTTPProtocol:\"redirected\", DefaultCertificateClass:\"cert-manager.certificate.networking.knative.dev\", NamespaceWildcardCertSelector:(*v1.LabelSelector)(0xc000501000), RolloutDurationSecs:0, AutocreateClusterDomainClaims:false, EnableMeshPodAddressability:false, MeshCompatibilityMode:\"auto\", DefaultExternalScheme:\"http\", InternalEncryption:false, SystemInternalTLS:\"disabled\", ClusterLocalDomainTLS:\"disabled\"}","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.130612751Z","logger":"net-istio-controller.config-store","caller":"configmap/store.go:155","message":"ingress config \"config-network\" config was added or updated: &config.Config{DefaultIngressClass:\"istio.ingress.networking.knative.dev\", DomainTemplate:\"{{.Name}}.{{.Domain}}\", TagTemplate:\"{{.Tag}}-{{.Name}}\", AutoTLS:true, ExternalDomainTLS:true, HTTPProtocol:\"redirected\", DefaultCertificateClass:\"cert-manager.certificate.networking.knative.dev\", NamespaceWildcardCertSelector:(*v1.LabelSelector)(0xc000501100), RolloutDurationSecs:0, AutocreateClusterDomainClaims:false, EnableMeshPodAddressability:false, MeshCompatibilityMode:\"auto\", DefaultExternalScheme:\"http\", InternalEncryption:false, SystemInternalTLS:\"disabled\", ClusterLocalDomainTLS:\"disabled\"}","commit":"a8bc624-dirty"}
{"level":"info","ts":1722584001.137617,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.219211362Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service1 (depth: 1)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service1"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.219268903Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service1 (depth: 1)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service1"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.21930356Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service2 (depth: 2)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service2"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.219466948Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service2 (depth: 1)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service2"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.219508296Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue push-notification-gateway/service3 (depth: 2)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"push-notification-gateway/service3"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.219623036Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue push-notification-gateway/service3 (depth: 4)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"push-notification-gateway/service3"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.414736275Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service1 (depth: 4)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service1"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.414854426Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service2 (depth: 5)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service2"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:21.41501659Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue push-notification-gateway/service3 (depth: 4)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"push-notification-gateway/service3"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:34.220726032Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service1 (depth: 4)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service1"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:34.221087337Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue functions/service2 (depth: 5)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"functions/service2"}
{"severity":"DEBUG","timestamp":"2024-08-02T07:33:34.221493671Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:418","message":"Adding to queue push-notification-gateway/service3 (depth: 4)","commit":"a8bc624-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/key":"push-notification-gateway/service3"}
{"level":"fatal","ts":1722584029.926126,"logger":"fallback","caller":"injection/injection.go:65","msg":"Failed to start informers","error":"failed to wait for cache at index 8 to sync","stacktrace":"knative.dev/pkg/injection.EnableInjectionOrDie.func1\n\tknative.dev/pkg@v0.0.0-20240716082220-4355f0c73608/injection/injection.go:65\nknative.dev/pkg/injection/sharedmain.MainWithConfig\n\tknative.dev/pkg@v0.0.0-20240716082220-4355f0c73608/injection/sharedmain/main.go:309\nknative.dev/pkg/injection/sharedmain.MainWithContext\n\tknative.dev/pkg@v0.0.0-20240716082220-4355f0c73608/injection/sharedmain/main.go:209\nmain.main\n\tknative.dev/net-istio/cmd/controller/main.go:37\nruntime.main\n\truntime/proc.go:271"}
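
The two "fallback" logger entries in the log above carry epoch timestamps, so the time between "Starting informers..." and the fatal error can be computed directly. The following is a minimal sketch, not part of Knative; the function name `seconds_until_fatal` is hypothetical, and the fatal entry is reproduced without its stack trace for brevity.

```python
import json

# Two entries from the "fallback" logger in the log above; the stack trace
# of the fatal entry is omitted here for brevity.
LOG_LINES = [
    '{"level":"info","ts":1722584001.137617,"logger":"fallback",'
    '"caller":"injection/injection.go:63","msg":"Starting informers..."}',
    '{"level":"fatal","ts":1722584029.926126,"logger":"fallback",'
    '"caller":"injection/injection.go:65","msg":"Failed to start informers",'
    '"error":"failed to wait for cache at index 8 to sync"}',
]


def seconds_until_fatal(lines):
    """Return seconds between 'Starting informers...' and the first fatal entry."""
    start = None
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text lines such as "Registering 3 clients"
        if not isinstance(entry, dict):
            continue  # skip lines that parse as a bare JSON value
        if entry.get("msg") == "Starting informers...":
            start = entry["ts"]
        elif entry.get("level") == "fatal" and start is not None:
            return entry["ts"] - start
    return None  # no start/fatal pair found


elapsed = seconds_until_fatal(LOG_LINES)
print(f"fatal error {elapsed:.1f}s after informers started")
```

For this log the gap is about 28.8 seconds, which suggests the controller waited on the cache sync before giving up rather than failing immediately. The log alone does not show why the cache at index 8 never synced.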

Steps to Reproduce the Problem

Can't really tell. The same Knative configuration, applied with Terraform, works on our dev cluster but not on our staging cluster.

maylukas commented 3 months ago

Knative Serving resource: we have tried uninstalling it and reinstalling it from scratch, and have increased the memory because we saw some OOMKilled errors.

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  creationTimestamp: '2024-08-01T15:49:04Z'
  finalizers:
    - knativeservings.operator.knative.dev
  generation: 1
  managedFields:
    - apiVersion: operator.knative.dev/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:config:
            f:domain:
              .: {}
              f:functions.api.staging.miku-app.com: {}
            f:istio:
              .: {}
              f:local-gateway.knative-serving.knative-local-gateway: {}
          f:deployments: {}
          f:version: {}
      manager: Terraform
      operation: Apply
      time: '2024-08-01T15:49:04Z'
    - apiVersion: operator.knative.dev/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"knativeservings.operator.knative.dev": {}
      manager: operator
      operation: Update
      time: '2024-08-01T15:49:04Z'
    - apiVersion: operator.knative.dev/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
          f:manifests: {}
          f:observedGeneration: {}
          f:version: {}
      manager: operator
      operation: Update
      subresource: status
      time: '2024-08-01T19:07:30Z'
  name: knative-serving
  namespace: knative-serving
  resourceVersion: '2570047568'
  uid: 98928fb8-f58c-4cf5-8912-17e4bbb77419
  selfLink: >-
    /apis/operator.knative.dev/v1beta1/namespaces/knative-serving/knativeservings/knative-serving
status:
  conditions:
    - lastTransitionTime: '2024-08-01T15:49:18Z'
      status: 'True'
      type: DependenciesInstalled
    - lastTransitionTime: '2024-08-01T15:49:34Z'
      message: 'Waiting on deployments: net-istio-controller'
      reason: NotReady
      status: 'False'
      type: DeploymentsAvailable
    - lastTransitionTime: '2024-08-01T19:07:29Z'
      status: 'True'
      type: InstallSucceeded
    - lastTransitionTime: '2024-08-01T19:07:29Z'
      message: 'Waiting on deployments: net-istio-controller'
      reason: NotReady
      status: 'False'
      type: Ready
    - lastTransitionTime: '2024-08-01T15:49:04Z'
      status: 'True'
      type: VersionMigrationEligible
  manifests:
    - /var/run/ko/knative-serving/1.15.0
    - /var/run/ko/ingress/1.15/istio
  observedGeneration: 1
  version: 1.15.0
spec:
  config:
    domain:
      functions.api.staging.miku-app.com: ''
    istio:
      local-gateway.knative-serving.knative-local-gateway: knative-local-gateway.istio-system.svc.cluster.local
  deployments:
    - name: net-istio-controller
      resources:
        - container: controller
          limits:
            memory: 1Gi
          requests:
            memory: 100Mi
  version: 1.15.0

maylukas commented 3 months ago

I have now even tried to remove Knative entirely, deleting everything related to it by removing all resources with knative in the type. The error still persists.

kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found --all-namespaces | grep knative

ReToCode commented 2 months ago

How many secrets do you have in your cluster, and how large are they? Maybe try https://github.com/knative-extensions/net-istio/pull/920 (check the release notes in that PR).
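
One way to answer the count-and-size question is to summarize the output of `kubectl get secrets --all-namespaces -o json`. The following is a minimal sketch under that assumption; the script name `secret_sizes.py` and the function `summarize_secrets` are hypothetical and not part of Knative or net-istio.

```python
import base64
import json
import sys


def summarize_secrets(secret_list):
    """Return (count, total_bytes, five_largest) for a kubectl Secret list."""
    sizes = []
    for item in secret_list.get("items", []):
        meta = item.get("metadata", {})
        name = f'{meta.get("namespace", "")}/{meta.get("name", "")}'
        # Secret values are base64-encoded in the API; measure decoded bytes.
        data = item.get("data") or {}
        size = sum(len(base64.b64decode(value)) for value in data.values())
        sizes.append((size, name))
    sizes.sort(reverse=True)  # largest secrets first
    total = sum(size for size, _ in sizes)
    return len(sizes), total, sizes[:5]


# Runs only when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        count, total, largest = summarize_secrets(json.load(handle))
    print(f"{count} secrets, {total} bytes of decoded data")
    for size, name in largest:
        print(f"{size:>10}  {name}")
```

Usage, assuming the dump was saved first: `kubectl get secrets --all-namespaces -o json > secrets.json`, then `python secret_sizes.py secrets.json`. The sizes cover only the decoded `data` values, not metadata or annotations, so treat them as a lower bound on what each object occupies.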

dprotaso commented 2 months ago

How are you installing Istio?

skonto commented 1 day ago

@maylukas gentle ping.