kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

EKS vanilla install of 2.8.1 results in api-server crashloop #3643

Closed: dejongm closed this issue 2 years ago

dejongm commented 2 years ago

Report

Installing the KEDA Helm chart version 2.8.1 with default values on EKS 1.22 puts the KEDA metrics API server (keda-operator-metrics-apiserver) into a crash loop. Installing Helm chart version 2.7.2 on the same cluster deploys successfully.

Output from kubectl get apiservice:

...
v1beta1.external.metrics.k8s.io        keda/keda-operator-metrics-apiserver   False (MissingEndpoints)   9m
...
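
On a cluster with many APIService entries, the unavailable ones can be filtered out of that listing. A minimal sketch, assuming the default column order of kubectl get apiservice (NAME, SERVICE, AVAILABLE, AGE); the function name unavailable_apiservices is made up for illustration:

```shell
# Print only APIService rows whose AVAILABLE column (third field) is not "True".
# Assumes the default column layout of `kubectl get apiservice`; skips the header row.
unavailable_apiservices() {
  awk 'NR > 1 && $3 != "True"'
}

# Usage against a live cluster (not executed here):
#   kubectl get apiservice | unavailable_apiservices
```

For the output above, this would keep the v1beta1.external.metrics.k8s.io row, because its AVAILABLE column starts with False.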

The metrics-apiserver readiness probe fails to connect:

Readiness probe failed: Get "https://10.168.1.93:6443/readyz": dial tcp 10.168.1.93:6443: connect: connection refused

CloudWatch reports a 503 error on /apis/external.metrics.k8s.io/v1beta1:

requestReceivedTimestamp | 2022-09-02T20:33:24.515050Z
requestURI               | /apis/external.metrics.k8s.io/v1beta1?timeout=32s
responseStatus.code      | 503

metrics-apiserver logs:

I0902 18:13:23.852960       1 request.go:601] Waited for 1.047623516s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/dynatrace.com/v1beta1?timeout=32s
1.6621424049579277e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": ":8080"}
1.662142404959977e+09   INFO    setup   Running on Kubernetes 1.22+ {"version": "v1.22.10-eks-84b4fe6"}
1.662142404960175e+09   INFO    setup   Starting manager
1.6621424049601984e+09  INFO    setup   KEDA Version: 2.8.0
1.6621424049602091e+09  INFO    setup   Git Commit: a4a118201214e7abdeebad72cbe337b9856f8191
1.6621424049602182e+09  INFO    setup   Go Version: go1.17.13
1.6621424049602218e+09  INFO    setup   Go OS/Arch: linux/amd64
1.6621424049605281e+09  INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.662142404960624e+09   INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6621424049608536e+09  INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
1.6621424049608936e+09  INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2beta2.HorizontalPodAutoscaler"}
1.6621424049609005e+09  INFO    Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.662142404961154e+09   INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
1.6621424049611874e+09  INFO    Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6621424049616587e+09  INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
1.6621424049616888e+09  INFO    Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6621424049621346e+09  INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
1.6621424049621968e+09  INFO    Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.662142405062094e+09   INFO    Starting workers    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
1.6621424050620935e+09  INFO    Starting workers    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
1.6621424050622425e+09  INFO    Starting workers    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
1.6621424050624409e+09  INFO    Starting workers    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
1.6621424313220365e+09  INFO    Stopping and waiting for non leader election runnables
1.662142431322082e+09   INFO    Stopping and waiting for leader election runnables
1.6621424313220987e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6621424313221316e+09  INFO    All workers finished    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6621424313221533e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6621424313221583e+09  INFO    All workers finished    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6621424313221366e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6621424313221812e+09  INFO    All workers finished    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.662142431322168e+09   INFO    Shutdown signal received, waiting for all workers to finish {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6621424313221943e+09  INFO    All workers finished    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.662142431322211e+09   INFO    Stopping and waiting for caches
1.662142431322496e+09   INFO    Stopping and waiting for webhooks
1.6621424313225217e+09  INFO    Wait completed, proceeding to shutdown the manager
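
The zap timestamps in this log are Unix epoch seconds, so the time between startup and shutdown can be computed directly. A small sketch using two values copied from the log above (the elapsed_seconds helper is hypothetical):

```shell
# Seconds elapsed between two epoch timestamps in zap's float format.
elapsed_seconds() {
  LC_ALL=C awk -v t0="$1" -v t1="$2" 'BEGIN { printf "%.1f\n", t1 - t0 }'
}

# "Metrics server is starting to listen" vs.
# "Stopping and waiting for non leader election runnables".
elapsed_seconds 1.6621424049579277e+09 1.6621424313220365e+09   # prints 26.4
```

So the adapter ran for roughly 26 seconds before it began shutting down. This only measures the gap; it does not by itself explain what sent the shutdown signal.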

Expected Behavior

A successful deployment of Keda version 2.8.1

Actual Behavior

The keda-operator-metrics-apiserver pod ends up in a crash loop

Steps to Reproduce the Problem

  1. Deploy Keda 2.8.1 Helm Chart to AWS EKS

Logs from KEDA operator

I0902 18:11:21.439210       1 request.go:601] Waited for 1.047540005s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/secrets-store.csi.x-k8s.io/v1alpha1?timeout=32s
2022-09-02T18:11:22Z    INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": ":8080"}
2022-09-02T18:11:22Z    INFO    setup   Running on Kubernetes 1.22+ {"version": "v1.22.10-eks-84b4fe6"}
2022-09-02T18:11:22Z    INFO    setup   Starting manager
2022-09-02T18:11:22Z    INFO    setup   KEDA Version: 2.8.0
2022-09-02T18:11:22Z    INFO    setup   Git Commit: a4a118201214e7abdeebad72cbe337b9856f8191
2022-09-02T18:11:22Z    INFO    setup   Go Version: go1.17.13
2022-09-02T18:11:22Z    INFO    setup   Go OS/Arch: linux/amd64
2022-09-02T18:11:22Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2022-09-02T18:11:22Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0902 18:11:22.546375       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
I0902 18:11:22.556988       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
2022-09-02T18:11:22Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2022-09-02T18:11:22Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2beta2.HorizontalPodAutoscaler"}
2022-09-02T18:11:22Z    INFO    Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2022-09-02T18:11:22Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2022-09-02T18:11:22Z    INFO    Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2022-09-02T18:11:22Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2022-09-02T18:11:22Z    INFO    Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2022-09-02T18:11:22Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2022-09-02T18:11:22Z    INFO    Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2022-09-02T18:11:22Z    INFO    Starting workers    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2022-09-02T18:11:22Z    INFO    Starting workers    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2022-09-02T18:11:22Z    INFO    Starting workers    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2022-09-02T18:11:22Z    INFO    Starting workers    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}

KEDA Version

2.8.1 (Helm chart version; the deployed images and logs report KEDA 2.8.0)

Kubernetes Version

1.22

Platform

Amazon Web Services

Scaler Details

No response

Anything else?

No response

tomkerkhove commented 2 years ago

Can you show a describe of the deployment please?

dejongm commented 2 years ago

# kubectl describe deployments -n keda

Name:                   keda-operator
Namespace:              keda
CreationTimestamp:      Fri, 02 Sep 2022 16:20:00 -0400
Labels:                 app=keda-operator
                        app.kubernetes.io/component=operator
                        app.kubernetes.io/instance=keda
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=keda-operator
                        app.kubernetes.io/part-of=keda-operator
                        app.kubernetes.io/version=2.8.0
                        helm.sh/chart=keda-2.8.1
                        name=keda-operator
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: keda
                        meta.helm.sh/release-namespace: keda
Selector:               app=keda-operator
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=keda-operator
                    app.kubernetes.io/component=operator
                    app.kubernetes.io/instance=keda
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=keda-operator
                    app.kubernetes.io/part-of=keda-operator
                    app.kubernetes.io/version=2.8.0
                    helm.sh/chart=keda-2.8.1
                    name=keda-operator
  Service Account:  keda-operator
  Containers:
   keda-operator:
    Image:      artifactory.bobsburgers.org/docker/vendor/kedacore/keda:2.8.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /keda
    Args:
      --leader-elect
      --zap-log-level=info
      --zap-encoder=console
      --zap-time-encoding=rfc3339
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Liveness:   http-get http://:8081/healthz delay=25s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:            
      POD_NAME:                    (v1:metadata.name)
      OPERATOR_NAME:              keda-operator
      KEDA_HTTP_DEFAULT_TIMEOUT:  3000
    Mounts:                       <none>
  Volumes:                        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   keda-operator-5db697b468 (1/1 replicas created)
Events:          <none>

Name:                   keda-operator-metrics-apiserver
Namespace:              keda
CreationTimestamp:      Fri, 02 Sep 2022 16:20:00 -0400
Labels:                 app=keda-operator-metrics-apiserver
                        app.kubernetes.io/component=operator
                        app.kubernetes.io/instance=keda
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=keda-operator-metrics-apiserver
                        app.kubernetes.io/part-of=keda-operator
                        app.kubernetes.io/version=2.8.0
                        helm.sh/chart=keda-2.8.1
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: keda
                        meta.helm.sh/release-namespace: keda
Selector:               app=keda-operator-metrics-apiserver
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=keda-operator-metrics-apiserver
                    app.kubernetes.io/component=operator
                    app.kubernetes.io/instance=keda
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=keda-operator-metrics-apiserver
                    app.kubernetes.io/part-of=keda-operator
                    app.kubernetes.io/version=2.8.0
                    helm.sh/chart=keda-2.8.1
  Service Account:  keda-operator
  Containers:
   keda-operator-metrics-apiserver:
    Image:       artifactory.bobsburgers.org/docker/vendor/kedacore/keda-metrics-apiserver:2.8.0
    Ports:       6443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      /usr/local/bin/keda-adapter
      --secure-port=6443
      --logtostderr=true
      --v=0
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Liveness:   http-get https://:6443/healthz delay=5s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:6443/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:            
      KEDA_HTTP_DEFAULT_TIMEOUT:  3000
    Mounts:                       <none>
  Volumes:                        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   keda-operator-metrics-apiserver-6d5c7c69b7 (1/1 replicas created)
Events:          <none>
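
The probe settings above give a rough idea of how quickly the kubelet would restart a container whose liveness endpoint never answers. This is a back-of-the-envelope sketch, not something the thread confirms as the cause; it ignores the 1s timeout and any kubelet scheduling delay, and the helper name is made up:

```shell
# Earliest point (in seconds after container start) at which a continuously
# failing liveness probe would trigger a restart:
# initial delay plus (failure threshold - 1) further periods.
earliest_liveness_restart() {
  delay=$1 period=$2 failures=$3
  echo $(( delay + (failures - 1) * period ))
}

# Values from the metrics-apiserver liveness probe: delay=5s period=10s #failure=3
earliest_liveness_restart 5 10 3   # prints 25
```

The metrics-apiserver logs in this thread show shutdown starting roughly 26 to 30 seconds after startup, which is at least compatible with this window, though that alone does not prove the liveness probe is responsible.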

dejongm commented 2 years ago

Hello, just following up to see if you have any thoughts on this issue. Let me know if I can provide anything else.

JorTurFer commented 2 years ago

Hey, sorry for the late response :( What events are raised by k8s? OOMKill? I can't see anything in the log apart from "Shutdown signal received". From the KEDA logs, it looks like the process is being killed externally.
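
One way to answer that question is to read the last termination state of the container and the recent namespace events. A minimal sketch using standard kubectl calls; the function names are hypothetical, and the pod name has to be filled in from the cluster:

```shell
# Show why the previous container instance terminated (e.g. OOMKilled, Error)
# together with its exit code. Arguments: namespace, pod name.
last_termination() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# List events in a namespace sorted by their last timestamp. Argument: namespace.
recent_events() {
  kubectl get events -n "$1" --sort-by=.lastTimestamp
}

# Usage (requires cluster access, so not executed here):
#   last_termination keda <metrics-apiserver-pod-name>
#   recent_events keda
```

A reason of OOMKilled would point at the memory limit, while probe failures would normally show up as events on the pod.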

dejongm commented 2 years ago

# ./kubectl --cluster arn:aws:eks:us-east-1:XXXXXXX:cluster/dev logs keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l -n keda -c keda-operator-metrics-apiserver --previous

I0914 19:09:49.983539       1 request.go:601] Waited for 1.046679236s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/admissionregistration.k8s.io/v1?timeout=32s
1.6631825910878158e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": ":8080"}
1.6631825910912876e+09  INFO    setup   Running on Kubernetes 1.22+ {"version": "v1.22.11-eks-18ef993"}
1.6631825910917504e+09  INFO    setup   Starting manager
1.663182591091778e+09   INFO    setup   KEDA Version: 2.8.0
1.663182591091788e+09   INFO    setup   Git Commit: a4a118201214e7abdeebad72cbe337b9856f8191
1.6631825910917976e+09  INFO    setup   Go Version: go1.17.13
1.663182591091802e+09   INFO    setup   Go OS/Arch: linux/amd64
1.663182591093803e+09   INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6631825910938833e+09  INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6631825910939968e+09  INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
1.6631825910940678e+09  INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2beta2.HorizontalPodAutoscaler"}
1.6631825910940726e+09  INFO    Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.663182591094488e+09   INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
1.6631825910945182e+09  INFO    Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631825910951145e+09  INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
1.6631825910951412e+09  INFO    Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631825910962663e+09  INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
1.6631825910962954e+09  INFO    Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631825911957881e+09  INFO    Starting workers    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
1.6631825911959527e+09  INFO    Starting workers    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
1.663182591196019e+09   INFO    Starting workers    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
1.663182591197404e+09   INFO    Starting workers    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
1.6631826215327284e+09  INFO    Stopping and waiting for non leader election runnables
1.6631826215328937e+09  INFO    Stopping and waiting for leader election runnables
1.6631826215329149e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631826215330055e+09  INFO    All workers finished    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631826215329788e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6631826215330217e+09  INFO    All workers finished    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6631826215329854e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631826215330315e+09  INFO    All workers finished    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631826215329683e+09  INFO    Shutdown signal received, waiting for all workers to finish {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631826215330403e+09  INFO    All workers finished    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631826215330493e+09  INFO    Stopping and waiting for caches
1.6631826215331967e+09  INFO    Stopping and waiting for webhooks
1.6631826215332248e+09  INFO    Wait completed, proceeding to shutdown the manager

# journalctl -xeu kubelet
Sep 14 15:14:35 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:14:35.311491    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:14:47 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:14:47.310497    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:14:47 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:14:47.311217    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:00 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:00.310961    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:00 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:00.311915    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:15 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:15.311240    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:15 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:15.312067    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:28 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:28.311324    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:28 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:28.312518    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:41 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:41.311338    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:41 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:41.311929    4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:54 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:54.313054    4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"

JorTurFer commented 2 years ago

This is weird: from the KEDA logs the error looks external, but from the external logs it looks like it is in KEDA. Could you enable debug logs and share them, please? You can do that by changing the argument '--v=0' to '--v=3'.
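
The argument change can also be applied without editing the manifest by hand. A sketch using a JSON patch, assuming the args order shown in the kubectl describe output earlier in this thread (where '--v=0' is the fourth argument, index 3); the function name is hypothetical, and a later Helm upgrade may overwrite the change:

```shell
# Raise the metrics-apiserver verbosity by replacing the container argument at index 3.
# The "test" op makes the patch fail instead of overwriting a different argument
# if "--v=0" is not at that position. Arguments: namespace, verbosity level.
set_adapter_verbosity() {
  kubectl patch deployment keda-operator-metrics-apiserver -n "$1" --type=json \
    -p "[{\"op\":\"test\",\"path\":\"/spec/template/spec/containers/0/args/3\",\"value\":\"--v=0\"},{\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/args/3\",\"value\":\"--v=$2\"}]"
}

# Usage (requires cluster access, so not executed here):
#   set_adapter_verbosity keda 3
```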

BTW, there is a bug with the Prometheus metrics generated by KEDA in v2.8.0 (Helm chart v2.8.1). I'd recommend upgrading to KEDA v2.8.1 (Helm chart v2.8.2).

dejongm commented 2 years ago

Thanks for the feedback! It seems upgrading to Helm chart v2.8.2 resolved the issue.
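
For anyone hitting the same problem, the upgrade could look roughly like this. It is a sketch, not the exact command used in this thread: the release name and namespace "keda" come from the Helm annotations in the describe output, while the "kedacore" repo alias is an assumption (the reporter pulls images through a private Artifactory mirror), and the function name is made up:

```shell
# Upgrade an existing KEDA release to a specific chart version.
# Arguments: release name, namespace, chart version.
# "kedacore/keda" assumes the chart repo was added under the alias "kedacore".
upgrade_keda_chart() {
  helm repo update &&
    helm upgrade "$1" kedacore/keda --namespace "$2" --version "$3"
}

# Usage (requires cluster access, so not executed here):
#   upgrade_keda_chart keda keda 2.8.2
```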