Can you share a describe of the deployment, please?
# kubectl describe deployments -n keda
Name:                   keda-operator
Namespace:              keda
CreationTimestamp:      Fri, 02 Sep 2022 16:20:00 -0400
Labels:                 app=keda-operator
                        app.kubernetes.io/component=operator
                        app.kubernetes.io/instance=keda
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=keda-operator
                        app.kubernetes.io/part-of=keda-operator
                        app.kubernetes.io/version=2.8.0
                        helm.sh/chart=keda-2.8.1
                        name=keda-operator
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: keda
                        meta.helm.sh/release-namespace: keda
Selector:               app=keda-operator
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=keda-operator
                    app.kubernetes.io/component=operator
                    app.kubernetes.io/instance=keda
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=keda-operator
                    app.kubernetes.io/part-of=keda-operator
                    app.kubernetes.io/version=2.8.0
                    helm.sh/chart=keda-2.8.1
                    name=keda-operator
  Service Account:  keda-operator
  Containers:
   keda-operator:
    Image:      artifactory.bobsburgers.org/docker/vendor/kedacore/keda:2.8.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /keda
    Args:
      --leader-elect
      --zap-log-level=info
      --zap-encoder=console
      --zap-time-encoding=rfc3339
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Liveness:   http-get http://:8081/healthz delay=25s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:
      POD_NAME:                   (v1:metadata.name)
      OPERATOR_NAME:              keda-operator
      KEDA_HTTP_DEFAULT_TIMEOUT:  3000
    Mounts:   <none>
  Volumes:    <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   keda-operator-5db697b468 (1/1 replicas created)
Events:          <none>

Name:                   keda-operator-metrics-apiserver
Namespace:              keda
CreationTimestamp:      Fri, 02 Sep 2022 16:20:00 -0400
Labels:                 app=keda-operator-metrics-apiserver
                        app.kubernetes.io/component=operator
                        app.kubernetes.io/instance=keda
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=keda-operator-metrics-apiserver
                        app.kubernetes.io/part-of=keda-operator
                        app.kubernetes.io/version=2.8.0
                        helm.sh/chart=keda-2.8.1
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: keda
                        meta.helm.sh/release-namespace: keda
Selector:               app=keda-operator-metrics-apiserver
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=keda-operator-metrics-apiserver
                    app.kubernetes.io/component=operator
                    app.kubernetes.io/instance=keda
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=keda-operator-metrics-apiserver
                    app.kubernetes.io/part-of=keda-operator
                    app.kubernetes.io/version=2.8.0
                    helm.sh/chart=keda-2.8.1
  Service Account:  keda-operator
  Containers:
   keda-operator-metrics-apiserver:
    Image:       artifactory.bobsburgers.org/docker/vendor/kedacore/keda-metrics-apiserver:2.8.0
    Ports:       6443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      /usr/local/bin/keda-adapter
      --secure-port=6443
      --logtostderr=true
      --v=0
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Liveness:   http-get https://:6443/healthz delay=5s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:6443/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:
      KEDA_HTTP_DEFAULT_TIMEOUT:  3000
    Mounts:   <none>
  Volumes:    <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   keda-operator-metrics-apiserver-6d5c7c69b7 (1/1 replicas created)
Events:          <none>
Hello, just following up to see if you may have any thoughts on this issue. Let me know if I can provide anything else.
Hey,
Sorry for the late response :(
What events are raised by k8s? OOMKill? I can't see anything in the log apart from Shutdown signal received. From the KEDA logs, it looks like an external kill.
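If it helps to check, the namespace events and the kubelet's recorded termination reason can be pulled with something like the following (pod name taken from this thread; adjust to whatever kubectl get pods -n keda shows):
# kubectl get events -n keda --sort-by=.lastTimestamp
# kubectl -n keda get pod keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
An OOMKilled reason there would point at the memory limit rather than anything inside KEDA.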
# ./kubectl --cluster arn:aws:eks:us-east-1:XXXXXXX:cluster/dev logs keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l -n keda -c keda-operator-metrics-apiserver --previous
I0914 19:09:49.983539 1 request.go:601] Waited for 1.046679236s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/admissionregistration.k8s.io/v1?timeout=32s
1.6631825910878158e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.6631825910912876e+09 INFO setup Running on Kubernetes 1.22+ {"version": "v1.22.11-eks-18ef993"}
1.6631825910917504e+09 INFO setup Starting manager
1.663182591091778e+09 INFO setup KEDA Version: 2.8.0
1.663182591091788e+09 INFO setup Git Commit: a4a118201214e7abdeebad72cbe337b9856f8191
1.6631825910917976e+09 INFO setup Go Version: go1.17.13
1.663182591091802e+09 INFO setup Go OS/Arch: linux/amd64
1.663182591093803e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6631825910938833e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6631825910939968e+09 INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
1.6631825910940678e+09 INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2beta2.HorizontalPodAutoscaler"}
1.6631825910940726e+09 INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.663182591094488e+09 INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
1.6631825910945182e+09 INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631825910951145e+09 INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
1.6631825910951412e+09 INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631825910962663e+09 INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
1.6631825910962954e+09 INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631825911957881e+09 INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
1.6631825911959527e+09 INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
1.663182591196019e+09 INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
1.663182591197404e+09 INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
1.6631826215327284e+09 INFO Stopping and waiting for non leader election runnables
1.6631826215328937e+09 INFO Stopping and waiting for leader election runnables
1.6631826215329149e+09 INFO Shutdown signal received, waiting for all workers to finish {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631826215330055e+09 INFO All workers finished {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
1.6631826215329788e+09 INFO Shutdown signal received, waiting for all workers to finish {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6631826215330217e+09 INFO All workers finished {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
1.6631826215329854e+09 INFO Shutdown signal received, waiting for all workers to finish {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631826215330315e+09 INFO All workers finished {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
1.6631826215329683e+09 INFO Shutdown signal received, waiting for all workers to finish {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631826215330403e+09 INFO All workers finished {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
1.6631826215330493e+09 INFO Stopping and waiting for caches
1.6631826215331967e+09 INFO Stopping and waiting for webhooks
1.6631826215332248e+09 INFO Wait completed, proceeding to shutdown the manager
# journalctl -xeu kubelet
Sep 14 15:14:35 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:14:35.311491 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:14:47 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:14:47.310497 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:14:47 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:14:47.311217 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:00 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:00.310961 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:00 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:00.311915 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:15 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:15.311240 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:15 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:15.312067 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:28 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:28.311324 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:28 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:28.312518 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:41 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:41.311338 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
Sep 14 15:15:41 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: E0914 15:15:41.311929 4554 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"keda-operator-metrics-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=keda-operator-metrics-apiserver pod=keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l_keda(f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd)\"" pod="keda/keda-operator-metrics-apiserver-6d5c7c69b7-7zj2l" podUID=f7d4eb85-34fd-43f9-92d0-4b4ef2a6e5cd
Sep 14 15:15:54 ip-10-168-0-30.tst-us-east-1.bobsburgers.aws kubelet[4554]: I0914 15:15:54.313054 4554 scope.go:110] "RemoveContainer" containerID="efefac1b2be9a02e3b7049d92e80f0bee6420d931666e6561ccebd782ec1b6b4"
This is weird because from the KEDA logs I think the error is outside KEDA, but from the outside logs I think it is in KEDA. Could you enable debug logs and share them, please? You can do it by changing the argument - '--v=0' to - '--v=3'.
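For reference, one way to flip that flag without re-templating the chart is a JSON patch against the deployment; the args/3 index assumes the argument order shown in the describe output above:
# kubectl -n keda patch deployment keda-operator-metrics-apiserver --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/args/3","value":"--v=3"}]'
The deployment will then roll a new pod with the higher verbosity.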
BTW, there is a bug with the Prometheus metrics generated by KEDA in v2.8.0 (Helm chart v2.8.1), so I'd recommend upgrading to KEDA v2.8.1 (Helm chart v2.8.2).
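Assuming the release came from the standard kedacore chart repo, the upgrade would look roughly like this (release name and namespace taken from the Helm annotations in the describe output):
# helm repo update
# helm upgrade keda kedacore/keda --namespace keda --version 2.8.2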
Thanks for the feedback! It seems upgrading to Helm chart v2.8.2 resolved the issue.
Report
Launching a vanilla install of Keda version 2.8.1 Helm Chart on EKS version 1.22 results in a crashloop of the Keda api-server. Launching version 2.7.2 of the Keda Helm Chart on the same cluster results in a successful deployment.
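For reference, a vanilla install of that chart version looks roughly like the following (assuming the upstream kedacore chart repo; the exact commands and values used here, e.g. the mirrored image registry, are not shown in this report):
# helm repo add kedacore https://kedacore.github.io/charts
# helm repo update
# helm install keda kedacore/keda --namespace keda --create-namespace --version 2.8.1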
Output from kubectl get apiservice:
The metrics-apiserver readiness probe fails to connect.
Readiness probe failed: Get "https://10.168.1.93:6443/readyz": dial tcp 10.168.1.93:6443: connect: connection refused
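A quick way to confirm the aggregated API is unreachable from the CLI (assumed commands, not part of the original capture):
# kubectl get apiservice v1beta1.external.metrics.k8s.io
# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
While the pod is crashlooping, the apiservice stays Available=False and the raw request fails.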
Cloudwatch reports 503 error on /apis/external.metrics.k8s.io/v1beta1:
metrics-apiserver logs:
Expected Behavior
A successful deployment of Keda version 2.8.1
Actual Behavior
The api-server ends up in a crashloop state.
Steps to Reproduce the Problem
Logs from KEDA operator
KEDA Version
2.8.1
Kubernetes Version
1.22
Platform
Amazon Web Services
Scaler Details
No response
Anything else?
No response