kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0
KEDA Operator Crashing #2052

Closed by bpotaczek 3 years ago

bpotaczek commented 3 years ago

Report

The keda operator crashes once I apply the file to create my ScaledObject and doesn't create the HPA.

Expected Behavior

I expected the HPA to be created.

Actual Behavior

The operator crashes before the HPA is created.

Steps to Reproduce the Problem

  1. Install KEDA v2.4.0 with Helm
  2. Apply the YAML file containing the Secret, TriggerAuthentication, and ScaledObject
  3. Watch the operator logs
  4. Once the operator logs "Creating a new HPA", it crashes and restarts

Logs from KEDA operator

I0823 19:01:57.196307       1 request.go:655] Throttling request took 1.047261865s, request: GET:https://10.100.0.1:443/apis/scheduling.k8s.io/v1beta1?timeout=32s
2021-08-23T19:02:00.301Z        INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
2021-08-23T19:02:00.303Z        INFO    controllers.ScaledObject        Running on Kubernetes 1.20+     {"version": "v1.20.7-eks-d88609"}
2021-08-23T19:02:00.303Z        INFO    setup   Starting manager
2021-08-23T19:02:00.303Z        INFO    setup   KEDA Version: 2.4.0
2021-08-23T19:02:00.303Z        INFO    setup   Git Commit:
2021-08-23T19:02:00.303Z        INFO    setup   Go Version: go1.15.13
2021-08-23T19:02:00.303Z        INFO    setup   Go OS/Arch: linux/amd64
I0823 19:02:00.303753       1 leaderelection.go:243] attempting to acquire leader lease keda/operator.keda.sh...
2021-08-23T19:02:00.303Z        INFO    controller-runtime.manager      starting metrics server {"path": "/metrics"}
I0823 19:02:17.717791       1 leaderelection.go:253] successfully acquired lease keda/operator.keda.sh
2021-08-23T19:02:17.717Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"ConfigMap","namespace":"keda","name":"operator.keda.sh","uid":"1798d9da-8a44-4108-9233-02ad5ed62121","apiVersion":"v1","resourceVersion":"120415142"}, "reason": "LeaderElection", "message": "keda-operator-846b56df59-s257h_5f2d7eb6-f0a3-4367-81a8-fd6567c3a162 became leader"}
2021-08-23T19:02:17.718Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication", "source": "kind source: /, Kind="}
2021-08-23T19:02:17.718Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "source": "kind source: /, Kind="}
2021-08-23T19:02:17.718Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "source": "kind source: /, Kind="}
2021-08-23T19:02:17.718Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-08-23T19:02:17.818Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication"}
2021-08-23T19:02:17.818Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-08-23T19:02:17.818Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication"}
2021-08-23T19:02:17.938Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject"}
2021-08-23T19:02:18.038Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "worker count": 1}
2021-08-23T19:02:18.118Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication", "worker count": 1}
2021-08-23T19:02:18.118Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob"}
2021-08-23T19:02:18.118Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "worker count": 1}
2021-08-23T19:02:18.118Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "worker count": 1}
2021-08-23T19:02:56.962Z        DEBUG   controller      Successfully Reconciled {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "name": "keda-trigger-auth-kafka-credential", "namespace": "echo"}
2021-08-23T19:02:56.962Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"TriggerAuthentication","namespace":"echo","name":"keda-trigger-auth-kafka-credential","uid":"32581dc7-4757-4aa5-b752-52da6ebfb170","apiVersion":"keda.sh/v1alpha1","resourceVersion":"120415389"}, "reason": "TriggerAuthenticationAdded", "message": "New TriggerAuthentication configured"}
2021-08-23T19:02:57.174Z        INFO    controllers.ScaledObject        Reconciling ScaledObject        {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample"}
2021-08-23T19:02:57.174Z        INFO    controllers.ScaledObject        Adding Finalizer for the ScaledObject   {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample"}
2021-08-23T19:02:57.194Z        DEBUG   controllers.ScaledObject        Adding "scaledobject.keda.sh/name" label on ScaledObject        {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "value": "kafka-kbxtechogrpc-sample"}
2021-08-23T19:02:57.204Z        DEBUG   controllers.ScaledObject        Parsed Group, Version, Kind, Resource   {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2021-08-23T19:02:57.223Z        INFO    controllers.ScaledObject        Detected resource targeted for scaling  {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "resource": "apps/v1.Deployment", "name": "kbxtechogrpc"}
2021-08-23T19:02:57.223Z        INFO    controllers.ScaledObject        Creating a new HPA      {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "HPA.Namespace": "echo", "HPA.Name": "keda-hpa-kafka-kbxtechogrpc-sample"}

KEDA Version

2.4.0

Kubernetes Version

1.20

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

Here is the yaml I was using to test with. We only have TLS on the server so I only include the CA data.

apiVersion: v1
kind: Secret
metadata:
  name: keda-kafka-secrets
  namespace: echo
data:
  ca: "LS0t..."
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-kafka-credential
  namespace: echo
spec:
  secretTargetRef:
  - parameter: ca
    name: keda-kafka-secrets
    key: ca
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-kbxtechogrpc-sample
  namespace: echo
spec:
  scaleTargetRef:
    name: kbxtechogrpc
    kind: Deployment
  pollingInterval: 30
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: "..."
      consumerGroup: connect-dl-sink-integrations-edi990request-raw-s3
      topic: integrations.edirouting.cdc.tenderack.v1
      lagThreshold: "5"
      offsetResetPolicy: latest
    authenticationRef:
      name: keda-trigger-auth-kafka-credential

zroubalik commented 3 years ago

Thanks for reporting. By chance, are you able to post the stacktrace here?

bpotaczek commented 3 years ago

I didn't see a stacktrace. The logs just end right there when the pod restarts. If you have steps to get one though I'd be happy to.

zroubalik commented 3 years ago

Does kubectl logs <POD_NAME> --previous work for you?

bpotaczek commented 3 years ago

I got the same thing (in a slightly different order)

I0823 19:11:17.192874       1 request.go:655] Throttling request took 1.03937764s, request: GET:https://10.100.0.1:443/apis/ratelimit.solo.io/v1alpha1?timeout=32s
2021-08-23T19:11:20.346Z        INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
2021-08-23T19:11:20.348Z        INFO    controllers.ScaledObject        Running on Kubernetes 1.20+     {"version": "v1.20.7-eks-d88609"}
2021-08-23T19:11:20.349Z        INFO    setup   Starting manager
2021-08-23T19:11:20.349Z        INFO    setup   KEDA Version: 2.4.0
2021-08-23T19:11:20.349Z        INFO    setup   Git Commit:
2021-08-23T19:11:20.349Z        INFO    setup   Go Version: go1.15.13
2021-08-23T19:11:20.349Z        INFO    setup   Go OS/Arch: linux/amd64
I0823 19:11:20.349267       1 leaderelection.go:243] attempting to acquire leader lease keda/operator.keda.sh...
2021-08-23T19:11:20.349Z        INFO    controller-runtime.manager      starting metrics server {"path": "/metrics"}
I0823 19:11:37.765990       1 leaderelection.go:253] successfully acquired lease keda/operator.keda.sh
2021-08-23T19:11:37.766Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication", "source": "kind source: /, Kind="}
2021-08-23T19:11:37.766Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"ConfigMap","namespace":"keda","name":"operator.keda.sh","uid":"1798d9da-8a44-4108-9233-02ad5ed62121","apiVersion":"v1","resourceVersion":"120418249"}, "reason": "LeaderElection", "message": "keda-operator-846b56df59-s257h_6c93590d-0422-4bea-b919-0cdddc268e18 became leader"}
2021-08-23T19:11:37.766Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-08-23T19:11:37.766Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "source": "kind source: /, Kind="}
2021-08-23T19:11:37.766Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "source": "kind source: /, Kind="}
2021-08-23T19:11:37.866Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication"}
2021-08-23T19:11:37.866Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ClusterTriggerAuthentication", "controller": "clustertriggerauthentication", "worker count": 1}
2021-08-23T19:11:37.866Z        INFO    controller      Starting EventSource    {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "source": "kind source: /, Kind="}
2021-08-23T19:11:37.867Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication"}
2021-08-23T19:11:37.867Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob"}
2021-08-23T19:11:37.967Z        INFO    controller      Starting Controller     {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject"}
2021-08-23T19:11:37.967Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "worker count": 1}
2021-08-23T19:11:37.967Z        INFO    controllers.ScaledObject        Reconciling ScaledObject        {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample"}
2021-08-23T19:11:37.967Z        DEBUG   controllers.ScaledObject        Parsed Group, Version, Kind, Resource   {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2021-08-23T19:11:37.967Z        INFO    controllers.ScaledObject        Creating a new HPA      {"ScaledObject.Namespace": "echo", "ScaledObject.Name": "kafka-kbxtechogrpc-sample", "HPA.Namespace": "echo", "HPA.Name": "keda-hpa-kafka-kbxtechogrpc-sample"}
2021-08-23T19:11:37.967Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "worker count": 1}
2021-08-23T19:11:37.967Z        INFO    controller      Starting workers        {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledJob", "controller": "scaledjob", "worker count": 1}
2021-08-23T19:11:37.967Z        DEBUG   controller      Successfully Reconciled {"reconcilerGroup": "keda.sh", "reconcilerKind": "TriggerAuthentication", "controller": "triggerauthentication", "name": "keda-trigger-auth-kafka-credential", "namespace": "echo"}
2021-08-23T19:11:37.967Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"TriggerAuthentication","namespace":"echo","name":"keda-trigger-auth-kafka-credential","uid":"32581dc7-4757-4aa5-b752-52da6ebfb170","apiVersion":"keda.sh/v1alpha1","resourceVersion":"120415389"}, "reason": "TriggerAuthenticationAdded", "message": "New TriggerAuthentication configured"}

zroubalik commented 3 years ago

Hmm, are you able to build KEDA locally? You can follow this guide, but omit step 1, because you already have KEDA deployed. https://github.com/kedacore/keda/blob/main/BUILD.md#custom-keda-locally-outside-cluster

Basically, you scale the KEDA Operator deployment in the cluster down to 0 and then run KEDA locally from your laptop as a standard Go program.

kubectl scale deployment/keda-operator --replicas=0 -n keda

git clone https://github.com/kedacore/keda.git -b v2.4.0
cd keda
make run ARGS="--zap-log-level=debug"

This will start the operator and you should be able to see the full log on your laptop. Your kubeconfig should point to the cluster.

Thanks

bpotaczek commented 3 years ago

I was able to run it locally. I can see in the logs that it's connecting to Kafka and receiving data, but for some reason it doesn't update the HPA, so I get <unknown>/5. At least it's running.

{"level":"debug","ts":1630090024.798146,"logger":"kafka_scaler","msg":"Group connect-kbxt-dl-sink-integrations-edi990request-raw-s3 has a lag of 14 for topic integrations.edirouting.cdc.tenderack.v1 and partition 0\n"}

When I try to run it on my cluster it just crashes at that same spot, no error log.

bpotaczek commented 3 years ago

If I run KEDA without any ScaledObjects on the cluster, it runs and waits like it should. It's only after I create the Secret/TriggerAuth/ScaledObject above that it crashes. Still no additional information being logged.

zroubalik commented 3 years ago

> I was able to run it locally. I can see in the logs that it's connecting to Kafka and receiving data, but for some reason it doesn't update the HPA, so I get <unknown>/5. At least it's running.

Is the KEDA Metrics Adapter running correctly (in the cluster)?
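
One way to check this is to ask the API server whether the external metrics API is registered and available, since the HPA reads its values through that API. The following is a minimal sketch, not an official KEDA procedure. It assumes the adapter is registered under the APIService name `v1beta1.external.metrics.k8s.io`; `external_metrics_available` is a hypothetical helper name.

```shell
# Sketch: print the "Available" condition of the external metrics APIService.
# Assumption: the metrics adapter is registered as
# v1beta1.external.metrics.k8s.io (verify with `kubectl get apiservice`).
external_metrics_available() {
  kubectl get apiservice v1beta1.external.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Usage (prints the condition status reported by the API server):
#   external_metrics_available
```

If the condition is not "True", the HPA has no metrics source to read from, which would be consistent with a target shown as <unknown>/5.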

bpotaczek commented 3 years ago

It looks like the pod was being OOM killed. Apparently the default memory limit was not enough. The operator was using about 500MB, so I increased the limit to 1Gi and it ran fine. Thanks for the help.
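
For anyone hitting the same problem, here is a minimal sketch of raising the limit in place with kubectl. It assumes the Deployment is named `keda-operator` in the `keda` namespace, as in the scale command earlier in this thread. `raise_operator_memory_limit` is a hypothetical helper name, and 1Gi is simply the value that worked here.

```shell
# Sketch: raise the memory limit on the KEDA operator Deployment.
# Assumptions: Deployment "keda-operator" in namespace "keda";
# the limit defaults to the 1Gi that resolved this issue.
raise_operator_memory_limit() {
  local limit="${1:-1Gi}"
  kubectl set resources deployment keda-operator -n keda \
    --limits="memory=${limit}"
}

# Usage:
#   raise_operator_memory_limit        # sets the limit to 1Gi
#   raise_operator_memory_limit 2Gi    # or pick another value
```

A change made this way is likely to be overwritten by the next Helm upgrade, so the same limit should also be set in the chart values used for the install.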

zroubalik commented 3 years ago

That explains the missing stacktrace, thanks for letting us know.
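
An OOM-killed container is terminated with SIGKILL, so the process has no chance to write a stack trace before it dies. The kill is still recorded in the pod status. The following is a minimal sketch for reading it, assuming the operator is the first container in the pod; `last_termination_reason` is a hypothetical helper name.

```shell
# Sketch: print why the previous instance of a pod's first container ended.
# Assumptions: the container of interest is at index 0; the namespace
# defaults to "keda" as used elsewhere in this thread.
last_termination_reason() {
  local pod="$1" ns="${2:-keda}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Usage (substitute the real pod name):
#   last_termination_reason <POD_NAME>
```

A result of "OOMKilled" points to the memory limit rather than a bug in the reconcile path. The same information appears under "Last State" in the output of kubectl describe pod.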