kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Event Hub - incorrect metrics values #5784

Closed Duri9292 closed 1 month ago

Duri9292 commented 4 months ago

Report

I'm getting incorrect values from the external metric. Sometimes the external metrics provide the correct values but most of the time the current values are way off.

Metrics examples: Here you can see that averageValue is 1040334m which does not make sense and it will trigger the maximum possible scaling.

currentMetrics:
  - external:
      current:
        averageValue: 1040334m
      metric:
        name: s0-azure-eventhub-onb
        selector:
          matchLabels:
            scaledobject.keda.sh/name: event-hub-scaler

From time to time the averageValue is more accurate and it looks more realistic.

currentMetrics:
  - external:
      current:
        averageValue: "814"
      metric:
        name: s0-azure-eventhub-onb
        selector:
          matchLabels:
            scaledobject.keda.sh/name: event-hub-scaler

Here are the incoming message metrics directly from Azure. As you can see, we usually have an average of about 100 incoming messages per minute.

image

Expected Behavior

The average values should be more consistent and reflect the real values.

Actual Behavior

The current values are jumping from 600 to 580334m, while the real average of incoming messages is usually around 100. We process approximately 22,000 messages per day, so an average value like 580334m does not make any sense.

Steps to Reproduce the Problem

  1. Configure the azure-eventhub trigger
  2. Monitor the HPA average values

Logs from KEDA operator

2024-05-04T18:45:09Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "eb42d4de-1c9f-4dce-b243-32901de7ce0e"}
2024-05-04T18:45:24Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "f30d4ff1-8640-4cf8-8bc1-414fe92bd72c"}
2024-05-04T18:45:40Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "5e95c24b-f0d5-4a0e-bf19-b437ea3b6d71"}
2024-05-04T18:45:55Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "3b326319-9811-441b-899c-c4c712d4451c"}
2024-05-04T18:50:44Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "9fd2fdf5-3de0-4f12-ba91-9733a41a2670"}
2024-05-04T18:51:00Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "a6bf465f-fe21-472c-9059-bc16ddf56617"}
2024-05-04T18:53:51Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "ec65a58d-8f68-418b-bca4-ee34a3a3f952"}
2024-05-04T18:55:08Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "35cb23fa-6803-459a-a459-dd09beafb8b1"}
2024-05-04T18:55:24Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "98d6080b-9c5e-4f14-a4b0-883f7325648c"}
2024-05-04T18:56:26Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "43613e30-d318-410e-8633-d46cec379c31"}
2024-05-04T18:56:42Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "f0ffdb33-b0a5-4bd1-82d7-1238e817f960"}
2024-05-04T18:56:57Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "45100633-32b2-42fb-bd93-0831d15b4ac9"}
2024-05-04T18:57:13Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "dec7edf4-7580-4c96-91ed-8c6766ea98c3"}
2024-05-04T18:57:28Z    INFO    Reconciling ScaledObject    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"event-hub-scaler","namespace":"my-app-namespace"}, "namespace": "my-app-namespace", "name": "event-hub-scaler", "reconcileID": "6eec8aec-30e1-44e9-8aec-fca3dc955665"}

KEDA Version

2.11.2

Kubernetes Version

1.28

Platform

Microsoft Azure

Scaler Details

azure-eventhub

Anything else?

No response

JorTurFer commented 4 months ago

Hello, what exactly is the problem? The value using m (1040334m)? In the Kubernetes context, 'm' means milli, and it is used when the value is fractional, because Kubernetes does not write these quantities as floating-point numbers. When you see 1040334m, it means 1040.334. In the same way, jumps between 600 and 580334m are quite normal, because the value is jumping between 600 and 580.334.
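To make the notation concrete, here is a minimal Python sketch of reading such a value. The helper name `parse_quantity` is hypothetical, and it handles only plain numbers and the `m` (milli) suffix seen in this thread, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '1040334m' or '814' to a Decimal.

    Hypothetical helper: supports only plain numbers and the 'm' (milli)
    suffix, which is a small subset of the Kubernetes quantity format.
    """
    # Drop surrounding whitespace and the quotes YAML output may add.
    text = text.strip().strip('"')
    if text.endswith("m"):
        # 'm' means thousandths, so 1040334m is 1040.334.
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


if __name__ == "__main__":
    for raw in ("1040334m", "580334m", '"814"'):
        print(raw, "->", parse_quantity(raw))
```

Read this way, the values in the report are all in the same unit and differ only in whether a fractional part is present.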

Duri9292 commented 4 months ago

Hello @JorTurFer, thank you for your quick response. The issue is that once the number is in milli scale, the HPA always scales to the maximum possible number of replicas. When the average value is a non-float number, the scaler decreases the replicas or scales as expected.

e.g. current average number: 1741 replica:1

image

current average number: 580334m replica:3

I configured the trigger value to 5000 and the scaler is always active, which should not be the case here.

scaled_to_max

JorTurFer commented 4 months ago

Are you scraping the Prometheus metrics generated by KEDA? I'm almost sure that you have a peak which justifies the scaling out because, as you said, you're under the threshold. The only other explanation for that behaviour without a peak is that you have changed the target value and the HPA controller is still within the scale-down cooldown (300 seconds after the last scale-out).
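The reasoning in this comment can be sketched in Python. This is a simplified model and not the HPA controller's code: it assumes an AverageValue target, ignores the tolerance band and the min/max replica bounds, and models the cooldown as keeping the highest recommendation made within the last 300 seconds. Both function names are hypothetical.

```python
import math


def desired_replicas(total_metric: float, target_average: float) -> int:
    """Smallest replica count for which total / replicas <= target.

    Simplification: always returns at least 1; scaling to zero is
    outside this sketch.
    """
    return max(1, math.ceil(total_metric / target_average))


def stabilized_replicas(history, now: float, window_seconds: float = 300.0) -> int:
    """Return the highest recommendation made within the window.

    history is a list of (timestamp_seconds, recommended_replicas) pairs.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendation inside the window")
    return max(recent)


if __name__ == "__main__":
    # A total of 1741 against a target of 5000 needs a single replica.
    print(desired_replicas(1741, 5000))
    # A recommendation of 3 made 200 s ago still holds the workload at 3.
    history = [(0.0, 3), (200.0, 1)]
    print(stabilized_replicas(history, now=200.0))
    # Once that recommendation is older than 300 s, 1 replica is allowed.
    print(stabilized_replicas(history, now=301.0))
```

Under these assumptions, a workload that is below the threshold can remain scaled out for up to five minutes after the last higher recommendation, but not indefinitely.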

Duri9292 commented 4 months ago

The thing is that we turned off Event Hub data ingestion for the last 24 hrs, which means that we are getting 0 incoming messages (we wanted to test scaling to 0). So there are no peaks. Even a value like 1741 does not make much sense, but if it is calculating the average value for the last few days, it could be relevant. I will monitor the behavior once we enable data ingestion again.

Below is a graph of incoming messages to Event Hub (past 48 hrs, data granularity: 5 minutes).

image

JorTurFer commented 4 months ago

No no, it doesn't use the average value at all. KEDA uses the current value, so if it's 0 in the eventhub and you don't see 0 in KEDA, it can be a misconfiguration or a bug. Do you see any value different from 0? You can manually query the metric value and check what KEDA returns: https://keda.sh/docs/2.14/operate/metrics-server/#querying-metrics-exposed-by-keda-metrics-server
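As a sketch of the manual check suggested here, the Python snippet below builds the external metrics API path in the form described in the linked documentation and prints a `kubectl get --raw` command for it. The function name `external_metric_path` is hypothetical; the namespace, metric name, and ScaledObject name are the ones quoted earlier in this issue.

```python
from urllib.parse import quote


def external_metric_path(namespace: str, metric_name: str, scaled_object: str) -> str:
    """Build the external metrics API path for one KEDA metric.

    The label selector is URL-encoded, so '/' becomes %2F and '=' becomes %3D.
    """
    selector = quote(f"scaledobject.keda.sh/name={scaled_object}", safe="")
    return (
        "/apis/external.metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/{metric_name}?labelSelector={selector}"
    )


if __name__ == "__main__":
    path = external_metric_path(
        "my-app-namespace", "s0-azure-eventhub-onb", "event-hub-scaler"
    )
    # Run the printed command against the cluster to see the raw value.
    print(f'kubectl get --raw "{path}"')
```

The value returned by that query is the total reported by the scaler, before the HPA divides it by the number of pods.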

Duri9292 commented 3 months ago

Hello @JorTurFer, to answer your question "Do you see any value different from 0?": yes, even when the Event Hub was turned off, the HPA always had some number in the metrics.

We enabled the Event Hub again and, for some reason, we stopped getting float values, and scaling is working as expected. Or at least I did not catch any float number during my observation; since there is no history of this value, I cannot confirm. But it seems that once the float values stopped occurring, the scaling is OK.

No no, it doesn't use the average value at all.

The documentation mentions that these are average values, and we are using the default. If that is not true, then sorry, I must have missed it.

image

However, the values from the metrics still do not match the values from the Event Hub metrics.

image

Event Hub (sum) for the past 30 min image

Event Hub (avg) for past 24 hrs image

JorTurFer commented 3 months ago

The documentation mentions that these are average values, and we are using the default. If that is not true, then sorry, I must have missed it.

My bad, I had understood you to mean that KEDA retrieves an average value from the Event Hub. You are right: the Kubernetes workload is scaled based on the average value, which is calculated from the instantaneous Event Hub value.

JorTurFer commented 3 months ago

We enabled the Event Hub again and, for some reason, we stopped getting float values, and scaling is working as expected. Or at least I did not catch any float number during my observation; since there is no history of this value, I cannot confirm. But it seems that once the float values stopped occurring, the scaling is OK.

Float values are correct and they can happen: if the Event Hub returns 7 and you have 4 pods, the average will be a float value.
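The arithmetic behind this can be sketched in Python. The function name `average_as_quantity` is hypothetical, and the rounding is an assumption: the sketch rounds the per-pod average up to the next milli-unit and prints whole numbers without the `m` suffix.

```python
def average_as_quantity(total: int, replicas: int) -> str:
    """Format total / replicas as a quantity string.

    Assumption: the per-pod average is rounded up to the next milli-unit,
    and whole numbers are printed without the 'm' suffix.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    milli = -(-total * 1000 // replicas)
    if milli % 1000 == 0:
        return str(milli // 1000)
    return f"{milli}m"


if __name__ == "__main__":
    # The example from the comment: 7 events shared by 4 pods.
    print(average_as_quantity(7, 4))
    # Numbers quoted earlier in the thread: 1741 at 1 replica and at 3 replicas.
    print(average_as_quantity(1741, 1))
    print(average_as_quantity(1741, 3))
```

Under that assumption, the 1741 reported at 1 replica and the 580334m reported at 3 replicas are the same total divided by different pod counts. This is arithmetic on the numbers quoted in the thread, not a confirmed diagnosis.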

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

This issue has been automatically closed due to inactivity.