kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

GCP Pub/Sub Scaler reports negative metric values #5774

Closed rc-bryanlinebaugh closed 5 months ago

rc-bryanlinebaugh commented 5 months ago

Report

We have configured the GCP Pub/Sub Scaler to scale our deployments based on the reported "SubscriptionSize" of the configured Pub/Sub Subscription. We have experienced frequent reporting of negative metric values by the underlying HorizontalPodAutoscaler resource for each ScaledObject.

The reported negative values seem to have the adverse effect of incorrectly scaling down our deployments.

Examples:

NAME                                   REFERENCE                                TARGETS             MINPODS  MAXPODS  REPLICAS  AGE
keda-hpa-gcp-log-ingest-go-foo1234567  Deployment/gcp-log-ingest-go-foo1234567  -3795005m/50 (avg)  50       2500     1194      12d

Scaler Config:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 600
            type: Percent
            value: 5
          stabilizationWindowSeconds: 600
        scaleUp:
          policies:
          - periodSeconds: 60
            type: Percent
            value: 100
          stabilizationWindowSeconds: 100
  cooldownPeriod: 300
  fallback:
    failureThreshold: 5
    replicas: 200
  maxReplicaCount: 2500
  minReplicaCount: 50
  pollingInterval: 10
  scaleTargetRef:
    name: ...
  triggers:
  - authenticationRef:
      name: ...
    metadata:
      activationValue: "0"
      aggregation: sum
      mode: SubscriptionSize
      subscriptionName: ...
      value: "50"
    type: gcp-pubsub

Expected Behavior

There should not be any negative values reported for the HorizontalPodAutoscaler metric.

Actual Behavior

Observed the HorizontalPodAutoscaler consistently reporting negative values.

Steps to Reproduce the Problem

  1. Create a GCP Pub/Sub Topic and Subscription.
  2. Register GCP Pub/Sub Scaler with a similar configuration.
  3. Consistently publish messages to the Subscription.
  4. Observe metric values reported by the configured HPA resource.

Logs from KEDA operator

No response

KEDA Version

2.13.0

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

GCP Pub/Sub

Anything else?

No response

JorTurFer commented 5 months ago

Hello, are you scraping Prometheus metrics from KEDA by any chance? What is the value of the metric keda_scaler_metrics_value for those ScaledObjects?

rc-bryanlinebaugh commented 5 months ago

Hello,

I had to do a port-forward for the KEDA Operator in the same cluster, but I was able to retrieve what would have been scraped by Prometheus. For example, here's what a collection of our ScaledObject resources is reporting:

[Screenshot 2024-05-02 at 10:54:53 AM]

JorTurFer commented 5 months ago

Interesting information. So we are somehow getting negative values from the Pub/Sub API; maybe the aggregation window (or something similar) that we use is not correct. Could that be possible? Are you willing to take a look? This is the scaler code: https://github.com/kedacore/keda/blob/main/pkg/scalers/gcp_pubsub_scaler.go

This is the relevant part of scaler code (and other calls executed within it): https://github.com/kedacore/keda/blob/a16802261ed3f6ae589ca30945ef596853a549be/pkg/scalers/gcp_pubsub_scaler.go#L236-L256

rc-bryanlinebaugh commented 5 months ago

Not a Golang expert by any means, but I can take a look. It would be great if someone with more expertise could take a look as well.

JorTurFer commented 5 months ago

Let's see if anyone else is willing to help here too :)

rc-bryanlinebaugh commented 5 months ago

@JorTurFer Thanks for the initial triage of this issue!

It looks like we were receiving negative values because we were mistakenly passing the aggregation function (sum) in our Scaler configuration. This is an issue because the SubscriptionSize metric we configured is not a Distribution type metric and not supported by the available aggregation methods. This is mentioned in a comment in the documented Scaler example. With the included aggregation, the clauses for the MQL query added here on the Gauge metric resulted in negative values (I don't fully understand why, but I was able to replicate it in Metrics Explorer).

Once we removed the aggregation parameter from our Scaler config, values were reported as expected.
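
For reference, a sketch of what the trigger section from the original report looks like with the aggregation line dropped. Every other value is copied from the config above, and the elided names are left as they were; this is only an illustration of the change described, not a verified recommended configuration.

```yaml
triggers:
- authenticationRef:
    name: ...
  metadata:
    activationValue: "0"
    # "aggregation: sum" removed: per the comment above, SubscriptionSize
    # is not a Distribution type metric, so no aggregation is set here.
    mode: SubscriptionSize
    subscriptionName: ...
    value: "50"
  type: gcp-pubsub
```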

JorTurFer commented 5 months ago

Nice to read that it's working well :smile: Thanks a lot for the feedback!

JoelDimbernat commented 3 months ago

We still have this problem even without any aggregation function defined: we sometimes get -9223372036854775808m/100 (avg).

Apparently it also happens when using Kafka on GCP: #5730