kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Could not find stackdriver metric with query fetch pubsub_subscription - Google Cloud Platform Pub/Sub #5855

Closed rcng6514 closed 4 weeks ago

rcng6514 commented 3 months ago

Report

We believe we have been hitting the same problem as #5452 since upgrading to 2.14.0. KEDA intermittently reports that it cannot find a metric matching the filter. GCP audit logs show no failed authentication or permission issues. In a 7-day window we observed 863 instances of this error.

Expected Behavior

Metric query returned consistently

Actual Behavior

The metric query intermittently fails with the following error: could not find stackdriver metric with query fetch pubsub_subscription

Steps to Reproduce the Problem

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    meta.helm.sh/release-name: <name>
    meta.helm.sh/release-namespace: <namespace>
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: <name>
    helm.toolkit.fluxcd.io/namespace: <namespace>
    scaledobject.keda.sh/name: <name>
  name: <name>
  namespace: <namespace>
spec:
  cooldownPeriod: 10
  maxReplicaCount: 3
  minReplicaCount: 1
  pollingInterval: 1
  scaleTargetRef:
    name: <target>
  triggers:
  - authenticationRef:
      name: <auth>
    metadata:
      mode: SubscriptionSize
      subscriptionName: projects/<project>/subscriptions/<sub>
      value: "50000"
    type: gcp-pubsub

Logs from KEDA operator

2024-05-30T18:04:43Z    ERROR   gcp_pub_sub_scaler  error getting metric    {"type": "ScaledObject", "namespace": "blah", "name": "blah", "metricType": "pubsub.googleapis.com/subscription/num_undelivered_messages", "error": "could not find stackdriver metric with query fetch pubsub_subscription | metric 'pubsub.googleapis.com/subscription/num_undelivered_messages' | filter (resource.project_id == 'blah' && resource.subscription_id == 'blah') | within 2m"}

KEDA Version

2.14.0

Kubernetes Version

1.28

Platform

Google Cloud

Scaler Details

Google Cloud Platform Pub/Sub

Anything else?

Setting the polling interval to 5 seconds reduces the number of errors but does not eliminate them.

Caislear commented 3 months ago

Just noting here that these two issues may be related, since this one sounds somewhat similar to what I am experiencing in https://github.com/kedacore/keda/issues/5896

Out of interest do your subscriptions regularly have no messages in them?

rcng6514 commented 3 months ago

Thanks @Caislear, we've got two envs that are impacted by this. One has a constant stream of messages in the queue and rarely hits zero; the other hits zero messages more often. Both are impacted by this bug, but interestingly the higher-volume environment experiences more of these errors.

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

JorTurFer commented 1 month ago

There was a bug in v2.13 that unexpectedly reduced the aggregation period, but it was fixed in v2.14. In v2.15 the time window supports custom horizons -> https://github.com/kedacore/keda/issues/5429. I think you may have some periods without metrics that are treated as errors, and increasing the window will probably help. We also added the option to set a custom value when the metric is not available, as part of https://github.com/kedacore/keda/pull/5897
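For illustration, here is a rough sketch of how the trigger from the report might look with both options applied. It assumes KEDA v2.15 or later and that the metadata fields are named `timeHorizon` and `valueIfNull`; the values shown are placeholders, so please check the gcp-pubsub scaler docs for your version before relying on it.

```yaml
triggers:
- type: gcp-pubsub
  authenticationRef:
    name: <auth>
  metadata:
    mode: SubscriptionSize
    subscriptionName: projects/<project>/subscriptions/<sub>
    value: "50000"
    # Assumed field name (v2.15+): widen the query window from the 2m seen in the error log.
    timeHorizon: "5m"
    # Assumed field name (v2.15+): value used when the query returns no data points.
    valueIfNull: "0"
```

Note that a fallback of "0" means a missing metric is treated as an empty subscription, which may not be what you want if gaps can occur while messages are still queued.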

stale[bot] commented 4 weeks ago

This issue has been automatically closed due to inactivity.