nerdswords / yet-another-cloudwatch-exporter

Prometheus exporter for AWS CloudWatch - Discovers services through AWS tags, gets CloudWatch metrics data and provides them as Prometheus metrics with AWS tags as labels
Apache License 2.0

[BUG] Attempting to retrieve AWS/ApplicationELB metrics causes errors #1490

Open mthemis-provenir opened 3 months ago

mthemis-provenir commented 3 months ago

Is there an existing issue for this?

YACE version

v0.61.2

Config file

          apiVersion: v1alpha1
          sts-region: eu-west-1
          discovery:
            exportedTagsOnMetrics:
              AWS/ApplicationELB:
                - environment
                - elbv2.k8s.aws/cluster
            jobs:
            - type: AWS/ApplicationELB
              regions:
                - {{ values.region }}
              searchTags:
                - key: environment
                  value: ^({{ values.env }})$
              metrics:
                - name: HealthyHostCount
                  statistics:
                    - Average
                - name: UnHealthyHostCount
                  statistics:
                    - Average
                - name: HTTPCode_Target_2XX_Count
                  statistics:
                    - Sum
                - name: HTTPCode_Target_3XX_Count
                  statistics:
                    - Sum
                - name: HTTPCode_Target_4XX_Count
                  statistics:
                    - Sum
                - name: HTTPCode_Target_5XX_Count
                  statistics:
                    - Sum
                - name: TargetResponseTime
                  statistics:
                    - Average
                    - pNN.NN

Current Behavior

YACE logs the following GetMetricData error, even though the metric names and statistics appear to be valid as per https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html.

{
    "caller": "client.go:133",
    "err": "ValidationError: The value for parameter MetricDataQueries.member.12.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.14.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.16.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.18.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.20.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.22.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.24.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.26.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.28.MetricStat.Stat is not a valid statistic.\nThe value for parameter MetricDataQueries.member.30.MetricStat.Stat is not a valid statistic.\n\tstatus code: 400, request id: xxxx",
    "level": "error",
    "msg": "GetMetricData error",
    "ts": "2024-08-02T14:15:03.480049065Z",
    "version": "v0.61.2"
}

Expected Behavior

Metrics retrieved as expected.

Steps To Reproduce

No response

Anything else?

No response

kgeckhart commented 2 months ago

The errors come directly from the CloudWatch API and are caused by using pNN.NN as a statistic for TargetResponseTime. The docs you reference say that percentile statistics are the most useful for this metric and that you can use any percentile you wish, but pNN.NN is a placeholder, not a literal value. You need to define percentiles in the form CloudWatch expects, such as p99.99 or p95. The AWS docs for statistics have more examples: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SingleMetricPerInstance.html (step 6 shows examples and explains the format).
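
For illustration, the TargetResponseTime entry from the config in the report could be written with concrete percentiles instead of pNN.NN. This is a minimal sketch, not a recommendation: p95 and p99.99 are just the example forms mentioned above, and the fragment has not been checked against this particular setup.

```yaml
# Sketch only: replace the pNN.NN placeholder with concrete percentiles.
# p95 and p99.99 are illustrative choices; pick the percentiles you need.
- name: TargetResponseTime
  statistics:
    - Average
    - p95
    - p99.99
```

The rest of the job definition (type, regions, searchTags, and the other metrics) would stay as it is in the report.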