prometheus / cloudwatch_exporter

Metrics exporter for Amazon AWS CloudWatch
Apache License 2.0

Incorrect Configuration? #432

Closed: mattystevenson closed this issue 2 years ago

mattystevenson commented 2 years ago

Hi there,

Thanks for this! I'm trying to use it to get data for Amazon GameLift, which I think would be a welcome alternative to CloudWatch. Everything seems to be set up, but I am not seeing any GameLift metrics. Here is what the /metrics page of my CloudWatch Exporter returns:

# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total{action="listMetrics",namespace="AWS/Redshift",} 611.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/ElastiCache",} 470.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/GameLift",} 47.0
cloudwatch_requests_total{action="getMetricStatistics",namespace="AWS/S3",} 376.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/S3",} 94.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/GameLift",} 94.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/ELB",} 235.0
cloudwatch_requests_total{action="listMetrics",namespace="AWS/CloudFront",} 282.0
# HELP aws_s3_bucket_size_bytes_average CloudWatch metric AWS/S3 BucketSizeBytes Dimensions: [BucketName, StorageType] Statistic: Average Unit: Bytes
# TYPE aws_s3_bucket_size_bytes_average gauge
aws_s3_bucket_size_bytes_average{job="aws_s3",instance="",storage_type="StandardStorage",bucket_name="glsbe1stezmatt2",} 228661.0 1654314900000
aws_s3_bucket_size_bytes_average{job="aws_s3",instance="",storage_type="StandardStorage",bucket_name="glsbe1stezmatt",} 228660.0 1654314900000
aws_s3_bucket_size_bytes_average{job="aws_s3",instance="",storage_type="StandardStorage",bucket_name="glsbe1stezmatt1",} 228661.0 1654314900000
aws_s3_bucket_size_bytes_average{job="aws_s3",instance="",storage_type="StandardStorage",bucket_name="cloudtrail-awslogs-604060733806-d4tnpg08-isengard-do-not-delete",} 1.52625672E8 1654314900000
# HELP aws_s3_number_of_objects_average CloudWatch metric AWS/S3 NumberOfObjects Dimensions: [BucketName, StorageType] Statistic: Average Unit: Count
# TYPE aws_s3_number_of_objects_average gauge
aws_s3_number_of_objects_average{job="aws_s3",instance="",storage_type="AllStorageTypes",bucket_name="XXXXXX",} 1.0 1654314900000
aws_s3_number_of_objects_average{job="aws_s3",instance="",storage_type="AllStorageTypes",bucket_name="XXXXXX",} 1.0 1654314900000
aws_s3_number_of_objects_average{job="aws_s3",instance="",storage_type="AllStorageTypes",bucket_name="XXXXXX",} 1.0 1654314900000
aws_s3_number_of_objects_average{job="aws_s3",instance="",storage_type="AllStorageTypes",bucket_name="XXXXXX",} 97749.0 1654314900000
# HELP cloudwatch_exporter_scrape_duration_seconds Time this CloudWatch scrape took, in seconds.
# TYPE cloudwatch_exporter_scrape_duration_seconds gauge
cloudwatch_exporter_scrape_duration_seconds 1.261845846
# HELP cloudwatch_exporter_scrape_error Non-zero if this scrape failed.
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0

I expected to see data similar to your example:

# HELP aws_elb_request_count_sum CloudWatch metric AWS/ELB RequestCount Dimensions: ["AvailabilityZone","LoadBalancerName"] Statistic: Sum Unit: Count
# TYPE aws_elb_request_count_sum gauge
aws_elb_request_count_sum{job="aws_elb",instance="",load_balancer_name="mylb",availability_zone="eu-west-1c",} 42.0
aws_elb_request_count_sum{job="aws_elb",instance="",load_balancer_name="myotherlb",availability_zone="eu-west-1c",} 7.0

However, even after adjusting the config a bit, I am not getting anything like that, although CloudWatch shows these metrics just fine. Here is my config:

region: us-east-1
metrics:
- aws_namespace: AWS/GameLift
  aws_metric_name: ActiveInstances
  aws_dimensions: [Location]

- aws_namespace: AWS/GameLift
  aws_metric_name: CurrentPlayerSessions
  aws_dimensions: [Location]
  aws_statistics: [Minimum]

- aws_namespace: AWS/GameLift
  aws_metric_name: ActiveGameSessions
  aws_dimension: [Location]

Here is the list of GameLift metrics for reference.

I've tried debugging, to no avail. Any thoughts on what I might be doing wrong? Thanks in advance!

or-shachar commented 2 years ago

Thanks for reporting this! :)

I can think of two possible reasons for this:

  1. No metrics were reported in the given time frame (less likely; controlled by "range_seconds" or "period_seconds")
  2. Problem with the dimensions (see next section)
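
Reason 1 depends on the time window each rule asks CloudWatch about. A minimal sketch of that window, assuming the exporter queries from now minus delay_seconds minus range_seconds up to now minus delay_seconds, in buckets of period_seconds, and exports only the newest datapoint. The default values below are recalled from the cloudwatch_exporter README and are assumptions to verify there; query_window is a hypothetical helper, not the exporter's code.

```python
from datetime import datetime, timedelta, timezone

# Assumed defaults, recalled from the cloudwatch_exporter README; verify there.
DEFAULT_PERIOD_SECONDS = 60
DEFAULT_RANGE_SECONDS = 600
DEFAULT_DELAY_SECONDS = 600


def query_window(now, rule):
    """Return (start, end, period_seconds) for one rule's CloudWatch query.

    Simplified model: the window ends delay_seconds before `now` and
    reaches back range_seconds from there; CloudWatch aggregates the
    datapoints into buckets of period_seconds.
    """
    period = rule.get("period_seconds", DEFAULT_PERIOD_SECONDS)
    range_seconds = rule.get("range_seconds", DEFAULT_RANGE_SECONDS)
    delay = rule.get("delay_seconds", DEFAULT_DELAY_SECONDS)
    end = now - timedelta(seconds=delay)
    start = end - timedelta(seconds=range_seconds)
    return start, end, period


if __name__ == "__main__":
    now = datetime(2022, 6, 8, 12, 0, tzinfo=timezone.utc)
    rule = {"aws_namespace": "AWS/GameLift", "aws_metric_name": "ActiveGameSessions"}
    start, end, period = query_window(now, rule)
    # If CloudWatch has no datapoint between start and end, the rule exports nothing.
    print(start.isoformat(), end.isoformat(), period)
```

With those assumed defaults, a scrape at 12:00 UTC asks for 11:40 to 11:50 UTC, so a metric that CloudWatch publishes only sporadically can fall outside the window; raising range_seconds on the rule widens it.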

Scraping CloudWatch works as follows. For each metric rule in the list:

  1. Get the list of dimension "permutations" (combinations of dimension values) to use for scraping (here)
  2. Query CloudWatch once for each permutation
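
The two stages can be sketched as a loop, assuming both CloudWatch calls are injected as plain functions (scrape_rule, list_combinations and query_metric are hypothetical names). The point it illustrates: if stage 1 yields no combinations, stage 2 never runs and the rule exports nothing without raising an error, which is consistent with cloudwatch_exporter_scrape_error staying at 0.0 in the original post.

```python
from typing import Callable, Dict, List, Tuple

# One dimension combination, e.g. {"Location": "us-west-2", "FleetId": "fleet-..."}.
Dimensions = Dict[str, str]


def scrape_rule(
    rule: dict,
    list_combinations: Callable[[dict], List[Dimensions]],
    query_metric: Callable[[dict, Dimensions], float],
) -> List[Tuple[str, Dimensions, float]]:
    """Simplified model of scraping one metric rule.

    Stage 1 resolves the rule to dimension combinations; stage 2 issues
    one CloudWatch query per combination. An empty stage-1 result means
    stage 2 never runs, so the rule silently exports no series.
    """
    samples = []
    for dims in list_combinations(rule):  # stage 1
        value = query_metric(rule, dims)  # stage 2: one query per combination
        samples.append((rule["aws_metric_name"], dims, value))
    return samples


if __name__ == "__main__":
    rule = {"aws_namespace": "AWS/GameLift", "aws_metric_name": "ActiveGameSessions",
            "aws_dimensions": ["Location"]}
    # Hypothetical stand-ins for ListMetrics and GetMetricStatistics.
    no_match = lambda r: []
    fake_query = lambda r, d: 1.0
    print(scrape_rule(rule, no_match, fake_query))
```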

If you specify only aws_dimensions, without a selection (aws_dimension_select), this is the method that gets the permutations.

Basically, it calls the ListMetrics API. The equivalent AWS CLI command is:

aws cloudwatch list-metrics --namespace $namespace --metric-name $metricName 
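
For contrast, a sketch of the other path. As I understand the exporter, when aws_dimension_select supplies values for every name in aws_dimensions (and no tag selection is used), it can build the combinations itself instead of calling ListMetrics. The sketch models only that path, using itertools.product; combinations_from_select is a hypothetical name, not the exporter's code, and the fleet IDs are placeholders.

```python
from itertools import product
from typing import Dict, List, Optional


def combinations_from_select(
    aws_dimensions: List[str],
    aws_dimension_select: Optional[Dict[str, List[str]]],
) -> Optional[List[Dict[str, str]]]:
    """Return every combination of the selected values, or None when the
    selection does not cover every dimension (ListMetrics is needed then)."""
    if not aws_dimensions or not aws_dimension_select:
        return None
    if set(aws_dimension_select) != set(aws_dimensions):
        return None
    # One list of candidate values per dimension, in aws_dimensions order.
    value_lists = [aws_dimension_select[name] for name in aws_dimensions]
    return [dict(zip(aws_dimensions, values)) for values in product(*value_lists)]


if __name__ == "__main__":
    # Hypothetical fleet IDs.
    select = {"Location": ["us-west-2"], "FleetId": ["fleet-a", "fleet-b"]}
    print(combinations_from_select(["Location", "FleetId"], select))
    print(combinations_from_select(["Location"], None))  # None: ListMetrics is needed
```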

What does this command return when you run it from the CLI for your configuration? For example:

aws cloudwatch list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions --dimensions Location
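
To compare the CLI output against a rule, it helps to group the returned metrics by the set of dimension names each one carries. A small standard-library sketch, assuming the AWS CLI's default JSON output; dimension_sets and the file names are hypothetical.

```python
import json
import sys
from collections import Counter


def dimension_sets(list_metrics_json: str) -> Counter:
    """Count metrics in `aws cloudwatch list-metrics` output by the sorted
    tuple of dimension names each one carries."""
    data = json.loads(list_metrics_json)
    counts = Counter()
    for metric in data.get("Metrics", []):
        names = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
        counts[names] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 dimension_sets.py out.json
    with open(sys.argv[1]) as f:
        for names, n in sorted(dimension_sets(f.read()).items()):
            print(n, list(names))
```

For example, save the output with `aws cloudwatch list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions > out.json` and run `python3 dimension_sets.py out.json`; a rule's aws_dimensions should match one of the printed name lists exactly.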

Side note: @matthiasr, I wonder if it makes sense to print a debug or warning log when the final dimension list is empty, e.g. right here. WDYT?

matthiasr commented 2 years ago

Yes, I think that should be a warning. How often would it be printed?

or-shachar commented 2 years ago

Good question... My concern is that it'll be printed on every scrape for each faulty rule. But a careful user should spot that warning and fix it...

matthiasr commented 2 years ago

That wouldn't be too terrible; less than a log line a second is tolerable IMO. I just don't want people to learn about their problem from the CloudWatch Logs billing alert :)
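
One way to keep such a warning well below a line per second is to throttle it per rule. A minimal sketch, assuming a monotonic clock and a string key per rule; EmptyDimensionsWarner is a hypothetical helper written in Python for illustration, not the exporter's (Java) implementation.

```python
import logging
import time
from typing import Callable, Dict

log = logging.getLogger("cloudwatch_exporter_sketch")


class EmptyDimensionsWarner:
    """Warn when a rule resolves to no dimension combinations, at most
    once per `interval` seconds per rule (hypothetical helper)."""

    def __init__(self, interval: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last: Dict[str, float] = {}  # rule id -> time of last warning

    def check(self, rule_id: str, combinations: list) -> bool:
        """Return True if a warning was emitted for this call."""
        if combinations:
            return False
        now = self.clock()
        last = self._last.get(rule_id)
        if last is not None and now - last < self.interval:
            return False  # warned recently for this rule; stay quiet
        self._last[rule_id] = now
        log.warning("rule %s matched no dimension combinations; check aws_dimensions", rule_id)
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    fake_now = [0.0]
    warner = EmptyDimensionsWarner(interval=60.0, clock=lambda: fake_now[0])
    print(warner.check("gamelift/ActiveGameSessions", []))  # first empty result: warns
    print(warner.check("gamelift/ActiveGameSessions", []))  # within 60 s: suppressed
```

With interval=60, a rule that stays empty logs once a minute regardless of how often Prometheus scrapes.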


mattystevenson commented 2 years ago

Hi there. Thanks for taking a look at this.

I want to make sure I am running the CLI command as instructed. You said to run it with my configuration, like this:

aws list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions --dimensions Location

but wouldn't it need to be:

aws cloudwatch list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions --dimensions Location

Please let me know if I am confused here. Either way, when I run the latter command (aws cloudwatch ...) with --dimensions Location, I get an error, while removing the --dimensions flag works:

aws cloudwatch list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions --dimensions Location

Error parsing parameter '--dimensions': Expected: '=', received: 'EOF' for input:
Locations

Again, I could be misreading your instructions; I'm new to much of this. I also tried removing the dimensions from my config file, but the results did not change. Please let me know if I might be off somewhere, and thanks again.

or-shachar commented 2 years ago

Pardon, this is the right AWS CLI command:

aws cloudwatch list-metrics --namespace AWS/GameLift --metric-name ActiveGameSessions

(Updated the original comment)

mattystevenson commented 2 years ago

Thanks, I thought so. Yes, that command does return results. However, I edited the config file to remove the dimensions and I am still getting the same result as in my original post. I am going to try rebuilding my whole config today to see if I missed something, but any other suggestions are certainly welcome.

or-shachar commented 2 years ago

If you can post the output of the CLI command here, that would be really helpful. It doesn't expose anything proprietary anyway.

mattystevenson commented 2 years ago

Of course. Here you go.

{
    "Metrics": [
        {
            "Namespace": "AWS/GameLift",
            "MetricName": "ActiveGameSessions",
            "Dimensions": [
                {
                    "Name": "MetricGroups",
                    "Value": "default"
                },
                {
                    "Name": "Location",
                    "Value": "us-west-2"
                }
            ]
        },
        {
            "Namespace": "AWS/GameLift",
            "MetricName": "ActiveGameSessions",
            "Dimensions": [
                {
                    "Name": "FleetId",
                    "Value": "fleet-ff48bb26-1228-47a7-8a8b-f086ea8c7a6d"
                },
                {
                    "Name": "Location",
                    "Value": "us-west-2"
                }
            ]
        },
        {
            "Namespace": "AWS/GameLift",
            "MetricName": "ActiveGameSessions",
            "Dimensions": [
                {
                    "Name": "FleetId",
                    "Value": "fleet-ff48bb26-1228-47a7-8a8b-f086ea8c7a6d"
                }
            ]
        },
        {
            "Namespace": "AWS/GameLift",
            "MetricName": "ActiveGameSessions",
            "Dimensions": [
                {
                    "Name": "MetricGroups",
                    "Value": "default"
                }
            ]
        }
    ]
}

or-shachar commented 2 years ago

So it seems you don't have a metric whose only dimension is Location. You have one with [Location, FleetId] and one with [Location, MetricGroups].
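
The diagnosis comes down to exact matching. As I understand the exporter, ListMetrics returns every metric that has at least the requested dimensions, and the exporter then discards any metric whose dimension set is larger than aws_dimensions. The sketch replays that simplified rule against the output posted above; matching_combinations is a hypothetical name, not the exporter's code.

```python
from typing import Dict, List

# Dimension maps taken from the list-metrics output posted above.
POSTED = [
    {"MetricGroups": "default", "Location": "us-west-2"},
    {"FleetId": "fleet-ff48bb26-1228-47a7-8a8b-f086ea8c7a6d", "Location": "us-west-2"},
    {"FleetId": "fleet-ff48bb26-1228-47a7-8a8b-f086ea8c7a6d"},
    {"MetricGroups": "default"},
]


def matching_combinations(metrics: List[Dict[str, str]], aws_dimensions: List[str]) -> List[Dict[str, str]]:
    """Keep only metrics whose dimension names are exactly aws_dimensions
    (a simplified model of the exporter's filtering)."""
    wanted = set(aws_dimensions)
    return [m for m in metrics if set(m) == wanted]


if __name__ == "__main__":
    for dims in (["Location"], ["Location", "MetricGroups"], ["Location", "FleetId"]):
        print(dims, len(matching_combinations(POSTED, dims)))
```

Run as a script, it should print 0 matches for ["Location"] and 1 each for the two two-dimension rules.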

mattystevenson commented 2 years ago

Yes, it seems so. Based on what you're seeing, do you think removing that flag altogether would resolve this? I haven't had the chance to rebuild my config yet but plan to tonight. With my existing config I am still seeing the same issue.


or-shachar commented 2 years ago

Try this, for instance:

- aws_namespace: AWS/GameLift
  aws_metric_name: ActiveGameSessions
  aws_dimensions: [Location, MetricGroups]

or this:

- aws_namespace: AWS/GameLift
  aws_metric_name: ActiveGameSessions
  aws_dimensions: [Location, FleetId]

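
The cloudwatch_exporter README spells the key aws_dimensions (plural), as in the first two rules of the original config; the third rule there is written aws_dimension. I am not certain the exporter rejects unknown keys, so a quick local check on each rule (already loaded from YAML into a dict) can catch such typos. KNOWN_RULE_KEYS is a partial list recalled from the README, an assumption rather than the exporter's full schema; unknown_keys is a hypothetical helper.

```python
import difflib
from typing import Dict, List

# Partial list of per-rule keys recalled from the cloudwatch_exporter README (assumed, not exhaustive).
KNOWN_RULE_KEYS = {
    "aws_namespace", "aws_metric_name", "aws_dimensions", "aws_dimension_select",
    "aws_dimension_select_regex", "aws_statistics", "aws_extended_statistics",
    "delay_seconds", "range_seconds", "period_seconds", "set_timestamp", "aws_tag_select",
}


def unknown_keys(rule: Dict[str, object]) -> List[str]:
    """Return one message per key not in KNOWN_RULE_KEYS, with the closest known key if any."""
    messages = []
    for key in rule:
        if key in KNOWN_RULE_KEYS:
            continue
        close = difflib.get_close_matches(key, sorted(KNOWN_RULE_KEYS), n=1)
        hint = f" (did you mean {close[0]}?)" if close else ""
        messages.append(f"unknown key {key}{hint}")
    return messages


if __name__ == "__main__":
    # The third rule from the original post, as a dict.
    rule = {"aws_namespace": "AWS/GameLift", "aws_metric_name": "ActiveGameSessions",
            "aws_dimension": ["Location"]}
    print(unknown_keys(rule))
```
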
mattystevenson commented 2 years ago

Thanks very much. Will give this a try as soon as possible.

mattystevenson commented 2 years ago

Hey there. That did the trick! Thanks so much for all the help! My project is nearly complete. Really appreciate the assistance.

I have an off-topic question, if you don't mind. Do you know whether Prometheus can use an IP range or CIDR as a static target? No worries if you can't answer; I've been looking through many docs and can't find a clear answer. I'm just trying to monitor a set of instances in a small CIDR alongside the CloudWatch Exporter config.

Again, no worries about the above since it's out of scope. Thanks again, and feel free to close.

matthiasr commented 2 years ago

That question would be better suited to the users mailing list or one of the other community channels. That said, I don't think there is a way, but you can generate a file with all the IPs and use file-based service discovery (file SD).
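
A minimal sketch of that approach: expand a CIDR into host:port targets and emit them in the file SD JSON format (a list of objects with "targets" and "labels"). The CIDR, port and label are hypothetical placeholders; file_sd_targets is not part of any Prometheus tooling.

```python
import ipaddress
import json
from typing import Dict, List


def file_sd_targets(cidr: str, port: int, labels: Dict[str, str]) -> List[dict]:
    """Build one file SD target group with a host:port entry per usable address in `cidr`."""
    network = ipaddress.ip_network(cidr, strict=True)
    targets = []
    for host in network.hosts():
        # IPv6 addresses need brackets in host:port form.
        targets.append(f"[{host}]:{port}" if network.version == 6 else f"{host}:{port}")
    return [{"targets": targets, "labels": labels}]


if __name__ == "__main__":
    # Hypothetical CIDR, port (9100 is node_exporter's usual default) and label.
    groups = file_sd_targets("10.0.0.0/29", 9100, {"env": "gamelift-test"})
    print(json.dumps(groups, indent=2))
```

Redirect the output into a file such as targets.json and list that file under file_sd_configs in prometheus.yml; Prometheus picks up changes to the file without a restart, so regenerating it is enough when the instance set changes.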