influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Permission issue with ec2 metadata processor #10605

Open vipinvkmenon opened 2 years ago

vipinvkmenon commented 2 years ago

Relevant telegraf.conf

[agent]
  omit_hostname = true
[[inputs.cloudwatch]]
    name_prefix = "vipin."
    region = "<>"
    access_key = "<>"
    secret_key = "<>"
    period = "1m"
    delay = "5m"
    interval = "5m"
    recently_active = "PT3H"
    namespaces = ["AWS/NATGateway"]
    namespace = "AWS/NATGateway"
    statistic_include = [ "sum" ]
    [[inputs.cloudwatch.metrics]]
        names = ["ActiveConnectionCount", "BytesInFromDestination"]
        [[inputs.cloudwatch.metrics.dimensions]]
            name = "NatGatewayId"
            value = "*"
[[processors.aws_ec2]]
  ec2_tags = ["name"]
  ordered = false
[[outputs.file]]
  files = ["stdout"]

Logs from Telegraf

./telegraf --config telegraf.config --debug
2022-02-08T07:03:20Z I! Starting Telegraf 1.21.2
2022-02-08T07:03:20Z I! Loaded inputs: cloudwatch
2022-02-08T07:03:20Z I! Loaded aggregators:
2022-02-08T07:03:20Z I! Loaded processors: aws_ec2
2022-02-08T07:03:20Z I! Loaded outputs: file
2022-02-08T07:03:20Z I! Tags enabled:
2022-02-08T07:03:20Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2022-02-08T07:03:20Z D! [agent] Initializing plugins
2022-02-08T07:03:20Z D! [processors.aws_ec2] Initializing AWS EC2 Processor
2022-02-08T07:03:20Z D! [processors.aws_ec2] Initializing AWS EC2 Processor
2022-02-08T07:03:20Z D! [agent] Connecting outputs
2022-02-08T07:03:20Z D! [agent] Attempting connection to [outputs.file]
2022-02-08T07:03:20Z D! [agent] Successfully connected to outputs.file
2022-02-08T07:03:20Z E! [telegraf] Error running agent: starting processor processors.aws_ec2: error calling DescribeTags: operation error EC2: DescribeTags, failed to sign request: failed to retrieve credentials: no EC2 IMDS role found, operation error ec2imds: GetMetadata, http response error StatusCode: 404, request to EC2 IMDS failed

System info

Telegraf 1.21.2, Ubuntu 18.04.6 LTS

Docker

No response

Steps to reproduce

  1. Create the necessary AWS user
  2. Create the necessary telegraf config
  3. Run telegraf --config telegraf_config_file.conf ...

Expected behavior

Metrics and respective tags

Actual behavior

Throws exceptions

error calling DescribeTags: operation error EC2: DescribeTags, failed to sign request: failed to retrieve credentials: no EC2 IMDS role found, operation error ec2imds: GetMetadata
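
The tail of this error (`no EC2 IMDS role found`, `GetMetadata`, `StatusCode: 404`) suggests that the SDK ended up asking the EC2 instance metadata service (IMDS) for role credentials. Note that the `[[processors.aws_ec2]]` block above carries no `access_key`/`secret_key` of its own. As a rough sketch for checking what IMDS reports on the Telegraf host, the shell functions below probe the standard IMDS credentials path with `curl`. The names `probe_imds_role` and `explain_imds_status` are hypothetical helpers, and the meaning attached to each status code is an assumption based on typical IMDS behaviour, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Telegraf or the AWS CLI.

IMDS_BASE="${IMDS_BASE:-http://169.254.169.254}"
IMDS_ROLE_PATH="/latest/meta-data/iam/security-credentials/"

# Ask IMDS for the list of attached IAM roles and print the HTTP status code.
# Prints 000 when no HTTP response is received at all.
probe_imds_role() {
  # IMDSv2: request a short-lived session token first (-f: empty output on HTTP errors).
  token=$(curl -sf --max-time 2 -X PUT \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
    "$IMDS_BASE/latest/api/token" 2>/dev/null) || token=""
  if [ -n "$token" ]; then
    code=$(curl -s --max-time 2 -o /dev/null -w '%{http_code}' \
      -H "X-aws-ec2-metadata-token: $token" \
      "$IMDS_BASE$IMDS_ROLE_PATH" 2>/dev/null) || code=000
  else
    # No token available: fall back to an IMDSv1-style request.
    code=$(curl -s --max-time 2 -o /dev/null -w '%{http_code}' \
      "$IMDS_BASE$IMDS_ROLE_PATH" 2>/dev/null) || code=000
  fi
  echo "${code:-000}"
}

# Translate a status code from probe_imds_role into a short, hedged explanation.
explain_imds_status() {
  case "$1" in
    200) echo "IMDS lists an instance role" ;;
    404) echo "IMDS reachable, but no IAM role is attached to this instance" ;;
    401) echo "IMDS rejected the request; a valid IMDSv2 token is required" ;;
    000) echo "IMDS not reachable from this host" ;;
    *)   echo "unexpected IMDS status: $1" ;;
  esac
}

# Usage on the machine running Telegraf:
#   explain_imds_status "$(probe_imds_role)"
```

A `404` here would match the status code in the log line and would point at the host having no instance role, rather than at the IAM policy attached to the user.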

Additional info

The error suggests a permissions issue. However, the user has the following policy attached:

"Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:DescribeLoadBalancers",
                "ec2:DescribeInstances",
                "tag:GetResources",
                "cloudwatch:GetMetricData",
                "ec2:DescribeTags",
                "ec2:DescribeRegions",
                "rds:DescribeDBInstances",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeNatGateways",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
                "cloudwatch:DescribeAlarms"

            ],
            "Resource": "*"
        }
    ]

and from the AWS console dashboard: [image]

Also, if I try to run the command from the AWS CLI:

aws ec2 describe-tags --filters "Name=resource-type,Values=natgateway"

It works and returns:

{
    "Tags": [
        {
            "Key": "DeploymentName",
            "ResourceId": "nat-xxx",
            "ResourceType": "natgateway",
            "Value": "xxx"
        },
        .
        .
        .

This indicates that the user has the right permissions to read tags and also pull metrics.

Note: If the processor is removed, all metrics come through as expected from the cloudwatch plugin (without metadata tags, of course).

powersj commented 2 years ago

Hmm

So the error comes from the processor's Start function, when it does a DryRun attempt at DescribeTags. As you said, your CLI command works, but this would still stem from a permissions issue. I do want to verify: are you using the same access key and secret key? And do you have any other types of credentials set up for AWS?
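
One way to check for other credential sources is to list which of the standard AWS credential environment variables are set in the shell that launches Telegraf. The following is a minimal sketch: `list_aws_cred_env` is a hypothetical helper name, and the variable list covers only the common SDK/CLI variables, not every possible credential source.

```shell
#!/bin/sh
# Sketch only: list_aws_cred_env is a hypothetical helper, not an AWS or Telegraf tool.

# Print the names (never the values) of common AWS credential-related
# environment variables that are currently set and non-empty.
list_aws_cred_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN \
              AWS_PROFILE AWS_SHARED_CREDENTIALS_FILE AWS_CONFIG_FILE; do
    # eval is acceptable here because the names come from the fixed list above.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name"
    fi
  done
}

list_aws_cred_env
```

Running `aws configure list` and `aws sts get-caller-identity` afterwards shows which credential source and identity the AWS CLI itself resolves, which can then be compared with the keys in telegraf.conf.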

vipinvkmenon commented 2 years ago

Hi, Thank you for your quick support :)

Yes, the credentials used in the AWS CLI are the same as the ones used in telegraf. The CLI was simply configured as:

aws configure set default.region 'us-east-1-same-as-telegraf'; aws configure set aws_access_key_id '<same-as-telegraf>'; aws configure set aws_secret_access_key '<same-as-in-telegraf>'

And thereafter running the command:

aws ec2 describe-tags --filters "Name=resource-type,Values=natgateway"
{
    "Tags": [
        {
            "Key": "DeploymentName",
            "ResourceId": "nat-006b033d708fcda58",
            "ResourceType": "natgateway",
            "Value": "trial"
        },

As can be seen, it works. I even tried from an external machine to ensure there are no additional security restrictions in place. As noted, with the given user, telegraf can pull metrics, just not metadata. (So, as you rightly said, a processor issue.)

For context, we currently have another approach for pulling these metrics and metadata, using a Clojure library called amazonica. That approach works for pulling both metrics and metadata with the same credentials mentioned above. We'd just like to slowly phase it out in favor of the more robust, easier, and friendlier telegraf, which is what we use for everything else.