grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Enrich prometheus metrics with IMDS info #480

Open jkroepke opened 1 year ago

jkroepke commented 1 year ago

Hi,

I'm writing a design principle for running Grafana Agent on our cloud infrastructure. The plan is to run the Agent on each machine and push its metrics to a central Prometheus stack.

The configuration of the agent is done with Remote Agent Management.

The Azure IMDS is available on each machine. Is there any chance to enrich all metrics with certain labels before pushing them to Prometheus?

mattdurham commented 1 year ago

This sounds like a good use case for a Flow component that can grab data and inject it as labels. It is unlikely that we would do this in static mode, since injecting new changes into its pipeline is painful.

tpaschalis commented 1 year ago

Do all machines have access to the IMDS API, and is the API address available to your machines?

I'm wondering whether you can use Flow mode and the remote.http component to do something similar.

remote.http "idms" {
  url = "IDMS_API_ENDPOINT"
}

prometheus.remote_write "default" {
  external_labels = {"mycustomlabel" = json_decode(remote.http.idms.content)["compute"]["sku"]}
  endpoint {
    url = "PROMETHEUS_URL"

    basic_auth {
      username = "user"
      password = "password"
    }
  }
}

I'm not knowledgeable about the Azure-specific APIs, but according to the docs there seem to be two preconditions: (a) the request must not go through a proxy, and (b) it must contain a Metadata: true header.

The first one depends on your deployment, but the second one requires updating the remote.http component to be able to inject headers. Do you think it'd work?

jkroepke commented 1 year ago

@tpaschalis

"Do all machines have access to the IMDS API, and is the API address available to your machines?"

Yes. By default, all machines have access to the IMDS API on AWS, Azure, and GCP. As far as I know, access can be disabled on AWS.

Yes, it can work. Injecting a custom HTTP header is mandatory for Azure and GCP.

remote.http "idms" {
  url = "http://169.254.169.254/metadata/instance/compute/resourceId?api-version=2021-02-01&format=text"
}

prometheus.remote_write "default" {
  external_labels = {"resourceID" = remote.http.idms.content}
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}

Result: HTTP 400, because the required HTTP header was not present.


On AWS it is more complex, but still possible.

For AWS IMDSv2, you have to fetch a token first:

TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

and then pass it to the IMDS query call:

curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/

ref: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-v2-how-it-works.html

For full coverage, the remote.http component must support custom HTTP headers and custom HTTP methods. With both, it would be a generic approach that works on all three clouds.

tpaschalis commented 1 year ago

Hey there @jkroepke! Thanks for the initiative on this.

Have you tried retrieving IMDS information after grafana/agent#3530 was merged with the updates to remote.http? From what I saw in the docs, the response comes as json so you'll have to use the json_decode() stdlib function and then you should be able to read any value you'd like!

jkroepke commented 1 year ago

Hey @tpaschalis, I have not tried it yet (vacation time), but I'm pretty sure that it works. For Azure IMDS, I can request format=text to avoid any json_decode handling.

You could close this if you want. But please keep in mind that on AWS with IMDSv2 it might be tough to use remote.http against it.
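
For the Azure case discussed above, a minimal sketch could look like the following. It assumes that remote.http accepts a headers argument (as it does in the AWS example later in this thread) and that the Metadata: true header is all Azure IMDS requires. The component labels azure_resource_id and azure_instance are illustrative names, and the write URL is the local placeholder used earlier in the thread. The first component uses format=text so the response can be used directly; the second fetches the full JSON document and picks a field with json_decode, as suggested above.

```river
// Assumption: remote.http supports `headers` (added after grafana/agent#3530).
// Plain-text variant: the response body is the resource ID itself.
remote.http "azure_resource_id" {
  url = "http://169.254.169.254/metadata/instance/compute/resourceId?api-version=2021-02-01&format=text"
  headers = {
    "Metadata" = "true",
  }
}

// JSON variant: one request, individual fields extracted with json_decode().
remote.http "azure_instance" {
  url = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
  headers = {
    "Metadata" = "true",
  }
}

prometheus.remote_write "default" {
  external_labels = {
    "resourceID" = remote.http.azure_resource_id.content,
    "sku"        = json_decode(remote.http.azure_instance.content)["compute"]["sku"],
  }
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}
```

The JSON variant is probably the better fit when several labels are needed, since a single request can feed all of them.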

PeeterTomusk commented 4 months ago

For AWS EC2 IMDSv2

/// Get IMDSv2 token
remote.http "imdsv2_token" {
  url = "http://169.254.169.254/latest/api/token"
  method = "PUT"
  headers = {
    "X-aws-ec2-metadata-token-ttl-seconds" = "21600",
  }
  poll_frequency = "21600s"
}

remote.http "ami_id" {
  url = "http://169.254.169.254/latest/meta-data/ami-id"
  headers = {
    "X-aws-ec2-metadata-token" = remote.http.imdsv2_token.content,
  }
  poll_frequency = "24h"
}

...
/// Send logs to Loki
loki.write "logs" {
  ...
  external_labels = {
    ami_id = remote.http.ami_id.content,
  }
}

It's a bit ugly, but it works!
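
Since the original question was about Prometheus metrics rather than logs, the same pattern might be applied to prometheus.remote_write roughly as sketched below. This is an untested adaptation of the example above, not a verified configuration: the instance_id component label, the instance_id label name, and the write URL are illustrative assumptions. The token is refreshed every 5h here, on the assumption that polling somewhat more often than the 21600-second TTL avoids ever sending an expired token.

```river
// Get an IMDSv2 token; refresh before the 6h (21600s) TTL runs out.
remote.http "imdsv2_token" {
  url    = "http://169.254.169.254/latest/api/token"
  method = "PUT"
  headers = {
    "X-aws-ec2-metadata-token-ttl-seconds" = "21600",
  }
  poll_frequency = "5h"
}

// Fetch the EC2 instance ID using the token from the component above.
remote.http "instance_id" {
  url = "http://169.254.169.254/latest/meta-data/instance-id"
  headers = {
    "X-aws-ec2-metadata-token" = remote.http.imdsv2_token.content,
  }
  poll_frequency = "24h"
}

// Attach the instance ID to every series sent via remote write.
prometheus.remote_write "default" {
  external_labels = {
    "instance_id" = remote.http.instance_id.content,
  }
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}
```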