Hello!
Grafana Agent does support pulling, in just the same way Prometheus does: you use scrape_configs to pull metrics from targets:
server:
  log_level: info

metrics:
  global:
    scrape_interval: 1m
  configs:
    - name: test
      host_filter: false
      scrape_configs:
        - job_name: local_scrape
          static_configs:
            - targets: ['127.0.0.1:12345']
              labels:
                cluster: 'localhost'
      remote_write:
        - url: http://localhost:9009/api/prom/push
remote_write forwards pulled metrics from the agent to a remote system.
The pulling/pushing debate normally refers to how telemetry data from an application/system under observation is collected. Pulling means that an agent, collector, or a database reaches out to the application and requests telemetry data from it. Pushing means that the application pushes its telemetry data directly to an agent, collector, or database.
remote_write sits outside of that concept; it's about delivering the pulled metrics after they've been retrieved. It uses the Prometheus remote write protocol, which sends Prometheus-formatted metrics to a system which is capable of reading the remote write protocol.
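For delivering the pulled metrics to an authenticated store such as Grafana Mimir or Grafana Cloud Metrics, only the remote_write entry changes. A minimal sketch, assuming a hypothetical endpoint and basic-auth credentials (the entry follows the standard Prometheus remote_write options):

remote_write:
  - url: https://example.com/api/prom/push   # hypothetical endpoint
    basic_auth:
      username: my-username                  # hypothetical credentials
      password: my-password

The receiving system only needs to understand the remote write protocol; for a self-hosted Prometheus that typically means enabling its remote write receiver.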
The Prometheus ecosystem is built around pull-based metrics. There's been a bit of a resurgence recently in push-based metrics, which are being re-popularized by the OpenTelemetry project.
Given the popularity of both OpenTelemetry and Prometheus, I don't think it's possible to say if either the pull or push method is the best practice; it depends on your environment, and which tradeoffs you're willing to make.
Hello @rfratto, many thanks for the explanation, that's exactly what I wanted to know.
To conclude this topic, a small question if you don't mind (I think this is the only thing left uncovered): is it possible for the Grafana Agent to receive metrics over gRPC/HTTP that are pushed by e.g. a Java app with the OTLP instrumentation agent, as we can do when collecting traces?
traces:
  configs:
    - name: default
      receivers:
        otlp:
          protocols:
            grpc:
            http:
It looks like it's not possible to have the same block under the metrics: section. The example in the docs only shows how to configure the Grafana Agent itself to scrape apps (the "opposite" way).
It's possible with Flow mode (here's a blog post explaining why we're building Flow mode), but not the older "static" agent mode (the YAML config).
We believe Flow mode is the future of the agent, but we're still working on feature parity, which is why it's not the default recommendation for agent users yet. In the future, once we have feature parity with the older static agent mode, we'll have tools to help people migrate to Flow mode from the static mode's YAML config.
Here's an example Flow mode pipeline which can do what you're looking for. It first accepts OTLP data over gRPC and HTTP, then forwards it out over OTLP again:
// Define an otelcol.receiver.otlp component labeled "default" which
// will accept OTLP data over the network.
otelcol.receiver.otlp "default" {
  http {} // Accept OTLP over HTTP
  grpc {} // Accept OTLP over gRPC

  // Forward all signals to our otelcol.exporter.otlp.default component
  // defined below.
  output {
    metrics = [otelcol.exporter.otlp.default.input]
    logs    = [otelcol.exporter.otlp.default.input]
    traces  = [otelcol.exporter.otlp.default.input]
  }
}

// Define an otelcol.exporter.otlp component labeled "default"
// which forwards received data to some endpoint.
otelcol.exporter.otlp "default" {
  client {
    endpoint = ENDPOINT // Replace ENDPOINT with a string to send data to

    tls {
      insecure             = true
      insecure_skip_verify = true
    }
  }
}
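One note on the receiving side: with empty http {} and grpc {} blocks, the receiver should use the upstream OpenTelemetry Collector defaults (gRPC on port 4317, HTTP on port 4318, assuming the agent keeps those defaults), which is where an instrumented Java app would point its OTLP exporter. If you'd rather make the listen addresses explicit, a sketch of the same receiver would look like this:

otelcol.receiver.otlp "default" {
  // Assumed default OTLP ports, written out explicitly so the pushing
  // application knows where to send its data.
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    metrics = [otelcol.exporter.otlp.default.input]
    logs    = [otelcol.exporter.otlp.default.input]
    traces  = [otelcol.exporter.otlp.default.input]
  }
}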
This second pipeline accepts metrics over OTLP gRPC, converts them to Prometheus metrics, and then forwards the resulting metrics over remote_write:
// Define an otelcol.receiver.otlp component labeled "default" which
// will accept OTLP data over the network.
otelcol.receiver.otlp "default" {
  grpc {} // Accept OTLP over gRPC.

  // This time, only forward metrics to our otelcol.exporter.prometheus
  // component defined below.
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

// Define an otelcol.exporter.prometheus component labeled
// default, which converts received OpenTelemetry data
// into Prometheus metrics.
otelcol.exporter.prometheus "default" {
  // Forward converted metrics to our remote_write component.
  forward_to = [prometheus.remote_write.default.receiver]
}

// Define a prometheus.remote_write component labeled default,
// which sends Prometheus-formatted metrics over the network
// using the remote_write protocol.
prometheus.remote_write "default" {
  endpoint {
    url = REMOTE_WRITE_ENDPOINT
  }
}
As these two examples show, Flow mode is much more flexible in what you can do with its components, including moving between native OpenTelemetry components and native Prometheus components.
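To make that concrete, the opposite direction works as well. A sketch (using a hypothetical scrape target and OTLP endpoint) that scrapes Prometheus metrics, converts them into OpenTelemetry data, and pushes them out over OTLP might look like this:

// Scrape a Prometheus target and forward the samples to the converter below.
prometheus.scrape "default" {
  targets    = [{"__address__" = "127.0.0.1:12345"}] // hypothetical target
  forward_to = [otelcol.receiver.prometheus.default.receiver]
}

// Convert the scraped Prometheus metrics into OpenTelemetry data.
otelcol.receiver.prometheus "default" {
  output {
    metrics = [otelcol.exporter.otlp.default.input]
  }
}

// Push the converted metrics to an OTLP endpoint.
otelcol.exporter.otlp "default" {
  client {
    endpoint = ENDPOINT // Replace ENDPOINT with the OTLP endpoint to send to
  }
}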
Thank you one more time, I'll ponder over it. I believe we can close the issue.
This is more a question about the documentation. I think it's not quite clear how the metrics that have been pulled by Grafana Agent are delivered to the external metrics storage (Prometheus/Grafana Mimir/Grafana Cloud Metrics). I can only see the "pushing" approach: you can use remote_write in this case, and your external metrics storage should be set up as a remote-write receiver. However, I've seen a lot of articles in which this was considered a bad practice - it was advised to always use the "pulling" method unless you can't pull from the metrics source (which is not relevant to Grafana Agent, since it has an API for this). I believe this was even mentioned in the Prometheus docs. So, what's the right way to do this? If it depends on your preference, or the Grafana Agent doesn't support some ways at all (which can be intended or not), I think it should be mentioned in the readme (in the form of a comparison table of these strategies or something like that).