grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Component prometheus.remote_write apparently uses part of the resolved endpoint > tls_config > server_name to contact the Prometheus API instead of endpoint > url #220

Open outofsight opened 6 months ago

outofsight commented 6 months ago

What's wrong?

The component prometheus.remote_write, with the endpoint configuration shown under Configuration below, apparently tries to make a POST request to https://portal.domain.tld instead of https://prometheus-proxy. The request fails because the DNS name portal.domain.tld resolves to a different IP address of the proxy, one that is not routable from the agent.

Notes:

  1. server_name is mandatory because the proxy also hosts other sites, but it should be used only for TLS negotiation, not for endpoint API discovery.
  2. Although server_name is prometheus.portal.domain.tld, the Agent apparently resolves and tries to dial portal.domain.tld.
  3. Although portal.domain.tld is the same proxy, dialing this IP address fails because the address is on a Docker macvlan network.
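
To illustrate the expectation in note 1, here is a minimal Go sketch, assuming only the standard library's net/http behaviour and not the actual prometheus.remote_write code path. It starts a local TLS test server, sends a POST with a custom tls.Config.ServerName, and reports which host was dialed and which SNI value the server saw. The helper name observeSNI is hypothetical.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// observeSNI (hypothetical helper) sends one POST to a local TLS test server
// using the given ServerName. It returns the host taken from the request URL
// (the address the client dialed) and the SNI value the server received.
func observeSNI(serverName string) (string, string, error) {
	sniSeen := make(chan string, 1)
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sniSeen <- r.TLS.ServerName // SNI sent by the client in the handshake
		w.WriteHeader(http.StatusNoContent)
	}))
	srv.StartTLS()
	defer srv.Close()

	transport := &http.Transport{
		TLSClientConfig: &tls.Config{
			ServerName:         serverName, // plays the role of tls_config.server_name
			InsecureSkipVerify: true,       // mirrors insecure_skip_verify = true
		},
	}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}

	// The URL host decides where the connection goes.
	resp, err := client.Post(srv.URL+"/api/v1/write", "application/x-protobuf", nil)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	return resp.Request.URL.Host, <-sniSeen, nil
}

func main() {
	host, sni, err := observeSNI("prometheus.portal.domain.tld")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("dialed host:", host)
	fmt.Println("SNI seen by server:", sni)
}
```

In this sketch the connection goes to the test server's local address, while the server sees prometheus.portal.domain.tld as the SNI. That is the separation between url and server_name that this report expects from the component.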

Steps to reproduce

  1. Set up the Prometheus remote write API and a load balancer / authenticating proxy in front of it as Docker containers.
  2. Set up Grafana Agent and the scraping targets as Docker containers.
  3. Connect all of the previous containers to a dedicated Docker overlay network.
  4. Give the proxy container the alias prometheus-proxy on that Docker network.
  5. (Optional?) Connect the proxy container to another Docker macvlan network, assign it the DNS names "prometheus.portal.domain.tld" and "portal.domain.tld", and give it a trusted TLS certificate for these DNS names.
  6. Set up Grafana Agent to scrape the targets and send metrics to the Prometheus proxy, as shown in the Configuration section.
  7. Notice that Grafana Agent tries to dial the proxy's macvlan IP address instead of its overlay network IP address and fails to send metrics.

System information

Linux 4.4.302+ x86_64 GNU/Linux + Docker

Software version

Grafana Agent v0.40

Configuration

endpoint {
    url = "https://prometheus-proxy/api/v1/write"

    tls_config {
        server_name = "prometheus.portal.domain.tld"
        insecure_skip_verify = true
    }

    basic_auth {
        username = "grafana-agent"
        password_file = "/run/secrets/grafana-agent-password"
    }
}

Logs

ts=2024-03-12T10:01:02.292439262Z level=warn msg="Failed to send batch, retrying" component=prometheus.remote_write.default subcomponent=rw remote_name=9c0901 url=https://prometheus-proxy/api/v1/write err="Post \"https://portal.domain.tld\": dial tcp 10.x.y.z:443: connect: no route to host"
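
One possible reading of the log line above, not confirmed anywhere in this thread: in Go's net/http, when a client follows a redirect and the follow-up request fails, the returned *url.Error keeps the original method as its Op and carries the redirect target as its URL. An authenticating proxy that redirects to its portal could therefore produce an error of the form Post "https://portal.domain.tld" even though the configured url was used for the first request. The sketch below only demonstrates that standard-library behaviour against a local test server; the helper name postThroughRedirect is hypothetical, and whether a redirect is actually involved here is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// postThroughRedirect (hypothetical helper) posts to a local server that
// redirects to an address where nothing is listening. It returns the URL that
// was originally requested and the *url.Error produced by the failed follow-up.
func postThroughRedirect() (string, *url.Error, error) {
	// Reserve a local port, then close it so later connections are refused.
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	dead := "http://" + l.Addr().String()
	l.Close()

	// Stand-in for a proxy that answers with a redirect to another host.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, dead, http.StatusFound)
	}))
	defer srv.Close()

	original := srv.URL + "/api/v1/write"
	resp, postErr := http.Post(original, "application/x-protobuf", nil)
	if postErr == nil {
		resp.Body.Close()
		return original, nil, errors.New("expected the redirected request to fail")
	}

	var urlErr *url.Error
	if !errors.As(postErr, &urlErr) {
		return original, nil, postErr
	}
	return original, urlErr, nil
}

func main() {
	original, urlErr, err := postThroughRedirect()
	if err != nil {
		fmt.Println("unexpected result:", err)
		return
	}
	fmt.Println("requested URL:", original)
	fmt.Println("URL in error: ", urlErr.URL)
	fmt.Println("full error:   ", urlErr)
}
```

If this is what happens in the reported setup, the response that the proxy returns for the unauthenticated or misrouted POST would be worth checking, for example with a capture on the proxy side. Disabling redirect following on the endpoint, if the component version in use exposes such a setting, would also make the first response visible in the error.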

rfratto commented 5 months ago

Hi there :wave:

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

github-actions[bot] commented 4 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!