grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

remotecfg components don't work after pull request #1501 #1688

Open thiennn-neji opened 1 month ago

thiennn-neji commented 1 month ago

What's wrong?

The Grafana Alloy remotecfg feature was fixed in pull request #1372, and I tested it with Grafana Alloy revision 800739fab, where it worked smoothly. However, starting with pull request #1501, it no longer seems to work.

It appears that the componentID for remotecfg components is in the format remotecfg/example.component.label rather than just example.component.label. However, line 183 of pull request #1501 assumes that the componentID is in the format example.component.label.

In my opinion, s.componentHandler should reference s.componentHttpPathPrefixRemotecfg instead of s.componentHttpPathPrefix.
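
To illustrate the suspected mismatch, here is a minimal standalone Go sketch (not Alloy's actual handler code; the constants and the toy parseComponentID below are stand-ins for the real logic):

package main

import (
	"fmt"
	"strings"
)

// Stand-ins for the two prefixes discussed above; the real values
// live in Alloy's HTTP service.
const (
	componentHttpPathPrefix          = "/api/v0/component/"
	componentHttpPathPrefixRemotecfg = "/api/v0/component/remotecfg/"
)

// parseComponentID is a toy validator. Real Alloy resolves each path
// segment against its component graph, but the effect is similar: a
// stray leading "remotecfg/" segment makes resolution fail with
// "invalid path".
func parseComponentID(id string) error {
	if strings.HasPrefix(id, "remotecfg/") {
		return fmt.Errorf("invalid path")
	}
	return nil
}

func main() {
	path := "/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics"
	path = strings.TrimSuffix(path, "/metrics")

	// Trimming the generic prefix leaves "remotecfg/" in the ID -> error.
	id := strings.TrimPrefix(path, componentHttpPathPrefix)
	fmt.Printf("%q -> %v\n", id, parseComponentID(id))

	// Trimming the remotecfg-specific prefix yields a plain ID -> OK.
	id = strings.TrimPrefix(path, componentHttpPathPrefixRemotecfg)
	fmt.Printf("%q -> %v\n", id, parseComponentID(id))
}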

Steps to reproduce

Step 0: An alloy-remote-config-server is serving gRPC at 127.0.0.1:8888 with the following template (also specified in the Configuration section below):

prometheus.exporter.self "default" { }

Step 1: Use two Docker images for the two Grafana Alloy revisions (revision 9e290c693, v1.4.0-rc.0, and revision 800739fab) and run both with the same configuration:

// test.alloy
logging {
    level  = "debug"
    format = "logfmt"
}

remotecfg {
    url            = "http://127.0.0.1:8888"
    id             = constants.hostname
    poll_frequency = "10s"
    attributes     = {
        "template_name"  = "test-remotecfg",
    }
}

with the command line options:

alloy run --stability.level=experimental --storage.path=/tmp/alloy /test.alloy
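
For reference, one possible way to run each image (a sketch: the grafana/alloy image tag here is an assumption, so substitute whichever tags correspond to the two revisions; host networking lets the container reach the config server on 127.0.0.1:8888):

docker run --rm --network=host \
  -v "$(pwd)/test.alloy:/test.alloy" \
  grafana/alloy:v1.4.0-rc.0 \
  run --stability.level=experimental --storage.path=/tmp/alloy /test.alloy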

Step 2: In each Grafana Alloy instance, use cURL to fetch metrics:

curl http://localhost:12345/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics

Step 3: Check the output of the cURL command and the Grafana Alloy logs.

Alloy log (revision 800739fab)

ts=2024-09-16T04:35:43.717199572Z level=info "boringcrypto enabled"=false
ts=2024-09-16T04:35:43.716113131Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/automemlimit@v0.6.0/memlimit/memlimit.go:170 msg="memory is not limited, skipping: %v" package=github.com/KimMachineGun/automemlimit/memlimit !BADKEY="memory is not limited"
ts=2024-09-16T04:35:43.717236944Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2024-09-16T04:35:43.717240511Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab
ts=2024-09-16T04:35:43.717245918Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=remotecfg duration=638.347µs
ts=2024-09-16T04:35:43.717250881Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-09-16T04:35:43.717254029Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=http duration=6.381µs
ts=2024-09-16T04:35:43.717258166Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=cluster duration=345ns
ts=2024-09-16T04:35:43.717262171Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=otel duration=288ns
ts=2024-09-16T04:35:43.717265709Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=labelstore duration=1.76µs
ts=2024-09-16T04:35:43.717270911Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=tracing duration=2.865µs
ts=2024-09-16T04:35:43.717275657Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=logging duration=89.255µs
ts=2024-09-16T04:35:43.717288179Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=livedebugging duration=3.757µs
ts=2024-09-16T04:35:43.717295397Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab node_id=ui duration=382ns
ts=2024-09-16T04:35:43.717300411Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=314425bf3e5d0b1540153436fba928ab duration=821.508µs
ts=2024-09-16T04:35:43.717310333Z level=debug msg="changing node state" service=cluster from=viewer to=participant
ts=2024-09-16T04:35:43.717317467Z level=debug msg="REDACTED @1: participant" service=cluster
ts=2024-09-16T04:35:43.717382118Z level=info msg="scheduling loaded components and services"
ts=2024-09-16T04:35:43.71749769Z level=info msg="starting cluster node" service=cluster peers_count=0 peers="" advertise_addr=127.0.0.1:12345
ts=2024-09-16T04:35:43.717529278Z level=debug msg="REDACTED @3: participant" service=cluster
ts=2024-09-16T04:35:43.717764506Z level=info msg="peers changed" service=cluster peers_count=1 peers=REDACTED
ts=2024-09-16T04:35:43.717874201Z level=info msg="now listening for http traffic" service=http addr=127.0.0.1:12345
ts=2024-09-16T04:35:43.718252927Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id=remotecfg trace_id=23892e91445f51aa15ae2d765e367e18
ts=2024-09-16T04:35:43.718297571Z level=info msg="finished node evaluation" controller_path=/ controller_id=remotecfg trace_id=23892e91445f51aa15ae2d765e367e18 node_id=prometheus.exporter.self.default duration=15.482µs
ts=2024-09-16T04:35:43.718309658Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id=remotecfg trace_id=23892e91445f51aa15ae2d765e367e18 duration=80.709µs
ts=2024-09-16T04:35:43.718652439Z level=info msg="scheduling loaded components and services"
ts=2024-09-16T04:35:53.770017569Z level=debug msg="skipping over API response since it contained the same hash" service=remotecfg

cURL output (revision 800739fab)

curl http://localhost:12345/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics

# Output
# HELP alloy_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which alloy was built, and the goos and goarch for the build.
# TYPE alloy_build_info gauge
alloy_build_info{branch="main",goarch="amd64",goos="linux",goversion="go1.22.5",revision="800739fab",tags="netgo,builtinassets,promtail_journal_enabled",version="v1.4.0-devel+800739fab"} 1

Alloy log (revision 9e290c693, v1.4.0-rc.0)

ts=2024-09-16T04:39:52.849428699Z level=info "boringcrypto enabled"=false
ts=2024-09-16T04:39:52.83708336Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/automemlimit@v0.6.0/memlimit/memlimit.go:170 msg="memory is not limited, skipping: %v" package=github.com/KimMachineGun/automemlimit/memlimit !BADKEY="memory is not limited"
ts=2024-09-16T04:39:52.851069694Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2024-09-16T04:39:52.851097278Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f
ts=2024-09-16T04:39:52.851130211Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=tracing duration=6.094µs
ts=2024-09-16T04:39:52.851156684Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=remotecfg duration=10.43222ms
ts=2024-09-16T04:39:52.851178036Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-09-16T04:39:52.851190826Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=http duration=38.89µs
ts=2024-09-16T04:39:52.851236824Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=cluster duration=1.883µs
ts=2024-09-16T04:39:52.851260117Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=otel duration=1.433µs
ts=2024-09-16T04:39:52.851282544Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=livedebugging duration=13.309µs
ts=2024-09-16T04:39:52.851302812Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=ui duration=2.026µs
ts=2024-09-16T04:39:52.851332402Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=logging duration=1.947391ms
ts=2024-09-16T04:39:52.851397706Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f node_id=labelstore duration=21.187µs
ts=2024-09-16T04:39:52.851431635Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=63e74be47a1d3930e0fb462f78acb05f duration=12.710918ms
ts=2024-09-16T04:39:52.851487935Z level=debug msg="changing node state" service=cluster from=viewer to=participant
ts=2024-09-16T04:39:52.851526946Z level=debug msg="cpu12681 @1: participant" service=cluster
ts=2024-09-16T04:39:52.851812792Z level=info msg="scheduling loaded components and services"
ts=2024-09-16T04:39:52.853168428Z level=info msg="starting cluster node" service=cluster peers_count=0 peers="" advertise_addr=127.0.0.1:12345
ts=2024-09-16T04:39:52.853311651Z level=debug msg="cpu12681 @3: participant" service=cluster
ts=2024-09-16T04:39:52.853737901Z level=info msg="peers changed" service=cluster peers_count=1 peers=cpu12681
ts=2024-09-16T04:39:52.855116828Z level=info msg="now listening for http traffic" service=http addr=127.0.0.1:12345
ts=2024-09-16T04:39:52.858859086Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id=remotecfg trace_id=167b9941097215f5ddb8421535365edc
ts=2024-09-16T04:39:52.859086486Z level=info msg="finished node evaluation" controller_path=/ controller_id=remotecfg trace_id=167b9941097215f5ddb8421535365edc node_id=prometheus.exporter.self.default duration=82.917µs
ts=2024-09-16T04:39:52.859136085Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id=remotecfg trace_id=167b9941097215f5ddb8421535365edc duration=361.008µs
ts=2024-09-16T04:39:52.860527774Z level=info msg="scheduling loaded components and services"
2024/09/16 04:39:58 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux.getRRW.func2.1 (mux.go:114)

cURL output (revision 9e290c693, v1.4.0-rc.0)

curl http://localhost:12345/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics

# Output
failed to parse URL path "/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics": invalid path

System information

Linux 6.10.6 x86_64 (Ubuntu 24.04.1 LTS)

Software version

Grafana Alloy (Revision 9e290c693, v1.4.0-rc.0) and Grafana Alloy (Revision 800739fab)

Configuration

// Alloy remote config server template
prometheus.exporter.self "default" { }

// Grafana Alloy config (both revisions 9e290c693 and 800739fab)
remotecfg {
    url            = "http://127.0.0.1:8888"
    id             = constants.hostname
    poll_frequency = "10s"
    attributes     = {
        "template_name"  = "test-remotecfg",
    }
}

Logs

No response

wildum commented 1 month ago

@tpaschalis

tpaschalis commented 1 month ago

Hey there @thiennn-neji 👋

Just for context, I'd like to understand what you're trying to achieve and what the expected result was.

So if I understand correctly, the following file is what's passed to alloy run as /test.alloy:

// Grafana Alloy config (both revisions 9e290c693 and 800739fab)
remotecfg {
    url            = "http://127.0.0.1:8888"
    id             = constants.hostname
    poll_frequency = "10s"
    attributes     = {
        "template_name"  = "test-remotecfg",
    }
}

And the following is what the remotecfg server returns

prometheus.exporter.self "default" { }

Is that correct? Are there any other pieces of configuration on either side (either the 'local' or the 'remote' configuration)?

Could you also please provide

tpaschalis commented 1 month ago

Ok, I was able to get a little closer to the root cause.

I had my test remotecfg server return the following configuration, with one exporter at the root and one wrapped inside a module.

prometheus.exporter.self "default" { }

prometheus.scrape "default" {
        targets    = prometheus.exporter.self.default.targets
        forward_to = []
}

declare "mymodule" {
        prometheus.exporter.self "inner" { }

        prometheus.scrape "inner" {
                targets    = prometheus.exporter.self.inner.targets
                forward_to = []
        }
}

mymodule "default" { }

The curl command fails on the first one but works on the second one:

$ curl http://localhost:12345/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics

failed to parse URL path "/api/v0/component/remotecfg/prometheus.exporter.self.default/metrics": invalid path

$ curl http://localhost:12345/api/v0/component/remotecfg/mymodule.default/prometheus.exporter.self.inner/metrics
# HELP alloy_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which alloy was built, and the goos and goarch for the build.
# TYPE alloy_build_info gauge
alloy_build_info{branch="main",goarch="amd64",goos="linux",goversion="go1.22.7",revision="9e1b6e827",tags="unknown",version="v1.5.0-devel+wip"} 1

I hadn't come across this as I always wrap my configuration in a module.

Did you have any specific issues pointing towards the components not working at all, or is it just that they're not reachable via api/v0 here?

As far as I can tell, the scrape components are scheduled and try to scrape, but the first one fails because of the error you pointed out.

tmarshall98 commented 1 month ago

I can confirm this just got released in 1.4.0:

prometheus.exporter.unix "default" {}

prometheus.scrape "unix_exporter" {
  targets    = prometheus.exporter.unix.default.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://<mimir_url>/api/v1/push"
    headers = {
      "X-Scope-OrgID" = "tenant",
    }
  }
}

Wrapping this in a module makes it work, roughly as sketched below.
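
A sketch of the wrapped version, following the pattern from the earlier comment (the module name mymodule is arbitrary):

declare "mymodule" {
  prometheus.exporter.unix "default" {}

  prometheus.scrape "unix_exporter" {
    targets    = prometheus.exporter.unix.default.targets
    forward_to = [prometheus.remote_write.mimir.receiver]
  }

  prometheus.remote_write "mimir" {
    endpoint {
      url = "https://<mimir_url>/api/v1/push"
      headers = {
        "X-Scope-OrgID" = "tenant",
      }
    }
  }
}

mymodule "default" { }

Metrics should then be reachable via the module path, e.g. curl http://localhost:12345/api/v0/component/remotecfg/mymodule.default/prometheus.exporter.unix.default/metrics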

Did you have any specific issues pointing towards the components not working at all, or is it just that they're not reachable via api/v0 here?

I believe it's the latter: I see logs of the components starting, and they are visible in the UI; just the failed to parse URL path error at scrape time seems to be the problem.
