Closed: andrewbulin closed this issue 1 week ago
Hi! I wonder if the "snappy" wal_compression is needed at all. In fact, running it locally, I get this error from your curl request:
echo "some_metric 3.14" | curl --data-binary @- http://localhost:9090/api/v1/write
snappy: corrupt input
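For what it's worth, that failure is expected for any plain-text body: the remote-write endpoint only accepts snappy-compressed protobuf payloads, so a raw curl like the above can't succeed. A minimal sketch of a valid test push, assuming a promtool recent enough (2.47+, if I recall correctly) to have the push metrics subcommand, which speaks the remote-write protocol for you:
# Sketch, assuming promtool >= 2.47 with the `push metrics` subcommand.
# It reads text-format samples and sends them as a snappy-compressed
# protobuf remote-write request.
echo "some_metric 3.14" > /tmp/sample.prom
promtool push metrics http://localhost:9090/api/v1/write /tmp/sample.prom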
You need to configure Prometheus to accept remote-write with this flag:
--web.enable-remote-write-receiver
and
remote_write:
- url: "http://remote-storage-endpoint/api/v1/write"
You can take a look at the example here: https://github.com/grafana/tempo/blob/d36cc9f22714b46cd6c31123a7ee1f48b464cfb7/example/docker-compose/shared/tempo.yaml#L35
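For reference, a minimal sketch of a docker-compose service running Prometheus with the receiver enabled (the image tag and service name here are illustrative, not taken from the linked example):
  # Illustrative docker-compose service; adjust image tag and paths to taste.
  prometheus:
    image: prom/prometheus:v2.53.1
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
    volumes:
      - ./prometheus.yaml:/etc/prometheus.yaml
    ports:
      - "9090:9090"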
Thanks for replying! ^_^
> Hi! I wonder if the "snappy" wal_compression is needed at all. In fact, running it locally, I get this error from your curl request:
Yeah, I agree: this setting seems to neither help nor hurt either way. We can remove it if that simplifies testing.
> You need to configure Prometheus to accept remote-write with this flag:
I saw that, but I think the critical detail of a "pushgateway" is missing. For example, the defaults in example/docker-compose/distributed also just work as you describe. And I think I have a better reproduction to match my usage here, if you make some small changes to the example/docker-compose/distributed/ directory:
{ cat << EOF
diff --git a/example/docker-compose/distributed/docker-compose.yaml b/example/docker-compose/distributed/docker-compose.yaml
index abf51e32a..c3d5bcc91 100644
--- a/example/docker-compose/distributed/docker-compose.yaml
+++ b/example/docker-compose/distributed/docker-compose.yaml
@@ -146,6 +146,11 @@ services:
ports:
- "9090:9090"
+ pushgateway:
+ image: prom/pushgateway:latest
+ ports:
+ - "9091:9091"
+
grafana:
image: grafana/grafana:11.0.0
volumes:
diff --git a/example/docker-compose/distributed/prometheus.yaml b/example/docker-compose/distributed/prometheus.yaml
index 439e48ce6..6c7e28f70 100644
--- a/example/docker-compose/distributed/prometheus.yaml
+++ b/example/docker-compose/distributed/prometheus.yaml
@@ -17,3 +17,7 @@ scrape_configs:
- 'querier:3200'
- 'query-frontend:3200'
- 'metrics-generator:3200'
+ - job_name: 'prometheus-pushgateway'
+ static_configs:
+ - targets: [ 'pushgateway:9091' ]
+
diff --git a/example/docker-compose/distributed/tempo-distributed.yaml b/example/docker-compose/distributed/tempo-distributed.yaml
index d9134ebdf..db4b58f42 100644
--- a/example/docker-compose/distributed/tempo-distributed.yaml
+++ b/example/docker-compose/distributed/tempo-distributed.yaml
@@ -43,7 +43,8 @@ metrics_generator:
storage:
path: /var/tempo/generator/wal
remote_write:
- - url: http://prometheus:9090/api/v1/write
+ # - url: http://prometheus:9090/api/v1/write
+ - url: http://pushgateway:9091/metrics/job/tempo/instance/metrics-generator
send_exemplars: true
storage:
@@ -63,4 +64,4 @@ storage:
overrides:
defaults:
metrics_generator:
- processors: ['service-graphs', 'span-metrics']
\ No newline at end of file
+ processors: ['service-graphs', 'span-metrics']
EOF
} | git apply
With these changes, the metrics-generator logs should show the same snappy corrupt error. There may be a right way to configure the generator module to work correctly with Prometheus' pushgateway, but I've yet to find it. 🤔
Maybe it relates to push metric inconsistencies, which can cause 400 errors; reference: https://github.com/prometheus/pushgateway/blob/master/README.md#about-metric-inconsistencies
I'm now also concerned that metrics pushed to a pushgateway are never forgotten until removed: https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway
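If you do push to a pushgateway, stale groups can at least be deleted by hand. A sketch using the grouping path from my diff above (host and labels are illustrative):
# Deletes every metric in the job="tempo", instance="metrics-generator"
# group that was previously pushed.
curl -X DELETE http://localhost:9091/metrics/job/tempo/instance/metrics-generator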
I guess my questions now are:
In the meantime, ~if~ I can confirm that the Tempo metrics-generator pushing directly to the Prometheus server with the receiver enabled does work, and may be a sufficient workaround for my use case.
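For anyone else landing here, a sketch of that workaround in the generator config, assuming Prometheus is started with --web.enable-remote-write-receiver and is reachable as prometheus:9090 (names follow the distributed example):
metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      # Push straight to Prometheus' remote-write receiver,
      # not to a pushgateway.
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true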
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply the keepalive label to exempt this issue.
Describe the bug
Tempo metrics-generator always fails to push to the Prometheus pushgateway with a snappy error:
ts=2024-07-30T08:40:27.386622385Z caller=dedupe.go:112 tenant=single-tenant component=remote level=error remote_name=f56174 url=http://prometheus-for-amp-prometheus-pushgateway.prometheus.svc.cluster.local:9091/metrics/job/tempo-metrics-generator msg="non-recoverable error" count=1099 exemplarCount=0 err="server returned HTTP status 400 Bad Request: snappy: corrupt input"
Any tips or recommendations to help debug this would be appreciated.
To Reproduce
- grafana/tempo-distributed, version 2.5.0
- prometheus-community/prometheus, version v2.53.1
- example/docker-compose/distributed/ with small tweaks, see this comment below for details and a diff.
- Helm tempo values for metrics-generator:
Expected behavior
Metrics should just push.
I can confirm from inside the cluster that push completes from the tempo namespace, via cURL:
echo "some_metric 3.14" | curl --data-binary @- http://prometheus-for-amp-prometheus-pushgateway.prometheus.svc.cluster.local:9091/metrics/job/tempo-metrics-generator
Environment: