Closed domasx2 closed 1 year ago
Is it correct to say that we don't reproduce this problem on the "Otel Demo" environment nor in our other testing environments? Could there be something special about this environment? Could the Grafana Ops environment be different from "standard environments"?
@cyrille-leclerc I think you are correct. The Ops Prometheus datasource has its scrape interval set to 15s, while "standard" cloud instances have 60s. That should explain it.
What I said above is wrong: it's not due to the scrape interval. In the dev env, data point updates and scrapes are aligned. Could it be a different Tempo configuration?
After investigation, we concluded that the principal cause of the sawtooth-like metrics when applying rate() is the batching and buffering that happens between the time a span is created and the time it reaches the metrics-generator. As a result, spans arrive at the processors in small bursts rather than at the rate at which they are naturally generated.
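To illustrate the effect, here is a minimal sketch of how batching turns a steady span stream into a sawtooth. It is not Tempo code: the function names, the 60s flush interval, and the 2 spans/s rate are assumptions chosen for illustration, and `simple_rate` is a plain per-interval difference, not the extrapolating rate() that Prometheus implements.

```python
def scrape_counter(duration_s, span_rate_per_s, flush_interval_s, scrape_interval_s):
    """Return (timestamp, counter_value) pairs as a scraper would see them.

    Spans are created steadily, but they only reach the counter when a
    hypothetical buffer flushes every flush_interval_s seconds.
    """
    samples = []
    for t in range(0, duration_s + 1, scrape_interval_s):
        # Only spans created up to the most recent flush have been counted.
        last_flush = (t // flush_interval_s) * flush_interval_s
        samples.append((t, last_flush * span_rate_per_s))
    return samples


def simple_rate(samples):
    """Per-second increase between consecutive samples (simplified rate)."""
    return [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


# Same span rate (2 spans/s) and scrape interval (15s), different buffering.
batched = scrape_counter(180, span_rate_per_s=2, flush_interval_s=60, scrape_interval_s=15)
unbatched = scrape_counter(180, span_rate_per_s=2, flush_interval_s=1, scrape_interval_s=15)

print(simple_rate(batched))    # bursts: three zeros, then one spike, repeated
print(simple_rate(unbatched))  # flat at the true rate
```

In this toy model both series average 2 spans/s, but the batched one reads 0 for three scrapes out of four and then spikes to 8, which matches the jagged shape described in the bug report.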
Describe the bug
Rates over spanmetrics are suspiciously jagged, dropping to 0 every minute.
Explore in Grafana Ops
To Reproduce
Steps to reproduce the behavior:
Goutham: So basically the datapoints are sent every 15s, but they are updated every 1m. The problem is that while samples are being sent every 15s, the update of the metric happens only every minute.
For the "instant" query
traces_spanmetrics_latency_count{__metrics_gen_instance="metrics-generator-5684fd747f-7zgwt",cluster=.........}[10m]
you can see that the metrics are updated only every 1m, but sent every 15s.
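The same check can be done programmatically on the raw samples a range selector returns. The sketch below is only an illustration: the `intervals` helper is hypothetical and the sample values are made up, not taken from the query above.

```python
def intervals(samples):
    """Return (sample_interval, update_interval) in seconds.

    samples is a list of (timestamp_seconds, value) pairs, ordered by time.
    """
    timestamps = [t for t, _ in samples]
    # Timestamps at which the value differs from the previous sample.
    change_times = [
        t1 for (_, v0), (t1, v1) in zip(samples, samples[1:]) if v1 != v0
    ]
    sample_gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    update_gaps = [b - a for a, b in zip(change_times, change_times[1:])]
    return min(sample_gaps), min(update_gaps)


# Made-up counter samples: one every 15s, value changing once per minute.
samples = [
    (0, 100), (15, 100), (30, 100), (45, 100),
    (60, 160), (75, 160), (90, 160), (105, 160),
    (120, 220), (135, 220),
]

print(intervals(samples))  # (15, 60)
```

A sample interval of 15s with an update interval of 60s is the pattern described above: samples are sent every 15s, but the value only moves once a minute.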