Open diranged opened 6 months ago
Pinging code owners:
receiver/prometheus: @Aneurysm9 @dashpole
See Adding Labels via Comments if you do not have permissions to add labels yourself.
It isn't clear if this is an issue with the target allocator or with the prometheus receiver. I would recommend opening the issue with the opentelemetry-operator first, and re-opening this issue once you have a minimal reproduction case with just the opentelemetry collector.
I opened https://github.com/open-telemetry/opentelemetry-operator/issues/2922 - but I am hesitant to close this just yet, as I am not sure this is an operator issue... the configs that are generated for the Prometheus receiver look right to me... this looks to me like something going wrong in the prometheus receiver.

The data points imply that two different ServiceMonitors are scraping the same endpoint... but when you look at the rendered scrape_configs, they are clearly pointing at different endpoints, yet the DataPoints list the same endpoint value. That says to me that something is happening inside the Prometheus receiver to duplicate the data.
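For reference, the receiver side of our setup is configured roughly like the following (a sketch - the targetallocator service name is illustrative, and collector_id assumes the usual POD_NAME env wiring):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs: []   # all jobs come from the target allocator
    target_allocator:
      # illustrative service name; ours points at the cluster-agent's TA
      endpoint: http://otel-collector-cluster-agent-targetallocator:80
      interval: 30s
      collector_id: ${env:POD_NAME}
```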
Can you reproduce the issue with just static_configs, and without the target allocator? Then we can help here.
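If it helps as a starting point, a minimal static_configs repro could look something like this (a sketch - the job name and target are illustrative; 8888 is the collector's own internal telemetry port):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector-static
          scrape_interval: 30s
          static_configs:
            - targets: ["0.0.0.0:8888"]   # the collector's own /metrics

exporters:
  debug: {}

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]
```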
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Component(s)
receiver/prometheus
What happened?
Description
We're setting up a single collector pod in a StatefulSet configured to monitor "all the rest" of our OTEL components... this collector is called the otel-collector-cluster-agent, and it uses the TargetAllocator to monitor ServiceMonitor jobs that have specific labels. We currently have two different ServiceMonitors - one for collecting otelcol.* metrics from the collectors, and one for collecting opentelemetry_.* metrics from the Target Allocators.

We are seeing metrics reported from the TargetAllocator pods duplicated into DataPoints that refer to both the right and the wrong ServiceMonitor.

When we delete the otel-collector-collectors ServiceMonitor, the behavior does not change... which is wild... however, if we delete the entire stack and namespace and then re-create it without the second ServiceMonitor, the data is correct - until we create the second ServiceMonitor, and then it goes bad again.

Steps to Reproduce
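The two ServiceMonitors look roughly like this (a sketch - the selector labels, namespaces, and port names are illustrative, not our exact manifests; otel-collector-collectors is the name referenced above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector-collectors          # collects otelcol.* from the collectors
  labels:
    monitoring: cluster-agent              # label the TargetAllocator selects on
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
  endpoints:
    - port: metrics                        # collector internal telemetry
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector-target-allocators   # collects opentelemetry_.* from the TAs
  labels:
    monitoring: cluster-agent
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-targetallocator
  endpoints:
    - port: targetallocation               # TA service port name (illustrative)
      interval: 30s
```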
This creates a /scrape_configs that looks like this:
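Roughly, as a sketch of the shape rather than the literal output (the otel namespace and the omitted discovery/relabel sections are illustrative; job names follow the usual serviceMonitor/&lt;namespace&gt;/&lt;name&gt;/&lt;endpoint-index&gt; convention):

```yaml
scrape_configs:
  - job_name: serviceMonitor/otel/otel-collector-collectors/0
    metrics_path: /metrics
    scrape_interval: 30s
    # kubernetes_sd_configs / relabel_configs omitted
  - job_name: serviceMonitor/otel/otel-collector-target-allocators/0
    metrics_path: /metrics
    scrape_interval: 30s
    # kubernetes_sd_configs / relabel_configs omitted
```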
Expected Result
We should see datapoints for opentelemetry_.* metrics that only come from the target allocator pods and are attributed once... meaning, one DataPoint per target pod, and that's it.

Collector version
0.98.0
Environment information
Environment
OS: BottleRocket 1.19.2
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response