VijayPatil872 opened this issue 4 months ago
Pinging code owners:
connector/servicegraph: @jpkrohling @mapno @JaredTan95
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi @VijayPatil872. Yes, this is an unfortunate side effect of horizontally scaling the connector. A workaround is to add a label to the metrics that corresponds to the collector pod name, or something else unique, so that the series are unique across instances.
Hi @mapno, could you please elaborate on how to add a label to the metrics that corresponds to the collector pod name, or something else that makes the series unique across instances?
Hi @VijayPatil872. I believe something like the k8sattributesprocessor should work for that. With it, you can add a label like k8s.pod.name to your metrics and make the series unique between instances.
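For illustration, one way to attach a unique per-instance label is to expose the collector's own pod name via the Downward API and stamp it onto the generated metrics with the resource processor. This is a rough sketch of that idea, shown as a variation on the k8sattributesprocessor suggestion above; the POD_NAME variable, pipeline names, and the Mimir endpoint are placeholders, not values from this thread:

  # Kubernetes manifest snippet (Downward API): expose the pod name to the collector container
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name

  # Collector config sketch: stamp the pod name onto the servicegraph metrics
  receivers:
    otlp:
      protocols:
        grpc: {}

  connectors:
    servicegraph: {}

  processors:
    resource/instance:
      attributes:
        - key: k8s.pod.name
          value: ${env:POD_NAME}   # unique per collector pod
          action: upsert

  exporters:
    prometheusremotewrite:
      endpoint: https://mimir.example.com/api/v1/push   # placeholder endpoint

  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [servicegraph]          # feed spans into the servicegraph connector
      metrics/servicegraph:
        receivers: [servicegraph]          # metrics generated by the connector
        processors: [resource/instance]
        exporters: [prometheusremotewrite]

Adding the pod name this way places it on the resource; whether it actually reaches Mimir as a series label also depends on how the prometheusremotewrite exporter handles resource attributes (see the note further down in this thread).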
Hi @mapno, I tried a workaround with the k8sattributesprocessor. The labels specified in the configuration show up in the OTel Collector logs, but the issue still persists, so it did not work for me.
Do the metrics now have k8s.pod.name as a label, and do you still get the same errors?
Hi @mapno, I added the k8sattributesprocessor with the following configuration:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.namespace.name
          - from: resource_attribute
            name: k8s.pod.name
The OpenTelemetry Collector logs show that whichever of these labels are available are being added, but the issue still persists.
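One assumption worth checking here, not confirmed anywhere in this thread: attributes added by the k8sattributes processor are resource attributes, and the prometheusremotewrite exporter does not turn arbitrary resource attributes into per-series metric labels unless resource-to-telemetry conversion is enabled. A sketch of that exporter setting (the endpoint is a placeholder):

  exporters:
    prometheusremotewrite:
      endpoint: https://mimir.example.com/api/v1/push   # placeholder endpoint
      resource_to_telemetry_conversion:
        enabled: true   # copy resource attributes (e.g. k8s.pod.name) onto each exported series as labels

Without this, the extra attributes can be visible in the Collector's own output while the series written to Mimir remain identical across collector instances.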
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
not stale.
Component(s)
connector/servicegraph
What happened?
Description
We are currently facing an issue with the OpenTelemetry Collector's service graph connector: when the metrics are ingested into Mimir, some samples are rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp).
We are using the servicegraph connector to build the service graph. We have deployed a layer of Collectors containing the load-balancing exporter in front of the traces Collectors that do the span metrics and service graph connector processing. The load-balancing exporter is used to hash the trace ID consistently and determine which Collector backend should receive the spans for that trace. The service graph metrics are exported to Grafana Mimir with the prometheusremotewrite exporter. The Mimir distributor fails to ingest some of the metrics and reports the following error:

  ts=2024-07-19T07:26:46.442694833Z caller=push.go:171 level=error user=default-processor-servicegraph msg="push error" err="failed pushing to ingester mimir-ingester-zone-a-2: user=default-processor-servicegraph: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2024-07-19T07:26:46.23Z and is from series traces_service_graph_request_client_seconds_bucket{client=\"claims-service\", connection_type=\"virtual_node\", failed=\"false\", le=\"0.1\", server=\"xxxxx.redis.cache.windows.net\"}"

Could someone please help with eliminating this error?
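For reference, a minimal sketch of the load-balancing layer described above (the resolver hostname and TLS settings are placeholders, not our actual values):

  exporters:
    loadbalancing:
      routing_key: traceID        # hash spans consistently by trace ID
      protocol:
        otlp:
          tls:
            insecure: true        # placeholder; adjust to the real TLS setup
      resolver:
        dns:
          hostname: otel-traces-collector-headless.observability.svc.cluster.local   # placeholder headless service

  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [loadbalancing]

The second layer of Collectors then runs the servicegraph connector and the prometheusremotewrite exporter. Because each replica in that layer builds its own copy of the service-graph series, two replicas can emit samples for the same series at the same timestamp, which appears to be the source of the duplicate-timestamp rejections shown in the error above.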
Steps to Reproduce
Expected Result
No metric samples should be rejected; the metrics failure count should be zero.
Actual Result
We see metrics failing because of the above-mentioned error on the OpenTelemetry dashboard.
Collector version
0.104.0
Environment information
No response
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response