open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

unregister receiver metrics when receiver is stopped/removed #10223

Open newly12 opened 3 months ago

newly12 commented 3 months ago

Is your feature request related to a problem? Please describe.

We have an in-house receiver that dynamically reloads its configuration to start and stop pipelines on the fly, without restarting the OTel Collector. We noticed that when a pipeline (receivers and processors) is stopped, its receiver metrics remain on the metrics page, for example otelcol_receiver_accepted_metric_points and otelcol_receiver_refused_metric_points for a metrics receiver. As a result, both the size of the metrics endpoint response and the total number of metrics keep increasing.

Describe the solution you'd like

When a receiver is stopped, its related metrics should be removed as well.
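To make the request concrete, here is a minimal sketch of the desired behavior, using only the Go standard library. `Registry`, `Add`, `Unregister`, and `Lines` are hypothetical names for illustration and are not the collector's real telemetry API; the receiver names are made up as well. The idea is that every series is keyed by the receiver that owns it, so stopping a receiver can drop all of its series in one call.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// seriesKey identifies one time series: a metric name plus the
// receiver that owns it.
type seriesKey struct {
	metric   string
	receiver string
}

// Registry is a hypothetical in-memory stand-in for the collector's
// internal telemetry. It is NOT the collector's real API.
type Registry struct {
	mu     sync.Mutex
	series map[seriesKey]int64
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{series: make(map[seriesKey]int64)}
}

// Add increments a counter series, creating it on first use.
func (r *Registry) Add(metric, receiver string, delta int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.series[seriesKey{metric, receiver}] += delta
}

// Unregister drops every series owned by the given receiver and
// reports how many series were removed.
func (r *Registry) Unregister(receiver string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	removed := 0
	for k := range r.series {
		if k.receiver == receiver {
			delete(r.series, k) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

// Lines renders the remaining series in sorted order, loosely
// imitating what a metrics endpoint would expose.
func (r *Registry) Lines() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, len(r.series))
	for k, v := range r.series {
		out = append(out, fmt.Sprintf("%s{receiver=%q} %d", k.metric, k.receiver, v))
	}
	sort.Strings(out)
	return out
}

func main() {
	reg := NewRegistry()
	reg.Add("otelcol_receiver_accepted_metric_points", "prometheus/pod-a", 10)
	reg.Add("otelcol_receiver_refused_metric_points", "prometheus/pod-a", 1)
	reg.Add("otelcol_receiver_accepted_metric_points", "prometheus/pod-b", 7)

	// Simulate stopping the pipeline that scrapes pod-a.
	reg.Unregister("prometheus/pod-a")

	// Only the series for pod-b should remain.
	for _, line := range reg.Lines() {
		fmt.Println(line)
	}
}
```

With something along these lines, the endpoint size would track the set of running receivers instead of every receiver that has ever been started.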

Describe alternatives you've considered

Additional context

vjsamuel commented 3 months ago

This got exacerbated because of https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/metrics_receiver.go#L263, which expects the receiver names to be unique; this causes the endpoint to grow indefinitely. The version of Prometheus that expects this behavior is 2.50+.

newly12 commented 1 month ago

Hi team, could we have an update on this one? It has been a blocking issue for upgrading the version of our metrics OTel Collectors, and our logs OTel Collector faces the same issue. Because the pods on a given k8s node change from time to time, receivers/pipelines have to be brought up and shut down all the time. This makes the metrics endpoint grow pretty fast and wastes memory on keeping "stale" metrics.

newly12 commented 1 month ago

I found https://github.com/open-telemetry/opentelemetry-specification/issues/3062; it appears the OTel metrics SDK does not support removing specific metrics at the moment.

If it is not possible to support automatic removal of metrics for stopped components in the near term, would it make sense to allow disabling metrics for certain component kinds (receiver/processor/exporter) through a new feature gate?
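As a rough illustration of the feature gate idea, here is a standalone sketch using only the Go standard library. Everything in it is an assumption for discussion: the `Gates` type, the `Recorder` type, and the gate ID format `telemetry.disable_<kind>_metrics` are hypothetical, and it does not use the collector's actual featuregate package.

```go
package main

import "fmt"

// ComponentKind names the three component kinds mentioned above.
type ComponentKind string

const (
	KindReceiver  ComponentKind = "receiver"
	KindProcessor ComponentKind = "processor"
	KindExporter  ComponentKind = "exporter"
)

// Gates is a hypothetical set of enabled feature gate IDs.
type Gates map[string]bool

// gateID builds the (made-up) gate name that disables metrics for a kind.
func gateID(kind ComponentKind) string {
	return fmt.Sprintf("telemetry.disable_%s_metrics", kind)
}

// MetricsEnabled reports whether components of the given kind should
// record metrics. Metrics stay on unless the disabling gate is set.
func (g Gates) MetricsEnabled(kind ComponentKind) bool {
	return !g[gateID(kind)]
}

// Recorder counts accepted points, but only when metrics are enabled
// for its component kind. When disabled, no series is ever created.
type Recorder struct {
	enabled  bool
	accepted int64
}

// NewRecorder checks the gate once, at component creation time.
func NewRecorder(g Gates, kind ComponentKind) *Recorder {
	return &Recorder{enabled: g.MetricsEnabled(kind)}
}

// Accept records n accepted points, or does nothing when disabled.
func (r *Recorder) Accept(n int64) {
	if !r.enabled {
		return
	}
	r.accepted += n
}

func main() {
	// Disable receiver metrics only; exporters keep recording.
	gates := Gates{gateID(KindReceiver): true}

	rcv := NewRecorder(gates, KindReceiver)
	exp := NewRecorder(gates, KindExporter)
	rcv.Accept(10)
	exp.Accept(10)

	fmt.Println("receiver accepted:", rcv.accepted)
	fmt.Println("exporter accepted:", exp.accepted)
}
```

The trade-off is that this hides the metrics for all components of a kind, including long-lived ones, so it would only be a workaround until per-component removal is possible.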