open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0

Trouble writing metrics on thanos receiver #2920

Open giuseka opened 9 hours ago

giuseka commented 9 hours ago

Describe your environment

OS: Ubuntu 22.04
Python version: Python 3.12.3
Package version:

What happened?

I wrote a simple script that creates a gauge metric and sends it to a Thanos receiver, but the export fails with this error:

Export POST request failed with reason: 409 Client Error: Conflict for url:

In the receiver's log I observed this error:

msg="failed to handle request" err="add 1 series: out of order labels"
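For context: the Prometheus remote-write specification requires the label names of each series to be sorted lexicographically, which is presumably what the receiver is checking here. A minimal sketch of such a check, using plain `(name, value)` tuples rather than the exporter's protobuf types (the helper name `labels_are_sorted` and the sample labels are hypothetical, for illustration only):

```python
from typing import Sequence, Tuple


def labels_are_sorted(labels: Sequence[Tuple[str, str]]) -> bool:
    """Return True if the label names are in ascending lexicographic order."""
    names = [name for name, _ in labels]
    return all(a <= b for a, b in zip(names, names[1:]))


# Hypothetical label set: "job" comes before "__name__", so it is out of order.
unsorted_labels = [("job", "testJob"), ("__name__", "test_metric")]

print(labels_are_sorted(unsorted_labels))          # False
print(labels_are_sorted(sorted(unsorted_labels)))  # True
```

Note that `_` sorts before lowercase letters in ASCII, so `__name__` has to precede `job`.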

Steps to Reproduce

Run this code against a working Thanos receiver HTTPS URI:

from opentelemetry.exporter.prometheus_remote_write import PrometheusRemoteWriteMetricsExporter
from opentelemetry.metrics import get_meter, set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

name = 'test_metric'
desc = 'test_metric'
unit = 'count'
value = 1000
attributes = {'job': 'testJob'}

exporter = PrometheusRemoteWriteMetricsExporter(
    endpoint='<thanos receiver https uri>',  # placeholder: replace with the real receiver URI
    tls_config={'insecure_skip_verify': False},
    resources_as_labels=True
)
# Export every 1000 ms
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=1000)
provider = MeterProvider(metric_readers=[reader])
set_meter_provider(provider)
meter = get_meter(__name__)
gauge = meter.create_gauge(name, description=desc, unit=unit)
gauge.set(value, attributes)

Expected Result

The metric is created in Thanos.

Actual Result

The metric is rejected by Thanos.

Additional context

I looked at the code of the PrometheusRemoteWriteMetricsExporter class and made a small change to the _convert_to_timeseries method: it now sorts the labels by name before creating the time series.

This is my working version of the method:

def _convert_to_timeseries(
    self, sample_sets: Sequence[tuple], resource_labels: Sequence
) -> Sequence[TimeSeries]:
    timeseries = []
    for labels, samples in sample_sets.items():
        ts = TimeSeries()
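        # Merge metric and resource labels, then sort them by label name,
        # because the receiver rejects series whose labels are out of order.
        labels = list(labels) + list(resource_labels)
        labels.sort(key=lambda x: x[0])
        for label_name, label_value in labels:
            ts.labels.append(self._label(label_name, str(label_value)))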
        for value, timestamp in samples:
            ts.samples.append(self._sample(value, timestamp))
        timeseries.append(ts)
    return timeseries
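The merge-and-sort step can also be tried in isolation, without the exporter or its protobuf classes. The following is only a sketch of the idea, not the exporter's actual code: the function name `merge_and_sort_labels` and the sample resource labels are hypothetical.

```python
from typing import Iterable, List, Tuple

Label = Tuple[str, object]


def merge_and_sort_labels(
    metric_labels: Iterable[Label], resource_labels: Iterable[Label]
) -> List[Tuple[str, str]]:
    """Concatenate both label sets and order them by label name."""
    merged = list(metric_labels) + list(resource_labels)
    merged.sort(key=lambda label: label[0])  # sort by name only
    return [(name, str(value)) for name, value in merged]


# Hypothetical inputs, for illustration only.
metric_labels = [("job", "testJob"), ("__name__", "test_metric")]
resource_labels = [("service_name", "demo"), ("host", "node-1")]

for name, value in merge_and_sort_labels(metric_labels, resource_labels):
    print(name, value)
```

Sorting after the merge matters: sorting each set on its own and then concatenating them could still leave the combined list out of order.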

Would you like to implement a fix?

Yes

xrmx commented 9 hours ago

Looks like it's fixed by https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2784? If you can pick up that code and add a test and a changelog entry, we can review and merge it.

giuseka commented 6 hours ago

I tested the solution proposed in https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2784 locally, and it works for me. That code is simpler than what I proposed.