open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Cardinality limit on metrics (otel_metrics_overflow) #2997

Closed lalitb closed 1 month ago

lalitb commented 1 month ago

Discussed in https://github.com/open-telemetry/opentelemetry-cpp/discussions/2993

Originally posted by **xjanin** July 8, 2024

Hi,

I've instrumented my application with opentelemetry-cpp (metrics only), and I use tags in my metrics. When testing with a low rate of metrics recording, I see my metrics with the expected tags in the Prometheus export on the OpenTelemetry collector. However, when testing the application with a high rate of requests, and therefore a high rate of metrics recording, I get metrics aggregated into time series with no tags except "otel_metrics_overflow".

My understanding is that this should happen only if the SDK has to collect metrics whose tag cardinality exceeds 2000 in a collection cycle (the default limit). Edit: https://opentelemetry.io/docs/specs/otel/metrics/sdk/#cardinality-limits

However, my tag cardinality doesn't increase with the number of requests processed, and in the OpenTelemetry collector that my application uses, I don't see 2000 time series in the Prometheus export. So my questions are:

- Do I have a wrong understanding of what "otel_metrics_overflow" means?
- Is there a way to monitor the cardinality of the metrics produced by my application?

Thank you and best regards,
Xavier
lalitb commented 1 month ago

As per the investigation by @xjanin - https://github.com/open-telemetry/opentelemetry-cpp/discussions/2993#discussioncomment-9998475

ThomsonTan commented 1 month ago

Thanks for the investigation, @xjanin. Could you please also share the related sample code which could reproduce the issue? Like some filter is added to the metric view?

xjanin commented 1 month ago

Hello @ThomsonTan,

I didn't use any view.

It was more difficult than I thought to create a small example, but here goes (be careful: this code deliberately leaks memory):

#include <chrono>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <thread>
#include <vector>

#include <opentelemetry/common/key_value_iterable_view.h>
#include <opentelemetry/context/context.h>
#include <opentelemetry/metrics/meter.h>
#include <opentelemetry/metrics/meter_provider.h>
#include <opentelemetry/metrics/provider.h>
#include <opentelemetry/metrics/sync_instruments.h>

#include "opentelemetry/exporters/ostream/metric_exporter.h"
#include "opentelemetry/exporters/ostream/metric_exporter_factory.h"
#include "opentelemetry/exporters/otlp/otlp_http_metric_exporter_factory.h"
#include "opentelemetry/exporters/otlp/otlp_http_metric_exporter_options.h"
#include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader_factory.h"
#include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader_options.h"
#include "opentelemetry/sdk/metrics/meter_provider.h"
#include "opentelemetry/sdk/metrics/meter_provider_factory.h"

namespace metrics_sdk = opentelemetry::sdk::metrics;
namespace metrics_api = opentelemetry::metrics;
namespace metrics_exporter = opentelemetry::exporter::metrics;

using namespace std::literals::chrono_literals;

int main() {
    auto ostream_exporter = std::make_unique<metrics_exporter::OStreamMetricExporter>();
    auto const exporter_options = metrics_sdk::PeriodicExportingMetricReaderOptions{2s, 1s};
    auto metric_reader = metrics_sdk::PeriodicExportingMetricReaderFactory::Create(std::move(ostream_exporter), exporter_options);
    auto meter_provider = metrics_sdk::MeterProviderFactory::Create();
    auto* p = static_cast<metrics_sdk::MeterProvider*>(meter_provider.get());
    p->AddMetricReader(std::move(metric_reader));

    std::shared_ptr<metrics_api::MeterProvider> shared_provider = std::move(meter_provider);
    metrics_api::Provider::SetMeterProvider(shared_provider);

    auto meter = shared_provider->GetMeter("test_meter", "1");
    auto histogram = meter->CreateUInt64Histogram("test_histogram", "test_histogram", "s");
    auto const context = opentelemetry::context::Context{};
    for(size_t i = 0; i < 3000; ++i) {
        // Allocate a fresh map each iteration (never freed); its contents are
        // identical every time.
        auto const telemetry_tags = new std::map<std::string, std::string>{
            {"key", "value"}}; // memory leak
        auto const telemetry_tags_view = opentelemetry::common::KeyValueIterableView<std::map<std::string, std::string>>{*telemetry_tags};
        histogram->Record(10, telemetry_tags_view, context);
    }
    std::this_thread::sleep_for(2s);
}

This is not how I create my tags in my real code, but it achieves the same effect.

This is the result:

{
  scope name    : test_meter
  schema url    : 
  version       : 1
  start time    : Wed Jul 10 12:15:11 2024
  end time      : Wed Jul 10 12:15:13 2024
  instrument name       : test_histogram
  description   : test_histogram
  unit          : s
  type     : HistogramPointData
  count     : 1999
  sum     : 19990
  min     : 10
  max     : 10
  buckets     : [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000, ]
  counts     : [0, 0, 1999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ]
  attributes            : 
        key: value
  type     : HistogramPointData
  count     : 1001
  sum     : 10010
  min     : 10
  max     : 10
  buckets     : [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000, ]
  counts     : [0, 0, 1001, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ]
  attributes            : 
        otel.metrics.overflow: 1
  resources     :
        service.name: unknown_service
        telemetry.sdk.language: cpp
        telemetry.sdk.name: opentelemetry
        telemetry.sdk.version: 1.15.0
}