open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0
3.21k stars 759 forks source link

[bug] - Memory leak #5922

Open LGouellec opened 2 days ago

LGouellec commented 2 days ago

Package

OpenTelemetry

Package Version

Package Name Version
OpenTelemetry.Api 1.9.0
OpenTelemetry 1.9.0
OpenTelemetry.Exporter.Prometheus.HttpListener 1.9.0-beta.2
OpenTelemetry.Extensions.Hosting 1.9.0
OpenTelemetry.Instrumentation.Runtime 1.9.0

Runtime Version

net8.0

Description

I need to Dispose(..) often my Meter instance, because today Meter doesn't provide a way to delete an existing metric dynamically. See : https://github.com/dotnet/runtime/issues/83822.

When I hit meter.Dispose(), the dotnet runtime will notify each instrument perviously created that they are not longer published. See: https://github.com/dotnet/runtime/blob/9e59acb298c20658788567e0c6f0793fe97d37f6/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/Meter.cs#L519

The OpenTelemetry MetricReader (in my case BaseExportingMetricReader) must deactivateMetric : See: https://github.com/open-telemetry/opentelemetry-dotnet/blob/5dff99f8a0b26ff75248d5f2e478b9c3c42f5678/src/OpenTelemetry/Metrics/Reader/MetricReaderExt.cs#L30

But it seems there is a memory leak, because when I run my program for a long time, the memory still growing. When I debug my app, I see a lot of MetricPoint[] instances unreleased. More time my application run, more memory consumption is required only for MetricPoint[] instances. It seems when meter.Dispose() is called, the open telemetry metrics are not released properly.

Screen of my debugger at 10 min uptimes :

image

Screen of my debugger at 60 min uptimes (122 Mb of MetricsPoint[] )

image

Steps to Reproduce

    public static void Main(string[] args)
    {
        Meter meter = null;
        var meterProviderBuilder = Sdk
            .CreateMeterProviderBuilder()
            .AddMeter("Test")
            .SetResourceBuilder(
                ResourceBuilder.CreateDefault()
                    .AddService(serviceName: "Test"));

        meterProviderBuilder.AddPrometheusHttpListener();
        meterProviderBuilder.AddRuntimeInstrumentation();

        var tracerProvider = meterProviderBuilder.Build();

        while(true){
            meter?.Dispose();
            meter = new Meter("Test");
            var rd = new Random();

            meter.CreateObservableGauge(
                "requests", 
                () => new[] {
                    new Measurement<double>(rd.NextDouble() * rd.Next(100))
                },
                description: "Request per second");

             // will see after couple of minutes that the MetricReader contains a lot of MetricPoint[], even if we dispose the Meter after each iteration
            Thread.Sleep(200);
        }
    }

Expected Result

All perviously MetricPoint released

Actual Result

Memory leak

Additional Context

No response

LGouellec commented 2 days ago

For tracking : https://github.com/LGouellec/streamiz/issues/361

cijothomas commented 2 days ago

meter?.Dispose(); meter = new Meter("Test");

This looks like a new Meter of same name is created right after disposing it? Not sure what is the intend here.

LGouellec commented 1 day ago

@cijothomas

meter?.Dispose(); meter = new Meter("Test");

It's just a reproducer, my initial use case is to expose OpenTelemetry metrics for a Kafka Streams application. Don't know if you are familiar with Kafka, but the assignment of partitions can change, so it means you have to drop some metrics in some cases.

Today, we can't drop a metric for a specific Meter instance. Open Issue here : https://github.com/dotnet/runtime/issues/83822 Discussion here : https://github.com/open-telemetry/opentelemetry-specification/issues/3062

So that's why for now, I dispose the current Meter instance, and recreate from scratch for each iteration. With that, I don't need to drop metrics, just not exposing the old metrics in the next iteration.

My point here is when the Meter instance is disposed, the open telemetry metrics are not released properly.

LGouellec commented 20 hours ago

@cijothomas

Is there a way in OpenTelemetry to clear and dispose all the resources without closing the MeterProvider ?

CodeBlanch commented 6 hours ago

Hey @LGouellec!

I spent some time on this tonight. Here is what I think is going on...

The problem is in your repro there you are using AddPrometheusHttpListener which is a pull exporter. No collection will run unless you cause one.

Do me a favor and switch your code to OtlpExporter and re-run the test. That is a push exporter meaning it will run periodically and cause a cleanup. I suspect if you run that the memory will be more stable.

Should we do something here? I'm not sure what we could do. Open to ideas. But I will say, what you are proving by constantly disposing Meters I don't think is how any of the APIs were designed to be used 😄