open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
473 stars 282 forks source link

Rate Limiting sampler for .NET #1996

Closed samsp-msft closed 3 months ago

samsp-msft commented 3 months ago

Port of https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java to .NET

The rate limiting sampler is a sampler that will limit the number of traces to the specified rate per second. It is typically used in conjunction with the ParentBasedSampler to ensure that the rate limiting sampler is only applied to the root spans. If a request comes in without a sampling decision, the rate limiting sampler will make a decision based on the rate limit. The sampling decision is stored in the activity context, via the Recorded property, and is passed along with calls to other services called with HttpClient so the same decision is used for the entire trace.

Example of RateLimitingSampler usage:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            // Add the rate limiting sampler with a limit of 3 traces per second
            .SetSampler(new ParentBasedSampler(new RateLimitingSampler(3)))
    });

Changes

Please provide a brief description of the changes here.

Merge requirement checklist

samsp-msft commented 3 months ago

redo of #1967

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.57%. Comparing base (71655ce) to head (fccfe25). Report is 386 commits behind head on main.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996/graphs/tree.svg?width=650&height=150&src=pr&token=DG2DEROH83&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) ```diff @@ Coverage Diff @@ ## main #1996 +/- ## =========================================== + Coverage 73.91% 88.57% +14.65% =========================================== Files 267 10 -257 Lines 9615 175 -9440 =========================================== - Hits 7107 155 -6952 + Misses 2508 20 -2488 ``` | [Flag](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) | Coverage Δ | | |---|---|---| | [unittests-Extensions](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) | `88.57% <100.00%> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) | Coverage Δ | | |---|---|---| | [...c/OpenTelemetry.Extensions/Internal/RateLimiter.cs](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996?src=pr&el=tree&filepath=src%2FOpenTelemetry.Extensions%2FInternal%2FRateLimiter.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#diff-c3JjL09wZW5UZWxlbWV0cnkuRXh0ZW5zaW9ucy9JbnRlcm5hbC9SYXRlTGltaXRlci5jcw==) | `100.00% <100.00%> (ø)` | | | [...nTelemetry.Extensions/Trace/RateLimitingSampler.cs](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996?src=pr&el=tree&filepath=src%2FOpenTelemetry.Extensions%2FTrace%2FRateLimitingSampler.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#diff-c3JjL09wZW5UZWxlbWV0cnkuRXh0ZW5zaW9ucy9UcmFjZS9SYXRlTGltaXRpbmdTYW1wbGVyLmNz) | `100.00% <100.00%> (ø)` | | ... and [268 files with indirect coverage changes](https://app.codecov.io/gh/open-telemetry/opentelemetry-dotnet-contrib/pull/1996/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)
samsp-msft commented 3 months ago

I can see this is a direct copy of what's in Java and the rudimentary tests show it to work (on windows), but I don't understand the logic to know how this is meant to work.

Also, we'll need to figure out why the tests are failing on ubuntu.

I did some more experimentation, including running in WSL (Ubuntu). I had not accounted for the initial balance in the math in the test - with that, the first N asks will be sampled in as they chew through that balance. With this accounted for the expected value and actual are exact in my personal experiments, but I want to keep a small fudge factor as tests that rely on timing don't always work out the way you expect them to.

samsp-msft commented 3 months ago

@cijothomas the ubuntu run should pass tests now.

samsp-msft commented 3 months ago

@cijothomas - poke 😸