open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0

`OTEL_TRACES_SAMPLER` settings should apply in real time #5737

Open julealgon opened 3 months ago

julealgon commented 3 months ago

Package

OpenTelemetry

Is your feature request related to a problem?

The OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG configuration settings are currently only taken into consideration when the TracerProviderSdk is first built, which only happens a single time in an application's lifetime.

This means that if one wants to change sampling percentage, or switch to a different sampler, an application restart is required for the settings to be picked up.

We have many applications today (dozens) that are leveraging OpenTelemetry, and we want to fine-tune the sampler/percentage due to cost concerns with our observability tooling, but requiring a full restart of the applications is prohibitive for us as it results in downtime for consumers in most cases. This means we are currently forced to wait for deployment windows (which happen every 2 weeks or so by default in our process structure) to perform such fine tuning. This severely limits us and doesn't allow us to react to problems quickly enough.
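For reference, these are the spec-defined variables in question (values taken from the OTel environment variable specification; for the traceidratio sampler, the ARG is the sampling ratio):

```shell
# Sample 10% of root traces. Today these are read only once, when the
# TracerProviderSdk is built, which is exactly the limitation this issue is about.
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
```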

What is the expected behavior?

We'd like both the OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG settings to be monitored and reapplied in real time when changes occur: if the ARG setting changes, the current sampler should take it into account for any future sampling calls; and if the SAMPLER setting changes, any outstanding calls on the current sampler should complete and the swap to the new sampler should happen as soon as possible.

Which alternative solutions or features have you considered?

We don't have any alternative at the moment. Only a full application restart triggers a new evaluation and application of the settings.

Technically, I believe it would be possible to achieve real-time switching of the sampler today from the outside, by implementing this logic in a custom sampler decorator that relies on IOptionsMonitor, IOptionsSnapshot, or change tokens to monitor for changes in the settings. Unfortunately, the implementation today doesn't rely on the IOptions framework, so this would require quite a bit of custom code and would force us to essentially reimplement the checks against these variables ourselves. We've not pursued that option yet, but we'll have to consider it in case there is pushback from the OTel team on making it real-time natively.
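A decorator along those lines might look like the following minimal sketch. The SamplingOptions type and its configuration binding are hypothetical; Sampler, SamplingParameters, SamplingResult, and TraceIdRatioBasedSampler are from the OpenTelemetry package:

```csharp
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;

// Hypothetical options POCO, bound to some configuration section
// (e.g. via services.Configure<SamplingOptions>(config.GetSection(...))).
public sealed class SamplingOptions
{
    public double Ratio { get; set; } = 1.0;
}

// Sketch of a decorator that rebuilds its inner sampler whenever the
// monitored options change, so no application restart is required.
public sealed class DynamicRatioSampler : Sampler
{
    private volatile Sampler inner;

    public DynamicRatioSampler(IOptionsMonitor<SamplingOptions> monitor)
    {
        this.inner = new TraceIdRatioBasedSampler(monitor.CurrentValue.Ratio);

        // Swap in a fresh sampler on every configuration reload;
        // in-flight calls keep using the sampler they already resolved.
        monitor.OnChange(o => this.inner = new TraceIdRatioBasedSampler(o.Ratio));
    }

    public override SamplingResult ShouldSample(in SamplingParameters samplingParameters)
        => this.inner.ShouldSample(in samplingParameters);
}
```

This is a sketch, not a complete implementation; it only handles the ratio argument and sidesteps the question of swapping the sampler *type* at runtime.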

Additional context

The OTel specification doesn't explicitly say whether the settings should apply immediately, but it also doesn't say anything to the contrary.

I also believe it is unintuitive for the settings to require a restart today, as that is not mentioned anywhere. Sampling decisions are made on a per-activity basis, so swapping the sampler should be doable.

Additionally, this goes against the design of metrics, which can already be toggled on and off dynamically via configuration, with the change reflected instantaneously. Having tracing-related configs also reflect in real time would be a consistency improvement.

FYI @erricrice

cijothomas commented 3 months ago

Not a direct answer to this issue, but sharing some related thoughts.

  1. Based on what I understood from the issue description, most likely what you need is a Sampler that "adapts" automatically, i.e. instead of accepting a fixed "ratio"/"percentage", it should accept something like a "rate" (e.g. 5 traces/sec max). The sampler can internally monitor the actual volume and adjust the ratio so that the overall volume meets the configured limit. This is already available in the Collector, but not yet in this repo. It'd be good to get something like this into the SDK itself. (I think Java already has this, and such adaptive samplers were available in the ApplicationInsights .NET SDKs too.)

  2. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#jaegerremotesampler is another one I've seen users leveraging for this or a similar need. The sampling setting can be queried from a local/remote endpoint and the sampler can react as needed (like bumping up the ratio when investigating something, then reducing it once the issue is resolved). This is also not implemented in OTel .NET at all :(

julealgon commented 3 months ago

@cijothomas those options would be interesting indeed. They might not necessarily solve the issue for us, but I could see us potentially leveraging them in some scenarios.

The Collector reference is something I forgot to mention. We are using an OpenTelemetry Collector in our infrastructure, but currently we are only doing sampling at the application level and not in the collector itself. For now, we've opted to keep sampling in the apps, as that is substantially more efficient than having everything pushed into the collector only for it to be ignored there.

CodeBlanch commented 1 month ago

Just FYI for anyone landing on this issue: @samsp-msft contributed a RateLimitingSampler over on contrib.

julealgon commented 1 month ago

Just FYI for anyone landing on this issue: @samsp-msft contributed a RateLimitingSampler over on contrib.

That's a great addition!

It would be cool if there were further ways to configure the rate limiting aspect. For example, having a means to prioritize or "isolate" categories that are rate limited independently of each other. As an example, I don't want to sample lots of success traces only to then have some error ones ignored by the rate limiting.

Ideally, it would be nice to be able to configure custom "categories" or "filters" that would group traces together and apply rate limiting to those groups in an independent manner.

Another common scenario I could see is regarding errors. Imagine I get a burst of the same error in quick succession, but among those "same" errors there are a few outliers of different, interesting errors that I'd really like to not be ignored. General rate limiting would not be aware of those differences and might just use all the duplicate errors to rate limit the other different errors from the stream.

TL;DR, there should be some sort of mechanism to decide whether a given trace is a "good representative" or not: if there are a bunch of nearly identical traces already, then it would be considered "not representative", whereas if it was a rare error, it would be considered a "good representative" and not be filtered out.

cijothomas commented 1 month ago

Such advanced sampling is generally not possible at the OTel Sampler level, as it is head-based, i.e. at the time the sampling decision is made, it is unknown whether the trace will succeed/fail/fail terribly! Tail sampling can be used to achieve such advanced capabilities. (There are pros/cons to doing it in-process vs. in a local collector/agent.)

There are a few examples in the repo showing this advanced tail-sampling in the SDK itself: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/tail-based-sampling-span-level

There is no OTel prescribed "correct" way - OTel offers various options, leaving it up to end users to figure out the right combination for their needs.

samsp-msft commented 1 month ago

@julealgon, as @cijothomas says, what you are looking for is tail sampling. However, to be most effective, I believe it's best implemented out of proc in something like the OTel Collector. The reason is that, by its nature, tracing is distributed. A decision in process A should be influenced by the decisions in processes B, C, D & E, and the retention or dropping of the trace is ideally a uniform decision that affects the results from all the processes. Trying to coordinate that between them is not really possible today.

The solution is to use the OTel collector, send all the traces to it, and then have it apply rules for which traces to keep or discard.
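For illustration, such keep-or-discard rules might look like this in the collector's tail_sampling processor (a sketch; the policy names are made up, but the processor and the status_code/latency/probabilistic policy types exist in collector-contrib):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # how long to buffer spans before deciding
    policies:
      - name: keep-errors       # always keep traces containing an error
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow         # always keep traces slower than 2s
        type: latency
        latency: {threshold_ms: 2000}
      - name: sample-the-rest   # probabilistically keep 10% of everything else
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```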

The last time I looked, the rules for the collector were pretty flexible, but I was disappointed that there wasn't a coordinated story for dealing with log entries - IMHO you want to filter trace associated log entries in coordination with the traces, making the same retain or drop decision as for the traces.

julealgon commented 1 month ago

@julealgon, as @cijothomas says, what you are looking for is tail sampling. However, to be most effective, I believe it's best implemented out of proc in something like the OTel Collector. The reason is that, by its nature, tracing is distributed. A decision in process A should be influenced by the decisions in processes B, C, D & E, and the retention or dropping of the trace is ideally a uniform decision that affects the results from all the processes. Trying to coordinate that between them is not really possible today.

The solution is to use the OTel collector, send all the traces to it, and then have it apply rules for which traces to keep or discard.

@samsp-msft since you brought this up, I do have a question (perhaps off-topic but I'll go for it anyways): wouldn't this approach to sampling be substantially less performant than sampling at the application level? It basically forces us to completely disable sampling in all apps, then send full payloads to the collector (including the network overhead and all that), only to then ignore a significant portion of those spans in the collector.

I've been a bit hesitant to switch to collector-based sampling because of this, although we've not yet tried it. I suspect it will also make it much harder to control sampling percentages for different projects, as we'd have to constantly update the collector's YAML config file to change sampling rules, instead of just changing an Azure AppConfiguration value for a specific project without any deployment or code changes.

The last time I looked, the rules for the collector were pretty flexible, but I was disappointed that there wasn't a coordinated story for dealing with log entries - IMHO you want to filter trace associated log entries in coordination with the traces, making the same retain or drop decision as for the traces.

I assume you mean that the sampling processor is configured independently for traces and logs, right? For now, that wouldn't affect us particularly, since we don't intend to enforce any sampling for logs. But I do get what you mean.

martinjt commented 2 weeks ago

There are pros and cons for the different sampling techniques and you're hitting on some of them there.

In debugging, there are two main types of data you want to keep/sample: slow/interesting requests and those with errors. Unfortunately, neither of those is something you can determine at root span creation time (i.e. head sampling).

Using an out-of-band sampler that buffers spans and makes the decision based on all the spans in the trace is the best way to do this (i.e. tail sampling), but it has drawbacks.

The biggest is memory: in order to tail sample, you need to buffer a lot of data and therefore store a lot in memory. This means the samplers can be large.

The next is bandwidth: you need to send all spans to the same instance. In AWS, cross-AZ transfer isn't free, so that can get very costly. In Azure that's free, but cross-region is not. So architecting this right is crucial (I've written posts about that).

The final one is the delay. In order to make the decision across all services, you need to wait for all the spans in the trace to arrive. Because this is generally non-deterministic, you end up with 15-30 second delays. In platforms like AppInsights that have a 90-second ingest SLA, that can be mostly unnoticeable; in backends with seconds-level ingest it's a potential issue.

Head sampling is generally used in combination with tail sampling at the larger end of traffic. We have a customer that's currently sampling at 1/20 at the head, then 1/200000 at the tail for instance because their volumes are that high.

Depending on your scale, and how you do tail sampling (rules based like the collector or more dynamic/adaptive) you can get statistical accuracy with much smaller volumes, but that does require tail sampling.

I think ultimately, this isn't something that we would implement in .NET, so I think this issue should probably be closed out.

julealgon commented 2 weeks ago

@martinjt I was with you the whole time until this very last statement.

I think ultimately, this isn't something that we would implement in .NET, so I think this issue should probably be closed out.

I don't see how tail vs head sampling has anything to do with this proposal here. I would still want to be able to configure head sampling dynamically regardless of all the other points in the discussion. The fact that I have to restart my applications today for the sampler to take effect is less than ideal, particularly when I just want to change the sampling percentage.

martinjt commented 2 weeks ago

I'm not sure how you would change environment variables of a running process to be honest, is that actually possible?

The samplers that are part of the OTel SDK don't suggest this is something you can do, so I don't think it's something we would attempt in core.

For contrib, I could see someone creating something a bit like the Jaeger remote sampler that implements this kind of functionality. What I'm saying is that it wouldn't be part of core.

julealgon commented 2 weeks ago

I'm not sure how you would change environment variables of a running process to be honest, is that actually possible?

Settings in the .NET implementation are also honored via the IConfiguration subsystem. We currently rely on Azure AppConfiguration for all of our shared OpenTelemetry settings.

samsp-msft commented 2 weeks ago

I did add a version of the Jaeger rate based sampler to contrib - https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Extensions#ratelimitingsampler.

This is typically used with the parent-based sampler to set the Recorded bit that determines whether a span should be emitted or not. That flows through the chain of Activities and acts as a head-based sampler. Activities are created for each request in each process, and HttpClient will propagate the trace state with the recorded flag turned off. Intermediary Activities are not created, so the majority of tracing work is skipped for those requests.

It takes a rate at startup, but could easily be modified so that you could adjust the rate dynamically, or go a step further and have it pull the value from config and listen to config changes.
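Wiring it up as described might look roughly like this. This is a sketch: RateLimitingSampler lives in the OpenTelemetry.Extensions contrib package, I'm assuming its constructor takes a max-traces-per-second value, and the ActivitySource name is made up:

```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;

// Sketch: the contrib RateLimitingSampler decides for root spans only;
// ParentBasedSampler then propagates that decision (via the Recorded
// flag) to child Activities, including across process boundaries.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetSampler(new ParentBasedSampler(
        new RateLimitingSampler(5)))      // assumed: max 5 traces/sec
    .AddSource("MyCompany.MyService")     // hypothetical ActivitySource name
    .Build();
```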

martinjt commented 2 weeks ago

The bigger issue, I think, is that it would deviate from the implementations in other languages/frameworks.

I know that .NET has already deviated in a lot of ways; IConfiguration support is definitely an interesting one, but it has stayed true to treating the settings the same as the other implementations do.

I think that adding auto-changing of this based on the standard settings would be an interesting step to take, but it would ultimately be a deviation, and something that would probably take a lot of work to build and maintain.

There is nothing stopping someone adding it to contrib though, as an "auto-updating" version of the same code.

julealgon commented 2 weeks ago

I think that adding auto-changing of this based on the standard settings would be an interesting step to take, but it would ultimately be a deviation, and something that would probably take a lot of work to build and maintain.

I don't think it would be as complex as you imply. Basically, it would need some lifetimes changed a bit and to use IOptionsSnapshot (or IOptionsMonitor) to grab the settings instead of IOptions.

Of course, all up to the team. I may be able to contribute but that's not guaranteed.

martinjt commented 2 weeks ago

My worry would be that this is the hot path, and massively multi-threaded; but further to that, there's the developer experience to consider.

Off the top of my head, it would be whether this is what someone actually expects if they come from other languages, and whether there's a consistency issue across multiple services (i.e. how you'd monitor whether a change has propagated everywhere if you're using Key Vault, Vault, etc.).

I think there's probably a lot to consider since it's deviating from the norm.

My point is, if the maintainers won't accept this here, we should close this down. However, building a variant in contrib that you maintain is probably valuable.

julealgon commented 2 weeks ago

My worry would be that this is the hot path, and massively multi-threaded...

This is a totally fair concern. There have been complaints in the runtime repo about poor performance of IOptionsSnapshot, and the .NET team is aware of that.

Off the top of my head, it would be whether this is what someone actually expects if they come from other languages

I don't think this should be a concern. .NET has its own mechanisms that the community is used to, namely in this case IConfiguration. This is why all OTEL .NET libraries were made to support it in the first place. I think the confusion would be the other way around: if OTEL didn't honor standard .NET systems and only used raw environment variables like other languages do. It would be doing .NET a disservice IMHO.

...and whether there's a consistency issue across multiple services (i.e. how you'd monitor whether a change has propagated everywhere if you're using Key Vault, Vault, etc.)

Now you are listing a separate concern that is independent of OpenTelemetry IMHO. Anyone who consumes settings via IConfiguration is susceptible to configuration reload, and has the option to tap into that or not by using IOptionsSnapshot/IOptionsMonitor or IOptions respectively. Even the default appsettings.json configuration supports live setting changes by default, so this is well-understood and expected behavior.

I don't think this should be a concern at all for a specific usage/library. It's up to the consumer how/if they want to leverage configuration reload, and each IConfiguration provider has its own mechanism for that, including Azure AppConfiguration (which actually has multiple strategies to handle it, including polling and event-based methods).

I think there's probably a lot to consider since it's deviating from the norm.

I don't agree that it "deviates from the norm". It would be fully idiomatic .NET. It might "deviate" from how it is handled in another language, but that's because that other language perhaps doesn't have a widely used, native dynamic-configuration mechanism that supports setting reloads without restarting the application. This, to me, is again only taking advantage of what .NET has to offer.

My point is, if the maintainers won't accept this here, we should close this down. However, building a variant in contrib that you maintain is probably valuable.

Fair. I think it would still make sense as part of the standard implementations to take configuration reloading into account, but this is a subjective matter for sure.

martinjt commented 2 weeks ago

Your points are valid if we consider this a purely .NET solution, used by developers who work in one language. It becomes more nuanced when you start to think about how this would differ in a polyglot system with multiple entry points across .NET, Python, JS, etc., where .NET's configuration updates but none of the others do.

To be clear, those were just some initial thoughts off the top of my head that would need to be considered if this were to be taken forward.

As for being idiomatic for C# devs, that's a thing I've always gotten behind. But I would definitely argue that configuration updating on change is not in any way idiomatic .NET behaviour: for over two decades, changing config files caused application restarts, never mind updating the in-memory representation. Most libraries and applications I've encountered don't implement IOptionsMonitor, and most devs I speak to don't know what that interface does anyway.

Further to this, I think the idea of changing things while they're running may become part of the OTel configuration file working group's scope (perhaps even part of OpAMP). That will likely cover whether updating samplers should happen without app restarts.

All that to say: unless it's going to be accepted by the maintainers, it's a fruitless debate. If it is, then I'll happily spend thought on what potential gotchas there might be.