open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0

Promote cardinality limit view API from experimental to stable #5444

Open CodeBlanch opened 6 months ago

CodeBlanch commented 6 months ago

We now have an experimental API for setting the cardinality limit via the View API (see: #5312 & #5328).

This issue tracks making this a stable feature.

NickCraver commented 4 months ago

@CodeBlanch is this likely to make it into an upcoming release? We have severe memory issues with metrics due to needing high cardinality (for compatibility) on a single metric. Unless I'm mistaken, there's not a per-metric way to set higher cardinality until this lands and no current packages expose it, so we're in a tough spot on options here. Knowing where this stands would help me figure out timelines on options to unblock the next release for App Service.

cijothomas commented 4 months ago

@CodeBlanch This should require no spec support, given we already support a cardinality limit on the MP, so adding it per Metric should be a continuation of that. The current default behavior of drop-when-limit can be continued. Folding overflow into the special overflow bucket, however, requires a spec change.

NickCraver commented 1 month ago

@CodeBlanch @cijothomas Checking in since the global limit is costing us an extra 11 terabytes of memory at all times. When do we expect to land this at a more granular level? Or is that already the case, and this issue is just behind the current status?

cijothomas commented 3 weeks ago

@NickCraver The status is still the same: the API is not part of a stable release yet. Unfortunately, there is no firm ETA for that either, given there is some spec dependency (though it can be argued that the spec dependency can be ignored if the overflow behavior is just drop).

cijothomas commented 3 weeks ago

https://github.com/open-telemetry/opentelemetry-specification/issues/3904 Spec issue tracking this.

NickCraver commented 3 weeks ago

@cijothomas I could be misunderstanding the current state. Isn't the behavior currently/already drop and log? I'm trying to understand how the spec for what to do when the limit is exceeded is a blocker for setting the limit. Can't the latter come first, with the former being a further enhancement or refinement? They seem like related but orthogonal concerns that we could pursue in parallel.

cijothomas commented 3 weeks ago

You are right!

Current: The default behavior is to drop measurements when the limit is exceeded. The limit can be configured only at the Provider level.

Experimental: The experimental spec says not to drop when the limit is exceeded, but to put those measurements into an overflow=true bucket. This is experimental in OTel .NET too, enabled via OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE=true

Also experimental is the ability to set the cardinality limit on a per-metric basis (using Views). This can be stabilized with dropping as the default behavior and the overflow bucket as opt-in behavior. Once the spec is stabilized, the default can be changed to overflow. But I believe the desire (based on what I gathered from a community call a few months ago!) is to do both together, to make the transition smoother.
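
For readers following along, the difference between the two behaviors can be illustrated with a minimal Python sketch. This is a conceptual model only, not the OpenTelemetry .NET implementation: `CounterStream`, `record_sample`, and all field names are hypothetical, and the overflow bucket is modeled as a single separate total rather than a real attribute set. Each stream carries its own limit, which loosely mirrors a per-metric limit set through a View.

```python
class CounterStream:
    """Toy model of one metric stream with a cardinality limit (hypothetical)."""

    def __init__(self, cardinality_limit, emit_overflow=False):
        self.cardinality_limit = cardinality_limit
        self.emit_overflow = emit_overflow
        self.points = {}               # attribute set -> running sum
        self.overflow_total = 0        # stands in for the overflow=true bucket
        self.dropped_measurements = 0  # measurements discarded at the limit

    def add(self, value, **attributes):
        key = tuple(sorted(attributes.items()))
        if key in self.points or len(self.points) < self.cardinality_limit:
            # Known attribute set, or still room for a new one.
            self.points[key] = self.points.get(key, 0) + value
        elif self.emit_overflow:
            # Opt-in behavior: fold the measurement into the overflow bucket.
            self.overflow_total += value
        else:
            # Default behavior: drop the measurement.
            self.dropped_measurements += 1


def record_sample(stream):
    # Three distinct attribute sets against a limit of two.
    for fruit in ["apple", "lemon", "apple", "kiwi"]:
        stream.add(1, fruit=fruit)
    return stream


drop = record_sample(CounterStream(cardinality_limit=2))
overflow = record_sample(CounterStream(cardinality_limit=2, emit_overflow=True))
```

In both streams the "apple" and "lemon" series are tracked normally; the "kiwi" measurement is lost in the drop stream, whereas the overflow stream still counts it in the aggregate bucket, so the overall total is preserved.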