Split collection limit out of cardinality limit

MrAlias commented 9 months ago

As discussed in the last specification SIG meeting (2024-01-09) the existing cardinality limit is being used to represent two different limits:

The maximum number of measurements made for distinct attribute sets within a collection cycle
The maximum number of time-series exported at the end of the collection cycle

This distinction is meaningful for a few reasons.

Users may not want these to be the same value. One applies to the system telemetry is being produced on, and the other applies to the downstream telemetry transmission and storage systems.
The produced telemetry will differ base on how this limit is applied in relation to any attribute filtering.
A user trying to remediate the limit (1) being exceeded using an attribute filter may not be able to if the implementation is filtering at the end of the collection cycle.

Proposal

Introduce the new "collection limit" to directly set the maximum number of measurements allowed for distinct attribute sets within a collection cycle (1)
Include recommendations for implementations to document that users should resolve "collection limit" scenarios using the instrument attribute advisory parameter
Refine the definition of "cardinality limit" to only be the maximum number of time-series exported at the end of the collection cycle (2)
Include recommendations for implementations to document that users should resolve "cardinality limit" scenarios using the instrument attribute advisory parameter or with an attribute filter on a view.

cc @trask @jmacd @jack-berg @jsuereth

jmacd commented 8 months ago

I think I agree with this proposal. The Lightstep metrics SDK which I used for prototyping does have two limits that can be roughly described as @MrAlias has described above.

We might disagree on what "within a collection cycle" means. In my implementation, this "interior" cardinality limit is enforced between any two collection cycles by any two Readers. So -- and I admit this is not very intuitive -- when the interior limit is being reached, one way to address this is for the user to add another Reader with a shorter collection cycle. This will push the cardinality out of the interior data structure into each reader sooner, at which point the per-reader collection limit is well defined.

@utpilla looking for input.

jack-berg commented 1 month ago

Introduce the new "collection limit" to directly set the maximum number of measurements allowed for distinct attribute sets within a collection cycle (1)

If I understand this correctly, this is saying for a given attribute set (i.e. {"shape": "square", "color":"red"}), limit the max number of measurements to some configurable value. Who is asking for this? What's the goal of this?

MrAlias commented 1 month ago

Who is asking for this?

@MrAlias

What's the goal of this?

Is there something unclear with the description?

trask commented 1 month ago

we may want to revisit this issue after #3798 is resolved since seems to be some interdependency between them

cijothomas commented 1 month ago

Introduce the new "collection limit" to directly set the maximum number of measurements allowed for distinct attribute sets within a collection cycle (1)

If I understand this correctly, this is saying for a given attribute set (i.e. {"shape": "square", "color":"red"}), limit the max number of measurements to some configurable value. Who is asking for this? What's the goal of this?

+1

I am confused too... Why do we want to limit the number of measurements allowed for this?

for i=0; i<100; i++)
{
 counter.add(1, shape=square,color=red);
}

// Are we saying we need to have a limit on the number of measurements allowed ? i.e if I have 100 measurements, and limit is 90, then what happens to the other 10 measurements? What is the need of this limiting? Isn't the whole point of Metrics is that output is of fixed, predictable size, so if user has 100 measurements or a million of them, the output is still predictable size.

It may be the case that we didn't understand each other. @MrAlias Can you clarify if my (and Jack's) understanding is correct?

Also https://github.com/open-telemetry/opentelemetry-specification/pull/3856 has clarified that the cardinality limit in the spec today is 2 from this issue. "Refine the definition of "cardinality limit" to only be the maximum number of time-series exported at the end of the collection cycle (2)"

MrAlias commented 1 month ago

I have 100 measurements, and limit is 90

Note from the description:

The maximum number of measurements made for distinct attribute sets within a collection cycle

Which means that if you made 100 measurements for distinct attribute sets, yes you would limit to 90. If you make 100 measurements for the same attribute set you would measure all 100.

This assumes filtering is done in the collection phase.

cijothomas commented 1 month ago

@MrAlias Given https://github.com/open-telemetry/opentelemetry-specification/pull/3856 and https://github.com/open-telemetry/opentelemetry-specification/issues/3798 Can you re-assess if this is still required? If yes, do you consider this blocking the stabilization?

MrAlias commented 3 weeks ago

I think this can be postponed until after stabilization of the cardinality limit if we can decide on possible naming.

It seems like what we are after is a "hard" and "soft" limit definition. The "hard" limit would be the one that is never exceeded, even during the measurement phase, and the "soft" limit is the current definition that may be exceeded if filtering is not done in the measurement phase.

If we want to name them as such we should consider renaming the existing cardinality limit to be the cardinality soft limit.

reyang commented 3 weeks ago

It seems like what we are after is a "hard" and "soft" limit definition. The "hard" limit would be the one that is never exceeded, even during the measurement phase, and the "soft" limit is the current definition that may be exceeded if filtering is not done in the measurement phase.

If we need to have this distinction, I think we can use the same name but apply it at different components/levels. This is similar to throttling; we can have the same "Request per second" throttling mechanism at various levels (e.g. for each endpoint, for each binding address, for each client IP address, etc.)

open-telemetry / opentelemetry-specification

Split collection limit out of cardinality limit #3813

Proposal