Define low/high-cardinality

lmolkova commented 1 year ago

What are you trying to achieve?

Currently, we recommend using low-cardinality span names in all trace conventions.

It would be great to have a definition of cardinality, and an idea of what "low" and "high" mean, so we can refer to it from different semantic conventions.

Additional context.

It's partially explained today in the metrics supplemental guidelines and the trace API.

    Now thinking (not related but), "low/high" cardinality is a topic that comes often and not everyone understands it. Would be cool if the spec defined this "once and for all".

_Originally posted by @joaopgrassi in https://github.com/open-telemetry/opentelemetry-specification/pull/2957#discussion_r1030505015_

joaopgrassi commented 1 year ago

Thanks for creating the issue @lmolkova ! Some context on why I commented that in the PR:

I often speak with former colleagues from when I was a "full-time back-end developer". I ask them to try OTel and tell me their pain points and their general impression of the spec. One thing that always comes up is cardinality. None of them had much idea what it was or, even worse, how they would know whether the things they are instrumenting/recording suffer from high cardinality.

Plus, during the messaging SIG meetings, the topic of high cardinality has come up multiple times, for example when we discussed span names and what to use for them. I remember us going through the usual "can't use this because it's high-cardinality", and then immediately afterwards people asking: but why not? Why/where is the problem with approach x?

I thought about it and have some ideas, so I will just "dump" them here. What I thought would be either a complete new page for it or a section somewhere (e.g., glossary) with a structure like this:

Cardinality

Goals:

Explain "Cardinality" in a general and "easy to grasp" way. For example, I found this one for SQL well structured, and maybe we could take some ideas from it: https://en.wikipedia.org/wiki/Cardinality_(SQL_statements).

I would try to refrain from using complex, mathematical definitions as that doesn't help newcomers understand it.

Why is high cardinality a problem?

Goals:

Explain what high cardinality ends up causing for users, with clear and easy-to-understand examples. For example, their queries/dashboards will provide less useful output, they will have high costs, etc.

High-cardinality in traces

Goals: Explain with examples why it's a problem for traces

High-cardinality in metrics

Goals: Explain with examples why it's a problem for metrics

How do I achieve low cardinality?

Goals:

Here we can give best practices on how to achieve this. For example, mentioning that one should consider using bounded values for attributes (categories, enums). Again, the goal is to provide guidance in easy-to-understand language and with as many real-world examples as possible, so folks actually using OTel and adding instrumentation have a solid foundation to base their instrumentation on.
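To make the "bounded values" idea concrete, here is a minimal sketch of the kind of example such a section could include. It uses only the Python standard library; `route_template` and `status_class` are hypothetical helpers made up for illustration and are not part of any OpenTelemetry API. The idea is to map an unbounded set of raw values onto a small, fixed set before using them as a span name or attribute.

```python
import re

# Hypothetical helpers for illustration only; not part of any OpenTelemetry API.

# Matches a purely numeric path segment such as "/42".
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def route_template(path: str) -> str:
    """Replace numeric path segments with a fixed "{id}" placeholder."""
    return _NUMERIC_SEGMENT.sub("/{id}", path)


def status_class(code: int) -> str:
    """Collapse an HTTP status code into one of a few classes, e.g. 404 -> "4xx"."""
    return f"{code // 100}xx"


# 1000 raw request paths, each containing different ids.
paths = [f"/users/{i}/orders/{i * 7}" for i in range(1000)]

raw_values = set(paths)                                # one distinct value per request
bounded_values = {route_template(p) for p in paths}    # one distinct value in total

print(len(raw_values), len(bounded_values))
print(status_class(404))
```

Used as a span name, the raw path would produce a new distinct value for practically every request, while the templated form stays at a single value no matter how many users or orders exist.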

Curious to see what the community thinks about this. :)

lmolkova commented 1 year ago

Low cardinality requirements apply to collection and storage, but query-time cardinality could also be important for user experience (for example, http.route has low-ish cardinality within one service, but could be much higher across all services in the system).
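A rough sketch of the difference between per-service and system-wide cardinality, assuming a made-up list of (`service.name`, `http.route`) pairs. The service names, routes, and function names are hypothetical sample data, not taken from any real system.

```python
from collections import defaultdict

# Hypothetical sample data: (service.name, http.route) pairs seen on spans.
observed = [
    ("checkout", "/cart"),
    ("checkout", "/cart/{id}"),
    ("checkout", "/pay"),
    ("catalog", "/items"),
    ("catalog", "/items/{id}"),
    ("users", "/users/{id}"),
    ("users", "/login"),
    ("checkout", "/cart"),  # repeated value, does not add cardinality
]


def cardinality_per_service(pairs):
    """Count the distinct http.route values seen within each service."""
    routes = defaultdict(set)
    for service, route in pairs:
        routes[service].add(route)
    return {service: len(values) for service, values in routes.items()}


def cardinality_overall(pairs):
    """Count the distinct http.route values seen across all services."""
    return len({route for _, route in pairs})


per_service = cardinality_per_service(observed)
overall = cardinality_overall(observed)

print(per_service)
print(overall)
```

Each service on its own sees only two or three routes, but a query that spans the whole system has to deal with all seven, and that number grows with every service added.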

joaopgrassi commented 11 months ago

The TAG Observability white paper has definitions/explanations of metric cardinality https://github.com/cncf/tag-observability/blob/whitepaper-v1.0.0/whitepaper.md#metric-cardinality. Maybe we could borrow things from there, to finally fix this?
