Open lmolkova opened 1 year ago
Thanks for creating the issue, @lmolkova! Some context on why I commented that in the PR:
I often speak with former colleagues from my time as a "full-time back-end developer". I ask them to try OTel and tell me their pain points and their general impression of the spec. One thing that always comes up is cardinality. None of them had much idea what it was and, even worse, how to tell whether the things they are instrumenting/recording suffer from high cardinality.
Plus, during the messaging SIG meetings, the topic of high cardinality has come up multiple times, for example when we discussed span names and what to use for them. I remember us going through the usual "can't use this because it's high-cardinality", and immediately afterwards people asking: but why not? Why/where is the problem with approach x?
I thought about it and have some ideas, so I will just "dump" them here. What I had in mind is either a completely new page for it or a section somewhere (e.g., the glossary) with a structure like this:
Goals:
Explain "Cardinality" in a general and "easy to grasp" way. For example, I found this one for SQL well structured, and maybe we could take some ideas from it: https://en.wikipedia.org/wiki/Cardinality_(SQL_statements).
I would try to refrain from using complex, mathematical definitions as that doesn't help newcomers understand it.
Goals:
Explain what high cardinality ultimately causes for users, with clear and easy-to-understand examples. For example, their queries/dashboards will provide less useful output, they will incur high costs, etc.
Goals: Explain with examples why it's a problem for traces
Goals: Explain with examples why it's a problem for metrics
Goals:
Here we can give best practices on how to achieve this. For example, mentioning that one should consider using bounded values for attributes (categories, enums). Again, the goal is to provide guidance in easy-to-understand language and with as many real-world examples as possible, so folks actually using OTel and adding instrumentation have a solid foundation to base their instrumentation on.
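To make the "bounded values" advice concrete, here is a minimal sketch of how the number of distinct time series grows with the attributes attached to a metric. It assumes the worst case where every combination of attribute values occurs. `customer.tier` and `customer.id` are hypothetical attribute names used only for illustration, and the value sets are made up.

```python
def series_count(attribute_values: dict[str, set[str]]) -> int:
    """Upper bound on distinct time series: the product of the number
    of distinct values each attribute can take."""
    count = 1
    for values in attribute_values.values():
        count *= len(values)
    return count


# Bounded attributes: each one is drawn from a small, fixed set.
bounded = {
    "http.request.method": {"GET", "POST", "PUT", "DELETE"},
    "http.response.status_code": {"200", "404", "500"},
    "customer.tier": {"free", "pro", "enterprise"},  # hypothetical attribute
}

# Same metric, but the bounded category is swapped for a per-customer ID.
unbounded = {k: v for k, v in bounded.items() if k != "customer.tier"}
unbounded["customer.id"] = {f"customer-{i}" for i in range(10_000)}  # hypothetical

bounded_series = series_count(bounded)      # 4 * 3 * 3
unbounded_series = series_count(unbounded)  # 4 * 3 * 10_000

print(bounded_series, unbounded_series)
```

In this sketch the only change is replacing a three-value category with an ID that has one value per customer, and the upper bound on series grows with the number of customers instead of staying fixed.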
Curious to see what the community thinks about this. :)
Related: https://github.com/open-telemetry/semantic-conventions/pull/205#discussion_r1290760109
Low cardinality requirements apply to collection and storage, but query-time cardinality could also be important for user experience (for example, http.route has low-ish cardinality within one service, but could be much higher across all services in the system).
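A rough sketch of the `http.route` point: the same attribute can look small when counted per service and much larger when counted across the whole system at query time. The service names and routes below are invented for illustration.

```python
# Hypothetical services and their route templates.
routes_by_service = {
    "checkout": {"/cart", "/cart/{id}", "/pay"},
    "catalog": {"/items", "/items/{id}", "/search"},
    "accounts": {"/users/{id}", "/login", "/logout"},
}

# Cardinality of http.route as seen by each service on its own.
per_service = {svc: len(routes) for svc, routes in routes_by_service.items()}

# Cardinality of http.route when querying across all services at once.
system_wide = len(set().union(*routes_by_service.values()))

print(per_service, system_wide)
```

With only three services the difference is small, but the system-wide count keeps growing with every service added, while each per-service count stays roughly the same.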
The TAG Observability white paper has definitions/explanations of metric cardinality https://github.com/cncf/tag-observability/blob/whitepaper-v1.0.0/whitepaper.md#metric-cardinality. Maybe we could borrow things from there, to finally fix this?
What are you trying to achieve?
Currently, we recommend using low-cardinality span names in all trace conventions.
It would be great to have a definition of cardinality and the idea of what low and high mean so we can refer to it from different semantic conventions.
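As an illustration of what "low-cardinality span names" means in practice, the sketch below compares span names built from raw request paths with names built from a route template. The `low_cardinality_span_name` helper and its regex are assumptions made for this example only. Real instrumentation would normally take the route template from the web framework rather than guess it from the path.

```python
import re


def low_cardinality_span_name(method: str, path: str) -> str:
    """Hypothetical heuristic: replace purely numeric path segments
    with a placeholder so the span name no longer varies per entity."""
    template = re.sub(r"/\d+(?=/|$)", "/{id}", path)
    return f"{method} {template}"


# 1000 requests, each for a different user.
paths = [f"/users/{i}/orders" for i in range(1000)]

# High cardinality: one distinct span name per user ID.
raw_names = {f"GET {p}" for p in paths}

# Low cardinality: every request collapses to the same span name.
templated_names = {low_cardinality_span_name("GET", p) for p in paths}

print(len(raw_names), templated_names)
```

The raw names grow with the number of users, while the templated name stays constant, which is what lets a backend group and aggregate spans by name.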
Additional context.
It's partially explained today in the metrics supplemental guidelines and the trace API.
_Originally posted by @joaopgrassi in https://github.com/open-telemetry/opentelemetry-specification/pull/2957#discussion_r1030505015_