open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Problematic processing of optional metric dimensions #492

Open klauco opened 10 months ago

klauco commented 10 months ago

The HTTP semantic conventions for metrics introduce the concept of optional attributes, which are present only when the underlying value is available. For example, the status code is present only when a response was actually sent back with a status code.

To reduce telemetry COGS, some telemetry backends post-process the metrics and store only time-based aggregations over specific sets of attributes, on top of which you define visualizations, monitoring, or further processing and calculations. These attribute sets do not expect any attribute to be present only optionally. The raw metric records ingested by the telemetry backend are not stored.
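To make the problem concrete, here is a minimal sketch of such a fixed-schema pre-aggregation. The function `pre_aggregate`, the `DIMENSIONS` tuple, and the record layout are hypothetical and do not describe any particular backend; the attribute names are used for illustration only.

```python
from collections import defaultdict

# Hypothetical fixed dimension set a backend might pre-aggregate on.
DIMENSIONS = ("http.request.method", "http.response.status_code")


def pre_aggregate(records, dimensions=DIMENSIONS):
    """Sum values per combination of the configured dimensions.

    Records missing any configured dimension are returned separately,
    because a fixed-schema aggregation has no slot for them.
    """
    totals = defaultdict(float)
    unmatched = []
    for attributes, value in records:
        if all(name in attributes for name in dimensions):
            key = tuple(attributes[name] for name in dimensions)
            totals[key] += value
        else:
            unmatched.append((attributes, value))
    return dict(totals), unmatched


records = [
    ({"http.request.method": "GET", "http.response.status_code": 200}, 1),
    ({"http.request.method": "GET", "http.response.status_code": 200}, 1),
    ({"http.request.method": "GET"}, 1),  # no response sent, so no status code
]
totals, unmatched = pre_aggregate(records)
```

In this sketch the third record fits none of the stored aggregations, and since the raw records are discarded, the backend has to either drop it or invent a value for the missing dimension.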

In general, I am wondering:

  1. What is the current support for optional attributes among the most widely used telemetry backends? Do they allow optional attributes in metrics? Do, e.g., Prometheus rules work well with optional dimensions?
  2. Is the concept of optional dimensions considered user friendly? Users must be aware of the optional dimensions and account for 1 to N of them being missing every time they work with a given metric, whether visualizing it, setting up monitoring, etc. This in turn makes the monitoring and dashboarding logic more complicated. A missing status code (none sent as part of the response) is likely rare relative to the total volume of requests that do receive a response, but it can happen. Assuming it is rare, having to handle it as an optional attribute might be an unnecessary burden on OTel users.
  3. Is there any way to make processing of optional attributes easier, e.g. by allowing default values instead of omitting the attribute entirely?
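As a rough illustration of the default-value idea in the last question, the sketch below fills in absent optional attributes before a data point is recorded. The helper `with_defaults` and the placeholder `DEFAULT_VALUE` are hypothetical; the semantic conventions do not currently define such a default.

```python
# Hypothetical placeholder; semconv does not define a default value.
DEFAULT_VALUE = ""


def with_defaults(attributes, optional_names, default=DEFAULT_VALUE):
    """Return a copy of attributes with every optional name present.

    Values that are already set are left untouched; only absent
    optional attributes receive the default.
    """
    filled = dict(attributes)
    for name in optional_names:
        filled.setdefault(name, default)
    return filled


optional = ["http.response.status_code"]
no_response = with_defaults({"http.request.method": "GET"}, optional)
ok_response = with_defaults(
    {"http.request.method": "GET", "http.response.status_code": 200}, optional
)
```

With this approach every data point carries the same set of attribute keys, at the cost of picking a default that downstream consumers must recognize as "not available".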

Thanks for the response.

jsuereth commented 10 months ago

We discussed this a bit in the Semconv WG. A few points:

  1. Prometheus and Prometheus-like databases tend to handle disjoint metric definitions by synthesizing missing labels/attributes. Effectively, any label a series does not carry is treated as an empty label. It is still considered good practice to provide all labels all the time.
  2. You still need to deal with the case where multiple versions of a binary running in production have different ideas of a metric definition. This is why Prometheus is flexible, and why most Prometheus-like databases provide similar flexibility.
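The two points above can be sketched as follows. This is a simplified model of the "missing label equals empty label" behavior, not Prometheus code; `label_value`, `group_by`, and the sample series are made up for illustration.

```python
def label_value(labels, name):
    """Treat a label the series does not carry as the empty string."""
    return labels.get(name, "")


def group_by(series, names):
    """Sum sample values per label combination, synthesizing missing labels."""
    groups = {}
    for labels, value in series:
        key = tuple(label_value(labels, name) for name in names)
        groups[key] = groups.get(key, 0) + value
    return groups


# Two binary versions reporting the same metric with different label sets.
series = [
    ({"method": "GET"}, 3),                   # older binary, no status label
    ({"method": "GET", "status": "200"}, 5),  # newer binary
    ({"method": "GET", "status": ""}, 2),     # explicit empty label
]
grouped = group_by(series, ["method", "status"])
```

Here the series without a `status` label and the one with an explicitly empty `status` land in the same group, which is why an empty label works as a natural default in this storage model.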

That said, I do think having a default value in semconv for attributes that are optional is an entirely reasonable feature request. We likely need some specification work to decide what value to use for cases such as integer attributes, so that we preserve the existing Prometheus-like storage behavior of turning missing attributes into empty labels.