open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Does OpenTelemetry need "enum" metric type? #1711

Open · jsuereth opened this issue 3 years ago

jsuereth commented 3 years ago

How are users expected to encode 'enumerated' values in OpenTelemetry? Could be a gauge w/ string value?

We need to figure this out.

sirianni commented 3 years ago

+1 for adding first-class support for enumerated values (or a String-valued gauge).

I have some internal customers trying to model this in their applications. Currently they are setting the enumerated value as an attribute and setting the metric value to 0 or 1 depending on which enum value is active. One of the downsides of the "value-as-attribute" approach is that it bloats the volume of metrics flowing through the pipeline (in each collection interval, all but one of the metrics have a 0 value). We are getting pushback from the teams that own our metrics sinks and being asked to minimize this volume.
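
For concreteness, a minimal sketch of this "value-as-attribute" pattern using the OpenTelemetry Python SDK; the metric name, attribute key, and state names are purely illustrative and not from any semantic convention:

# Sketch only: one 0/1 data point per possible state, exactly one is 1 at a time.
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation

STATES = ["OK", "Degraded", "Failed"]
current_state = "Degraded"  # whatever the application currently reports

def observe_states(options: CallbackOptions):
    for state in STATES:
        yield Observation(1 if state == current_state else 0, {"state": state})

meter = metrics.get_meter("enum-example")
meter.create_observable_gauge(
    "component.status",
    callbacks=[observe_states],
    description="1 for the active state, 0 for all others",
)

Every collection interval reports all three series, which is exactly the volume concern described above.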

An alternative pattern would be to encode each enumerated value as a numeric integer. This has the downside that the metric is not self-describing and you'd need additional context to decode the integer value in the sink.

In OpenMetrics, enums can be encoded using the more general StateSet type which uses a bit-vector to encode the values. StateSets represent a series of related boolean values, also called a bitset.

If encoded as a StateSet, ENUMs MUST have exactly one Boolean which is true within a MetricPoint. This is suitable where the enum value changes over time, and the number of States isn't much more than a handful.

bertysentry commented 2 years ago

+1

We're developing a hardware monitoring solution in OpenTelemetry and many of the components report a status that is an enumeration (e.g. simply "OK", "Degraded", or "Failed").

Using the "value-as-attribute" approach is non-sense when the enumeration has many possible values.

A simple implementation would be to map each string value to an integer value and describe this mapping in the unit in a standard way. Example: { 0 = OK; 1 = Degraded; 2 = Failed }

This way the current specification and implementation of metrics don't need to change much and it's the UI's responsibility to translate the integer value into the corresponding string value (very much like it's already the UI's responsibility to convert bytes to GB to display something that the end user can read easily).

dyladan commented 2 years ago

A simple implementation would be to map each string value to an integer value and describe this mapping in the unit in a standard way. Example: { 0 = OK; 1 = Degraded; 2 = Failed }

Keep in mind that the unit field is limited to 63 characters, which is extremely limiting for an enum like this. Also, this not-quite-JSON would require a custom parser if a backend wanted to display it more clearly, and a more machine-readable true JSON format would be more annoying for users of systems that don't do any parsing.

If we don't want to have a proper enum type, I would argue that a simple counter/gauge/etc with attribute status=OK, status=DEGRADED, or status=FAILED would be a more elegant solution.

bertysentry commented 2 years ago

I agree that the { 0 = OK; 1 = Degraded; 2 = Failed } unit suggestion is more of a workaround than a proper enum implementation, as it defers the implementation of the conversion to the UI, with loose specifications. Also, good point on the 63 chars limit in the Unit field.

Using an Attribute is not ideal either. With a long Enum (say like the very common OperatingStatus property of the CIM_ManagedSystemElement WBEM class), you would need to send:

managed_system_element.operating_status{state="Unknown"} 0.0
managed_system_element.operating_status{state="Not Available"} 0.0
managed_system_element.operating_status{state="Servicing"} 1.0
managed_system_element.operating_status{state="Starting"} 0.0
managed_system_element.operating_status{state="Stopping"} 0.0
managed_system_element.operating_status{state="Stopped"} 0.0
managed_system_element.operating_status{state="Aborted"} 0.0
# and many more
managed_system_element.operating_status{state="Vendor Reserved"} 0.0

...just to say that the managed system element is "Servicing". In terms of storage, memory and network, it's very inefficient.

Also, it allows invalid states where several values are "enabled" at once, even though only one is possible at a time (the service can't be both "Starting" and "Stopping").

DMTF's CIM model has had ValueMap from the beginning (see page 89 - ValueMap in CIM specifications).

Heck, SNMP has had enumeration types for more than a couple of decades. See RFC 2579, which defines the textual conventions syntax:

INTEGER {
    other(1),     -- eh?
    ok(2),        -- everything is fine
    degraded(3),  -- still working, but for how long?
    failed(4),    -- no longer working
    dead(5)       -- you won't fix this one
}

My conclusion is:

  1. We should use an Attribute to identify various states of an enum when they can be combined (like the StateSet in OpenMetrics)
  2. We need an Enum value type (where values are mutually exclusive) for metrics in OpenTelemetry that specifies a mapping from int to string, just like SNMP and DMTF's CIM do
  3. It will be the exporter's responsibility to export the mapping information to the receiving platform (as a unit field, as a description, as a comment, or as a true enum).

Note: In the meantime, assuming OpenTelemetry adds an Enum value type in the future, I would recommend that developers use a single Gauge int metric to represent mutually exclusive enums, as it will be much easier to later convert a single int metric to an Enum.
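
A minimal sketch of that interim recommendation with the OpenTelemetry Python SDK; the metric name and the integer mapping are illustrative assumptions, not an established convention:

# Sketch only: a single int-valued Gauge whose mapping is documented in the description.
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation

STATUS_CODES = {"OK": 0, "Degraded": 1, "Failed": 2}  # illustrative mapping

def read_component_status() -> str:
    return "OK"  # placeholder for the real status source

def observe_status(options: CallbackOptions):
    # One data point per collection; states are mutually exclusive by construction.
    yield Observation(STATUS_CODES[read_component_status()])

meter = metrics.get_meter("hw-monitoring-example")
meter.create_observable_gauge(
    "component.status.code",
    callbacks=[observe_status],
    description="Component status: 0=OK, 1=Degraded, 2=Failed",
)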

pirgeo commented 2 years ago

Thanks for elaborating on this. The example you have given above feels a bit more like an Event to me (implying a state transition). Maybe there is an entirely different way of modeling this that does not necessarily involve metrics, if performance is critical?

Some considerations regarding using a gauge:

It will be the exporter's responsibility to export the mapping information to the receiving platform (as a unit field, as a description, as a comment, or as a true enum).

I am not sure I understand this. Are you suggesting that the int-to-string mapping information needs to be transferred with the enum data type during metrics message export? If that is the case, I am not sure that sending the different data points as labels would really increase the size of the message a lot, since we transfer all of the strings anyway (just already separated, instead of as a description that needs to be parsed by the backend), and adding a few more integers would not really change the relative size of the message. Also: can the enum definition change over time? What happens if different services report different mappings - which one is given priority?

bertysentry commented 2 years ago

Sorry for the late reply, I did not receive the notification... Thank you for the questions and comments!

The example you have given above feels a bit more like an Event to me

You're right, metrics or events can be used to report "discrete" values (the status of a component for example), they both have pros and cons. In the end, you still need a proper Enum value to represent the current status (metric) or the new status (event).

How would you aggregate these gauges? The UI needs to know that this is not actually a gauge, but an enum, and act accordingly.

Grafana dashboards already have a "State timeline" panel designed just for that. The mapping in this case needs to be specified in the UI. If the metric itself could indicate the mapping in the future, that would make everybody's life much easier.

The timeseries database storing the metrics doesn't need to handle Enum metrics as a specific type; it can process them just as simple integers. The value mapping could be stored as metadata. Spatial aggregation should prefer the max() and min() functions rather than average().

using a gauge for now and later switching to an enum data type (if it is introduced) would break data continuity in OTLP

True. To avoid this, an enum type could be defined as a subtype of Gauge, like EnumGauge. The only difference between EnumGauge and a simple Gauge is the value mapping (instead of the Unit field). Not sure this is feasible in Go.

Also, breaking data continuity doesn't really happen in OTLP itself, but rather in the exporters. If the Prometheus exporter handled EnumGauge just like Gauge (as numbers), it wouldn't make any difference in Prometheus (no break in data continuity).

The same cannot be said if we have Attributes describing the states. They end up as different labels in Prometheus and thus different timeseries.

It will be the exporter's responsibility to export the mapping information to the receiving platform (as a unit field, as a description, as a comment, or as a true enum).

I am not sure I understand this.

I'm simply saying that it's the exporter's responsibility to adapt the OpenTelemetry metrics to the receiving end. In the example of Prometheus, which doesn't support Enum values, the exporter would simply export the int value, and maybe the mapping in the metric description (in # comments when polling /metrics).

If a timeseries backend supports Enum values natively in the future, the corresponding exporter may have the option to export OTLP's EnumGauge as the corresponding native Enum.

Also: can the enum definition change over time? What happens if different services report different mappings - which one is given priority?

Good question! The value mapping of an Enum must not change over time, very much like the unit of a metric is not supposed to change over time. EnumGauge DataPoints can be sent in bulk, without resending the value mapping (just like we don't need to resend the Gauge unit for each DataPoint).

bertysentry commented 2 years ago

As a matter of fact (if I'm not mistaken 😅), Grafana doesn't have any visualization that can display StateSet metrics, where the different states are represented with an Attribute (label). This question on StackOverflow links to a 3rd-party plugin that does this (but it's not made by Grafana).

I haven't seen anything that could handle this on Datadog either.

pirgeo commented 2 years ago

Thanks for your reply. I think I see your point about the enum-as-gauge (and please correct me if I am wrong):

It seems that using attributes could lead to (potentially unwanted) StateSet semantics, and it leads to having to export zeros for states that the device is currently not in (at least the ones that were already seen before). Using the Gauge is an efficient, but much less user-friendly, alternative. Also, using a Gauge currently does not allow adding the mapping on the client, since there is no way of transmitting that mapping. One would have to add it on the server side manually (as is the case in Grafana, if I understand correctly). Am I missing something here?

The upside of using attributes would be that you could easily sum the counts from different producers and see stats like "how many disks are in a running state, how many disks fail on average per hour?". This is possible today and would not require any changes to the protocol. I think I am more familiar with this representation, because it better resembles my understanding of metrics (being able to "do math" with them), while this is not the case for the gauge representation.

Either way, I don't really see a way around breaking semantic conventions when introducing an Enum type later. If I am not mistaken, there is no inheritance in Protobuf, so in OTLP the Enum would have to be a separate type. That means that unless the consumer decides to explicitly convert OTLP Enums to whatever internal representation they used before, there will be two separate data streams. I see your point about Prometheus in that regard, and that it should be possible to do that easily there, but I am not sure about a lot of the other consumers.

bertysentry commented 2 years ago

I understand your point regarding the ability to do math with the metrics, where describing the states with an attribute is much more user-friendly.

Also, at this point, I think it is unlikely we will see timeseries server vendors embrace a new Enum type. Even though I think I have a ton of excellent reasons to want this EnumGauge, I'll have to cave in and make do with what we have now.

I'll update the hardware-metrics semantic convention to reflect that. Thank you for the discussion!

jmacd commented 2 years ago

There is a lengthy discussion about this topic in https://github.com/open-telemetry/opentelemetry-specification/pull/2518#discussion_r881241625. In that comment, I propose that OTel is better off having string-valued attributes and 0/1-valued UpDownCounters to convey state sets.

With more specification work, it should be possible to avoid transmitting zeros for inactive states. For example, could we define a data point flag, set on the 1-valued attribute set, that marks prior values for the same attribute keys as stale?

pirgeo commented 2 years ago

I agree with @jmacd here, +1 for string-valued attributes on counters.

bogdandrutu commented 2 years ago

@jmacd if you think that is the right call, let's create a PR to add this to the specification. Also, we may need to update the OpenMetrics compatibility, which already seems to suggest what you proposed: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#stateset

julealgon commented 1 year ago

We've run into this issue ourselves, and I actually have a question for the folks in the thread on how to properly map a 4-state enum onto a counter.

Consider a "worker" abstraction that executes "tasks". We want to keep track of how many tasks there are and be able to see how many exist in each state, as well as how many there are for a given "task name".

Say there are 4 states, including Waiting and Processing.

We'd have a Counter instance. When the task is first created, we'd push a measurement of 1 with attributes for the task name and the Waiting status (and, following the pattern above, 0 for every other status).

This would "signal" that we now have one (new) task, called "task1", with the status Waiting.

My question now is what happens with state transitions... say we now want to move that task, from Waiting to Processing. If I just followed what I saw in this thread, the updates would be something like:

"1 for Processing, 0 for all others".

However, this is not making much sense to me. If I report transitions this way, wouldn't I still have "1 task1 instance in Waiting state", from the previous measurement, when summing the counter values?

Say, at this exact point in time, I want to query: "how many tasks of type "task1" in Waiting state are there?" Wouldn't this still return 1 (1 + 0 from the 2 measurements)?

Wouldn't it make more sense to represent that "a task left Waiting, and moved into Processing"?

By decrementing with the Waiting tag, that would "cancel out" the previous measurement, resulting in "0 task1 instances in Waiting state".

This would of course require the use of an UpDownCounter instead of a standard Counter, even though, technically, my tasks would only increase.

Am I missing something obvious? It would make sense to me if the 0 cleared the counter, but that doesn't happen, right? Or does it? At least in the .NET API, there is no such thing as a "clear" operation for the counter: you can only "add" values. Does "adding 0" have any special semantics that signal the counter needs to be cleared, or something of that sort? The only way reporting 0s would work in my mind is if tools treated a 0 reading as a reset of sorts.
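
For what it's worth, here is a sketch of the "decrement the old state, increment the new state" idea described above, using an UpDownCounter from the OpenTelemetry Python SDK; the instrument, attribute, and helper names are made up for illustration:

# Sketch only: -1/+1 on every transition keeps the per-state sums consistent
# without re-reporting zeros for every other state.
from opentelemetry import metrics

meter = metrics.get_meter("worker-example")
tasks_by_state = meter.create_up_down_counter(
    "worker.tasks",
    description="Tasks currently in each state",
)

def task_created(task_name: str) -> None:
    tasks_by_state.add(1, {"task.name": task_name, "state": "Waiting"})

def task_transitioned(task_name: str, old_state: str, new_state: str) -> None:
    # Cancel out the previous state, record the new one.
    tasks_by_state.add(-1, {"task.name": task_name, "state": old_state})
    tasks_by_state.add(1, {"task.name": task_name, "state": new_state})

task_created("task1")
task_transitioned("task1", "Waiting", "Processing")
# Summing each series now yields: Waiting = 0, Processing = 1.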

sirianni commented 10 months ago

Rephrasing my comment https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27617#issuecomment-1794928067 here

One other example of why the "encoding enum as numeric value" formulation is problematic... Many metrics backends will store pre-aggregated time-based rollups to reduce storage and increase performance for queries over longer time intervals. The "enum-as-numeric-value" formulation completely breaks semantics for these rollups.

Consider these two formulations of an hour's worth of k8s.pod.phase metrics, where the phase changes over the course of the 1-hour interval:

Formulation 1 - encoding phase as metric value:

k8s.pod.phase{pod="my-pod"} 1
... for 01:00 - 01:29
k8s.pod.phase{pod="my-pod"} 3
... for 01:30 - 01:59

Formulation 2 - encoding phase as dimension:

k8s.pod.phase{pod="my-pod", phase="Pending"} 1
k8s.pod.phase{pod="my-pod", phase="Running"} 0
k8s.pod.phase{pod="my-pod", phase="Succeeded"} 0
... for 01:00 - 01:29
k8s.pod.phase{pod="my-pod", phase="Pending"} 0
k8s.pod.phase{pod="my-pod", phase="Running"} 0
k8s.pod.phase{pod="my-pod", phase="Succeeded"} 1
... for 01:30 - 01:59

With Formulation 1, if the backend stores a pre-aggregated rollup for the hour 01:00 - 02:00, the following value will be stored:

Metric          Attributes        Min  Max  Sum  Count
k8s.pod.phase   {pod="my-pod"}    1    3    120  60

At query time, most frontends will apply an average time aggregation, dividing Sum by Count and resulting in a value of 2 for the metric:

k8s.pod.phase{pod="my-pod"} 2.0

This gives the false impression that the pod was in the 2 - Running state for the hour, when in fact it was in the Pending and Succeeded states for 30 minutes each during that hour.

Contrast this with the suggested "value-as-attribute" pattern:

Metric          Attributes                         Min  Max  Sum  Count
k8s.pod.phase   {pod="my-pod", phase="Pending"}    0    1    30   60
k8s.pod.phase   {pod="my-pod", phase="Running"}    0    0    0    60
k8s.pod.phase   {pod="my-pod", phase="Succeeded"}  0    1    30   60

With this encoding, the query on the rolled-up data with average aggregation returns the expected data

k8s.pod.phase{pod="my-pod", phase="Pending"} 0.5
k8s.pod.phase{pod="my-pod", phase="Running"} 0.0
k8s.pod.phase{pod="my-pod", phase="Succeeded"} 0.5
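
To spell out the rollup arithmetic above, a tiny self-contained calculation (plain Python, independent of any OpenTelemetry API; the 1 and 3 phase codes follow the example above):

# Formulation 1: 30 one-minute samples at value 1 (Pending), then 30 at value 3 (Succeeded).
samples = [1] * 30 + [3] * 30
print(min(samples), max(samples), sum(samples), len(samples))  # 1 3 120 60
print(sum(samples) / len(samples))  # 2.0 -> reads as "Running" for the whole hour

# Formulation 2: the phase="Pending" series alone is 1 for 30 samples, then 0 for 30.
pending = [1] * 30 + [0] * 30
print(sum(pending) / len(pending))  # 0.5 -> the pod was Pending for half the hour
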
ringerc commented 7 months ago

@sirianni The trouble with using the value-as-attribute model is that it causes an excessive number of time series to be maintained unnecessarily for what is really one datapoint. For 5 states, you're constantly sending 4 unnecessary datapoints, usually repeating all of their common attributes for each one. This is a huge waste of bandwidth and storage, especially if one state is the norm most of the time. For example, if I want a metric for the Postgres pg_replication_slots.state, I need 5 time series to report one state.

To avoid having to send all the useless 0s the metric emitter must know the scrape interval and keep track of recent states, so it can send 0-valued tombstone datapoints for recent-past states for at least a couple of scrape intervals or until the expiry of the target tsdb's staleness threshold. That's ugly to push down onto every metric emitter.

What is IMO needed here is the concept of a non-cardinal attribute - a data-only attribute that is ignored for purposes of determining time-series uniqueness. So in your example above, if phase was a non-cardinal attribute, you would only have to send something like:

k8s.pod.phase{pod="my-pod", phase="Pending",noncardinal="phase"} 1
... for 01:00 - 01:29
k8s.pod.phase{pod="my-pod", phase="Succeeded",noncardinal="phase"} 1
... for 01:30 - 01:59

and the tsdb would know that all k8s.pod.phase{pod="my-pod",phase=~".*"} are 0 except the specified datapoint, so they can be given tombstone datapoints and made immediately stale. Or, if the tsdb supports it, it could translate this into storing a changing string attribute associated with the datapoint. Either way, the tsdb knows the prior state, so it is in a sensible position to apply the new state transition without requiring the sender to constantly emit a broad set of negative signals.

This is a bit like a database "covering index" where additional attributes are encoded in the index storage, but not usable for search criteria. Either this, or true string-valued datapoints are IMO badly needed. It's amazing how difficult it is to represent simple enumerations of states in metrics.

trask commented 1 hour ago

possibly related https://github.com/open-telemetry/semantic-conventions/blob/main/docs/hardware/common.md#metric-hwstatus:

hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.

ArthurSens commented 40 minutes ago

Even though StateSet is defined in OpenMetrics, none of the Prometheus SDKs (at least that I'm aware of) implements it. Prometheus itself is not capable of parsing StateSet, even if content-type negotiation results in OpenMetrics format.

If I had to guess a reason for this, it's because there's little motivation for it besides saying "We implement all OpenMetrics features". In Prometheus, Gauges can easily represent what StateSets are meant for, and they are represented similarly to what is shown in this comment: https://github.com/open-telemetry/opentelemetry-specification/issues/1711#issuecomment-1117349513

I can definitely agree it's not the most efficient way to handle this use case; Prometheus is open for discussion: https://github.com/prometheus/prometheus/issues/9139 :)