The concept of exemplars is very vaguely defined in the spec

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#exemplars

An exemplar is a recorded value that associates OpenTelemetry context to a metric event within a Metric.

"recorded value" of what? A time series is already a collection of recorded values, how is exemplar related to that?

One use case is to allow users to link Trace signals w/ Metrics.

What does that mean? Why does one need that? What problem is it trying to solve? Is it the only way to solve that problem?

The recorded value (value)

Recorded values are being captured by the metric / time series. What is the other type of recorded value this refers to?

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar

An Exemplar is a recorded Measurement that exposes the following pieces of information:

What is the relationship between this Measurement and the metric itself? When an end-user of OTel API is calling an instrument to record a metric's value, are they also providing some additional measurement that is the exemplar? The API spec never mentions anything about exemplars.

A Metric SDK MUST provide a mechanism to sample Exemplars from measurements.

The line above just said exemplar IS a measurement. Why is it needed to be sampled? Assuming this is explained, what is this "mechanism to sample", how/where is it defined?

A bit of a response, out of order.

One use case is to allow users to link Trace signals w/ Metrics.

What does that mean? Why does one need that? What problem is it trying to solve? Is it the only way to solve that problem?

This was covered in the OTEP. Generally we don't include the "why" in the data model spec, although it has been leaking into it. The data model specification focuses on "WHAT"; the "why" should be in other sections. The original OTEP outlines the motivation.

However, I'll mention that the primary use case for exemplars is identifying traces that correlate with specific histogram buckets. The idea is that you start from an alert based on a metric and help the user walk into traces/logs via the histogram buckets when they want to do deeper analysis (and the data is available).

The recorded value (value)

Recorded values are being captured by the metric / time series. What is the other type of recorded value this refers to?

Not quite true. Metric time series are aggregated values; individual points are lost. Even for a Gauge, only the "last sampled value" is reported. Additionally, users commonly shuffle attributes in metrics to deal with cardinality limitations in backend storage. Exemplars are the only access to raw measurements (the attributes + value + associated span/trace).
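To make the distinction concrete, here is a minimal Python sketch of a histogram aggregation. It is not SDK code: `Measurement`, `HistogramPoint`, and `record` are made-up names for illustration, and keeping "the last traced measurement per bucket" is just one simple assumed policy. The point is that the aggregate keeps only counts and a sum, while an exemplar keeps one full raw measurement.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Measurement:
    """A raw value as recorded by the user (hypothetical type)."""
    value: float
    attributes: Dict[str, str]
    trace_id: Optional[str] = None  # taken from the active span, if any


@dataclass
class HistogramPoint:
    """An aggregated point: bucket counts and a sum, plus optional exemplars."""
    bounds: List[float]
    counts: List[int] = field(default_factory=list)
    total: float = 0.0
    exemplars: Dict[int, Measurement] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One bucket per upper bound, plus an overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, m: Measurement) -> None:
        # Upper-inclusive buckets: a value equal to a bound lands in that bucket.
        i = bisect_left(self.bounds, m.value)
        self.counts[i] += 1
        self.total += m.value
        # The individual value and attributes are lost in the aggregate;
        # an exemplar retains them for a measurement that had a trace.
        if m.trace_id is not None:
            self.exemplars[i] = m


point = HistogramPoint(bounds=[10.0, 100.0])
for m in [
    Measurement(5.0, {"route": "/a"}),
    Measurement(12.0, {"route": "/a"}, trace_id="t1"),
    Measurement(250.0, {"route": "/b"}, trace_id="t2"),
    Measurement(40.0, {"route": "/b"}, trace_id="t3"),
]:
    point.record(m)
```

After these four calls the aggregate only says "one value at or below 10, two in (10, 100], one above 100", while `point.exemplars[2]` still holds the raw 250.0 measurement together with its attributes and trace id, which is what lets a user walk from the slow bucket to a trace.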

An Exemplar is a recorded Measurement that exposes the following pieces of information:

What is the relationship between this Measurement and the metric itself? When an end-user of OTel API is calling an instrument to record a metric's value, are they also providing some additional measurement that is the exemplar? The API spec never mentions anything about exemplars.

A Measurement is used to compute a metric point. When an end user of the OTel API records a value to an Instrument, that's a "measurement". When we report metric streams we get "aggregated values"; the measurements are effectively "compressed" in process. And yes, exemplars don't show up in the API, as they're just normal measurements. The SDK is where they need to be called out, as the SDK allows attribute-munging, aggregation and other compression techniques on data.

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#exemplars

An exemplar is a recorded value that associates OpenTelemetry context to a metric event within a Metric.

"recorded value" of what? A time series is already a collection of recorded values, how is exemplar related to that?

A time series is aggregated values from raw measurements.

A Metric SDK MUST provide a mechanism to sample Exemplars from measurements.

The line above just said exemplar IS a measurement. Why is it needed to be sampled? Assuming this is explained, what is this "mechanism to sample", how/where is it defined?

This is defined by the ExemplarReservoir and ExemplarFilter interfaces, which let you bound memory usage for exemplar sampling and easily turn it off, sample measurements with active traces (default), or turn it on for all measurements. The definitions follow below those lines in the SDK spec.
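As a rough illustration of how a filter and a bounded reservoir fit together, here is a small Python sketch. It does not reproduce the interfaces from the SDK spec: the function names `always_off`, `always_on`, `trace_based`, the class `FixedSizeReservoir`, and its `offer`/`collect` methods are assumptions made for this example, and the uniform reservoir sampling used here is only one possible strategy.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# (value, attributes, trace_id); hypothetical shape of a sampled measurement.
Sample = Tuple[float, Dict[str, str], Optional[str]]
Filter = Callable[[float, Dict[str, str], Optional[str]], bool]


def always_off(value, attributes, trace_id) -> bool:
    return False  # exemplar sampling disabled


def always_on(value, attributes, trace_id) -> bool:
    return True  # every measurement is eligible


def trace_based(value, attributes, trace_id) -> bool:
    return trace_id is not None  # only measurements made inside a trace


class FixedSizeReservoir:
    """Keeps at most `size` samples per collection cycle (bounded memory)."""

    def __init__(self, size: int, keep: Filter, seed: Optional[int] = None):
        self.size = size
        self.keep = keep
        self.seen = 0
        self.samples: List[Sample] = []
        self._rng = random.Random(seed)

    def offer(self, value: float, attributes: Dict[str, str],
              trace_id: Optional[str] = None) -> None:
        if not self.keep(value, attributes, trace_id):
            return
        self.seen += 1
        if len(self.samples) < self.size:
            self.samples.append((value, attributes, trace_id))
            return
        # Reservoir sampling: each eligible measurement ends up kept
        # with probability size / seen.
        j = self._rng.randrange(self.seen)
        if j < self.size:
            self.samples[j] = (value, attributes, trace_id)

    def collect(self) -> List[Sample]:
        # Hand over the current exemplars and start a fresh cycle.
        out, self.samples, self.seen = self.samples, [], 0
        return out


reservoir = FixedSizeReservoir(size=2, keep=trace_based, seed=0)
for i in range(100):
    trace_id = "t%d" % i if i % 2 == 0 else None
    reservoir.offer(float(i), {"route": "/a"}, trace_id)
exemplars = reservoir.collect()
```

However many measurements are offered, `exemplars` holds at most two of them, and with the `trace_based` filter all of them carry a trace id. Swapping in `always_off` makes `collect()` return an empty list.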

Trying to turn this discussion into actionable things, a few questions:

Is the notion that metrics are streams of aggregated point values not clear in the specification? Does that belong in the Exemplar specification or should it be made more clear elsewhere?
Should raw statements like "SDK MUST provide means to sample exemplars" be removed, favoring just specifying the underlying mechanisms?
Do we need to include more "why" in our specification?

This was covered in the OTEP. Generally we don't include the "why" in the data model spec, although it has been leaking into it. The data model specification focuses on "WHAT"; the "why" should be in other sections. The original OTEP outlines the motivation.

The OTEP has a much better (certainly a couple of sentences longer) definition of the exemplar. Why couldn't it be copied into the spec? At a minimum, the spec should link to the OTEP if the OTEP provides additional context and motivation; how else are people supposed to find that information when reading the spec?

Is the notion that metrics are streams of aggregated point values not clear in the specification.

Most aggregations, especially for the most basic measurements like latencies and counts, happen below the metrics API surface. As an end user of the API I am providing the raw measurement, with dimensions that may later be collapsed via aggregation. The exemplar description makes it sound like an exemplar is some other measurement. What exemplars actually are is full-fidelity samples of the raw measurements that survive aggregation.

Should raw statements like "SDK MUST provide means to sample exemplars" be removed

I don't think it has to be removed; the spec just needs to document those "means". Otherwise, what's the point of the spec if each language SDK invents its own "means of sampling exemplars"?

Do we need to include more "why" in our specification?

I think I would like this structure in the spec:

explain the business problem
explain the chosen solution
link to OTEP that explains why the solution was chosen and hopefully where alternatives were considered / discussed

Technically, if the OTEP is well written (i.e. covers similar points), then the spec can collapse the above just by referring to the OTEP. But I still think it's a suboptimal solution because OTEPs are not meant to be "living documents", whereas the Spec is, so if someone comes up with better wording or explanation of something, the Spec is the place where that change can be applied.

This OTEP, in particular, did not discuss any alternatives to exemplars. At my company we're working on a framework that introduces semantic structure into all telemetry. This would allow the backend to know that, for example, a specific metric like latency or error count corresponds to specific attributes captured in the span. If those attributes are indexed by the backend, then I can reproduce "exemplars" at query time just by searching for appropriate traces. This approach has a dependency on semantic knowledge (a downside), but on the upside it does not require any exemplar-support in the collection pipeline / backend and provides better fidelity of the results (since I can find all traces in the latency bucket, not just the N exemplars captured at ingestion).
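A toy Python sketch of the query-time idea, under heavy assumptions: the framework itself is not shown here, so `Span`, `traces_in_bucket`, and the in-memory list standing in for an indexed trace backend are all hypothetical. It assumes the backend already knows which span attributes and which span field (duration) a given latency metric corresponds to.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical indexed span record."""
    trace_id: str
    attributes: Dict[str, str]
    duration_ms: float


def traces_in_bucket(spans: List[Span], metric_attributes: Dict[str, str],
                     lower_ms: float, upper_ms: float) -> List[str]:
    """Return trace ids of spans that match the metric's attributes and
    whose duration falls in the bucket (lower_ms, upper_ms]."""
    return [
        s.trace_id
        for s in spans
        if lower_ms < s.duration_ms <= upper_ms
        and all(s.attributes.get(k) == v for k, v in metric_attributes.items())
    ]


spans = [
    Span("t1", {"route": "/a", "method": "GET"}, 12.0),
    Span("t2", {"route": "/b", "method": "GET"}, 250.0),
    Span("t3", {"route": "/a", "method": "GET"}, 40.0),
    Span("t4", {"route": "/a", "method": "GET"}, 400.0),
]
matches = traces_in_bucket(spans, {"route": "/a"}, 10.0, 100.0)
```

Here `matches` contains every trace in the bucket for `route=/a`, not just the few that happened to be sampled as exemplars at ingestion, which is the fidelity argument made above.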

Thanks for this list, so given the above, I'll do the following to the spec as written:

Improve description of exemplar, based on OTEP with a better notion of WHY instead of WHAT.
Regarding "measurement" as a term, it's heavily used in the SDK spec (where this lives), so I'd like to keep using it. I'll take a crack at making sure it's clear what that means to an SDK author. I'll clarify that a measurement is the value the user originally provides (plus aspects of the Context in which it happened).

Regarding alternatives to Exemplars, it sounds like your company has a solution for tying span latency (or span-derived metrics) back to histograms in a different fashion. I would love to see that documented and made public as an alternative to consider. The feature sounds powerful and useful, and would be worth promoting in OSS.

As Exemplars stand right now, they can tie traces/spans to non-span-derived metrics. I think there's likely merit in supporting both techniques. Given Prometheus's investment in exemplars, and the importance of that in the OSS metric ecosystem, I still think we should include first-class support for this signal, with easy enabling and disabling.

The concept of exemplars is very vaguely defined in the spec #2155