open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0
337 stars 164 forks source link

Probability sampling: Encode Span's head-adjusted count #170

Closed jmacd closed 3 years ago

jmacd commented 3 years ago

This text is reproduced from #148, a review that became too long after several revisions to the document.

This proposes specification text for a sampler.name and sampler.adjusted_count attribute, used to convey sampling probability in recorded SpanData messages.

A companion OTEP #168 discusses how to propagate sampling probability when using the Parent sampler.

jmacd commented 3 years ago

@oertl @yurishkuro @paulosman this document is unchanged from the most recent discussion in #148. I thought that opening a new PR would improve our chances of getting this specification merged.

This specification includes a lot of background to establish that there are many ways to sample and count spans, but all of them involve recording an adjusted count.

paulosman commented 3 years ago

@oertl @yurishkuro @paulosman this document is unchanged from the most recent discussion in #148. I thought that opening a new PR would improve our chances of getting this specification merged.

This specification includes a lot of background to establish that there are many ways to sample and count spans, but all of them involve recording an adjusted count.

Thank you. I can at least provide a vendor perspective, as a consumer of sampled telemetry events.

Adding an adjusted count solves a real problem for us: At the moment we depend on the return value from ShouldSample including attributes that are added to the span. We have custom plugins that add a sample rate to this list of attributes, which is then read at ingest and used to make estimates about the population. If no adjusted count is included, we assume a value of 1.

This works but requires customers to use custom sampling plugins that provide no real value above the ability to communicate the adjusted count (the plugins use a deterministic algorithm that is similar to the TraceIdRatioBased sampler). Having the adjusted count be part of the spec would allow us to get rid of the need for these plugins.

For tail sampling, we accept OTLP data to our sampling proxy and then convert the data to our HTTP API which includes an adjusted count. We'd prefer to be able to do end-to-end OTLP, which the spec change described in this OTEP would allow us to do.

So :+1: from me. Thanks again for writing this up.

jmacd commented 3 years ago

@open-telemetry/specs-trace-approvers Please take a look.

I've completed a prototype of this proposal in https://github.com/open-telemetry/opentelemetry-go/compare/main...jmacd:jmacd/propagate?expand=1. The text in this PR matches the prototype.

jmacd commented 3 years ago

There is an error condition implied by #168 that would occur when the random variable R is not received in the trace state, here.

This illustrates the benefits of deriving R from the bits of the TraceID. If we depend on tracestate to make a consistent decision and do not receive that field, the TraceIDRatio Sampler will not know how to make its sampling decision. We could fall back to whatever behavior was being implemented by each SDK before this specification, which might be labeled ProbabilitySampler. This would lead to a situation where non-root spans are sampled with known adjusted count, but without having been selected using a consistent decision.

As specified, @oertl, the sampler.adjusted_count attribute can be correctly unbiased without requiring the sampling decision to be made in a consistent way. If an arbitrary Sampler records adjusted counts in this way, how much do you care to know that another sampler was used? None of the built-in samplers would do this, except for a hypothetical TraceIDRatio Sampler when the random variable R is not received in the trace state. We can avoid any built-in Samplers doing this by adding requirements to the TraceID, probably via W3C traceparent.

yurishkuro commented 3 years ago

Suggestion: we may want to consider adding top-level Span fields for count and sampler name, given how important these concepts are and the space savings we'd achieve by recording values only.

jmacd commented 3 years ago

@yurishkuro thanks for the suggestion -- it raises a few questions. The value represented by a sampler.adjusted_count attribute could take either integer or floating-point values and the zero value is recognizable as distinct from an unset value. One way to preserve all of these features in the OTLP Span message is to represent the field as a opentelemetry.proto.common.v1.KeyValue. The protocol field could have the following text, for example.

  // Sampler adjustment is recorded by the Sampler that was in effect to produce this
  // span.  The following values are recognized:
  //
  //   If `sampler_adjustment.key` is non-empty, it is the sampler Name.  If it is
  //   one of the OpenTelemetry built-in Samplers, the following string is recognized:
  //   - "parent": the ParentBased sampler was used.  If this is set and value is missing,
  //     it means the adjusted counted is unknown.
  //   The `sampler_adjustment.key` value MAY be omitted when the adjusted count is known.
  //
  //   If `sampler_adjustment.value` is set to a non-negative numeric value, it is
  //   taken as the adjusted count of the Span being recorded.  In this case, the Span
  //   SHOULD have been selected using an unbiased technique, in which case the adjusted
  //   count conveys the effective representative count of the Span after sampling.
  //   - (int64) the Span has an integer adjusted count
  //   - (double) the Span has an floating-point adjusted count.
  //   - other Value types are unrecognized.
  //   Negative values are considered invalid.
  opentelemetry.proto.common.v1.KeyValue sampler_adjustment = 17;

How would you feel about this?

@oertl has made the suggestion that if we restrict adjusted counts to powers-of-two then we could transmit just a small integer to convey sampling. I worry this limits our ability to tail sample on the collection path in arbitrary ways, but this option could be combined with another alternative, such as to respect sampler.adjusted_count when present in order to support floating-point values. That would be a field something like:

  // Log adjusted count is the logarithm of adjusted count for this
  // span, with the following recognized values.
  //
  // 0: The zero value represents an adjusted of 1.  The zero value is
  //    equivalent to an unset field and SHOULD be omitted.
  //
  // 1..62: Values 1 through 62 represent an adjusted count of 2^(value)
  //    are equivalent to setting `sampler.adjusted_count`.
  //
  // 63: Value 63 (0x3f) represents an adjusted count of zero, equivalent
  //    to setting `sampler.adjusted_count=0`.
  //
  // 64: Value 64 (0x40) represents an unknown adjusted count, equivalent
  //    to setting any non-empty `sampler.name` to indicate an unknown sampler 
  //    (e.g., ParentBased).
  uint32 log_adjusted_count = 17;

I don't know of a tail-sampling algorithm that can be rate-limited and respect power-of-two sampling rates, although @oertl hinted at one in Thursday's Sampling SIG meeting.

yurishkuro commented 3 years ago

opentelemetry.proto.common.v1.KeyValue.

I would probably just define a new type with proper field names instead of overloading/reusing KeyValue. And since this is for out of band data collection, I am not sure power-of-2 values provide enough value.

oertl commented 3 years ago

@jmacd, @yurishkuro I think tail-based sampling should be considered completely independent of (consistent) span-level sampling. I propose to keep the adjusted counts due to span-level sampling and due to tail-based sampling completely apart. The span sampling rate could be limited to powers of two, while tail-based sampling could use any sampling rate. I guess it is much easier for vendors to control tail-based sampling anyway as this happens at the collector compared to span-level sampling??? If this assumption is true, I think it makes sense to allow vendors to choose any tail-based sampling rate they want.

jmacd commented 3 years ago

@yurishkuro @oertl Taking both of these pieces of feedback together yields something like:

  message SamplerDetail {
    // Log head adjusted count is the logarithm of adjusted count for this
    // span as calculated at the head, with the following recognized values.
    //
    // 0: The zero value represents an adjusted of 1.  The zero value is
    //    equivalent to an unset field and SHOULD be omitted.
    //
    // 1..62: Values 1 through 62 represent an adjusted count of 2^(value)
    //
    // 63: Value 63 (0x3f) represents an adjusted count of zero.
    //
    // 64: Value 64 (0x40) represents an unknown adjusted count.
    //
    // Values greater than 64 or less than zero are unrecognized.
    uint32 log_head_adjusted_count = 1;

    // Final adjusted count is the effective adjusted count of this
    // Span after tail sampling.  When this field is set to a non-zero
    // value, it SHOULD be taken as an accurate means of counting the
    // span as opposed to the inferring its count from the
    // log_head_adjusted_count field.
    float final_adjusted_count = 2;

    // Final sampler name optionally describes the most recent Sampler
    // that was used in selecting a Span.  This field SHOULD be set
    // when the adjusted count implied by log_head_adjusted_count will
    // result in an accurate count of spans.  When the final sampler
    // name is set, it means the log_head_adjusted_count field SHOULD
    // NOT be considered accurate for counting the span population
    // because further sampling has been applied.
    string final_sampler_name = 3;
  }

  SamplerDetail sampler_detail;

The only capability discussed above that is lost in this encoding is the ability for a tail sampler to express a zero adjusted count from a tail sampler as distinct from an unknown adjusted count. Since we're talking about custom logic in a tail sampler anyway, I'm comfortable not specifying any difference between unknown and zero.

The way this is written, in a closed OpenTelemetry system the built-in Samplers will all result in Spans with log_head_adjusted_count. Tail samplers can be written to output arbitrary or unknown adjusted counts according to the specification already drafted here, meaning they should set their name and not their adjusted count if they are not probabilistic in nature.

jmacd commented 3 years ago

I will update this PR based on the discussion in Thursday's Sampling SIG meeting, where we concluded that Tail sampling is interfering with our objectives and can be dropped at this time. Any kind of tail sampling that wants to be involved in span-to-metrics will require further specification, probably can be based off the proposals that were developed in this OTEP.

jmacd commented 3 years ago

I revised this OTEP based on the Thursday Sampling SIG discussion, in which it was proposed that we introduce just a single integer field to encode head sampling probability, ignoring (for the time being) the topic of tail sampling. The proposed field would take 65 possible values, as follows:

Value Head Adjusted Count
0 Unknown
1 1
2 2
3 4
4 8
5 16
6 32
... ...
X 2^(X-1)
... ...
63 2^62
64 0

and the SpanData would be updated with a new field in v1/trace.proto.

  // Log-head-adjusted count is the logarithm of adjusted count for
  // this span as calculated at the head, offset by +1, with the
  // following recognized values.
  //
  // 0: The zero value represents an UNKNOWN adjusted count.
  //    Consumers of these Spans cannot cannot compute span metrics.
  //
  // 1: An adjusted count of 1.
  // 
  // 2-63: Values 2 through 63 represent an adjusted count of 2^(Value-1)
  //
  // 64: Value 64 represents an adjusted count of zero.
  //
  // Values greater than 64 are unrecognized.
  uint32 log_head_adjusted_count = <next_tag>;
oertl commented 3 years ago

@jmacd, maybe it is better to sacrify the sampling rate 1/2^62 (which will never be used in practice I guess) in order to fit the state into 6 bits?

jmacd commented 3 years ago

@jmacd, maybe it is better to sacrifice the sampling rate 1/2^62 (which will never be used in practice I guess) in order to fit the state into 6 bits?

I agree, and have applied this change in both OTEPs.

jmacd commented 3 years ago

@yurishkuro I responded to your commend above. This PR is reduced to proposing to a single uint32 field in the protocol, defined as:

  // Log-head-adjusted count is the logarithm of adjusted count for
  // this span as calculated at the head, offset by +1, with the
  // following recognized values.
  //
  // 0: The zero value represents an UNKNOWN adjusted count.
  //    Consumers of these Spans cannot cannot compute span metrics.
  //
  // 1: An adjusted count of 1.
  // 
  // 2-62: Values 2 through 62 represent an adjusted count of 2^(Value-1)
  //
  // 63: Value 63 represents an adjusted count of zero.
  //
  // Values greater than 64 are unrecognized.
  uint32 log_head_adjusted_count = <next_tag>;

This matches OTEP #168, which is also ready for another look, please.

oertl commented 3 years ago

@jmacd, @yurishkuro I have implemented an abstract Sampler base class prototype in Java, see here. Any derived implementation will meet the requirements for consistent sampling. It takes care of generating the geometric random value and propagates it together with the exponent of the sampling probability. There are currently two prototype implementations: ConsistentParentRateSampler and ConsistentFixedRateSampler.

jmacd commented 3 years ago

To make this more actionable, I've added a section describing a new SamplingResult which is same as the new Span field proposed here. 4e6c69a

This is also the same as @oertl's prototype:

https://github.com/dynatrace-research/opentelemetry-sampling-poc/blob/087741b7bf4a33353ba7d4dccac4c99d82b10de7/poc/src/main/java/com/dynatrace/research/otelsampling/sampling/AbstractConsistentSampler.java#L166

jmacd commented 3 years ago

The only remaining question here, possibly, is whether the Span field name should be log_head_adjusted_count or something similar. I propose that we merge this OTEP. 😀