honeycombio / husky

a library for translating other data types to Honeycomb-format data structures
Apache License 2.0
2 stars 10 forks source link

Support getting event sample rate from TraceState p-value #77

Open spencerwilson opened 2 years ago

spencerwilson commented 2 years ago

Is your feature request related to a problem? Please describe.

Honeycomb can't easily interoperate with OpenTelemetry systems that perform sampling. Specifically, in a multi-service OTel-first system, there's no convenient method or place to adapt OTel p-values into sampleRate attributes.

Describe the solution you'd like

Update api.honeycomb.io's OTLP/gRPC ingest to behave like so: For each span during TranslateTraceRequest,

  1. Deserialize a span's trace state, sourced from Span.GetTraceState. See details on OpenTelemetry's use of trace state here: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.11.0/specification/trace/tracestate-handling.md
  2. If trace state defines a p-value in the interval [0, 62], set the Honeycomb event's SampleRate to 2 ** p. The OTel spec refers to this quantity as "adjusted count".
  3. If trace state defines a p-value equal to 63, drop the event.
  4. Otherwise, use the existing getSampleRate logic that consults span attributes.

Note: I'm unsure of the correctness of step (3). Am seeking clarification with the OTel Sampling SIG now: https://cloud-native.slack.com/archives/C027DS6GZD3/p1652294663462109

Describe alternatives you've considered

A processor in the OpenTelemetry Collector that copies/adapts p values into a sampleRate span attribute might be feasible, but no one's open-sourced such a processor yet.

Additional context

MikeGoldsmith commented 2 years ago

Hi @spencerwilson - thanks for raising this issue.

We agree adding support for OpenTelemetry sampling would be good to add. We'll look into what is required and prioritise accordingly.

spencerwilson commented 2 years ago

If trace state defines a p-value equal to 63, drop the event.

fwiw I confirmed that this is indeed the correct behavior, if you want estimated quantities (counts, etc.) to be valid. 63 would only be indicated if the span was collected due to a "non-probability sampler" choosing it, and such a sampler is a thing that kind of exists "outside of" the statistically quantifiable realm of sampling.

AFAIK it's a pretty theoretical concern at present. No one's advertised that they've built any non-probability sampler yet, so spans with p=63 will be virtually nonexistent in practice.

The best-best treatment of spans with p=63 would be to both:

I'm not sure if the Honeycomb query engine supports 0-weighted events, though.