open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Attributes to identify services that are using remote sampling #3026

Open MaheshGPai opened 1 year ago

MaheshGPai commented 1 year ago

What are you trying to achieve? Currently there is no way for the tracing backend to identify which services are using a remote sampling policy (based on Jaeger remote sampling). Nor is it possible to identify what kind of sampling was applied by the client/root service. So to implement adaptive sampling policy generation at the tracing backend, one has to monitor the root span traffic pattern for all services and root operations and generate a sampling policy for every such combination. With a large number of services and root operations this becomes problematic. And even if we end up monitoring all the services, the backend cannot effectively generate a policy, since it does not know how many of the traces were sampled or at what ratio.

What did you expect to see? Update the specification to add attributes to the span when a sampler decides to sample a trace. In OpenTracing, the attributes below are added by the samplers:

The proposal is to add these in OpenTelemetry as well. In Jaeger, adaptive sampling was possible because the probabilistic sampler added the sampler.type and sampler.param attributes/tags, which are missing from all the samplers provided in the OpenTelemetry SDK. Currently I believe only RateLimitingSampler (from the jaeger-remote-sampler sdk-extension) does that, and I'm not clear why this was done only for this specific sampler.
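
To make the request concrete, here is a minimal sketch of a probabilistic sampler that attaches the sampler.type and sampler.param tags mentioned above to its decision. The class and method names (`SamplingResult`, `TaggingProbabilisticSampler`, `should_sample`) are hypothetical and written in plain Python for illustration; this is not the OpenTelemetry SDK API, and the trace-ID threshold check is only one assumed way to implement ratio-based sampling.

```python
from dataclasses import dataclass, field


# Hypothetical types for illustration only; not the OpenTelemetry SDK API.
@dataclass
class SamplingResult:
    sampled: bool
    attributes: dict = field(default_factory=dict)


class TaggingProbabilisticSampler:
    """Samples a fixed fraction of traces and records how the decision was made."""

    def __init__(self, ratio: float):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be in [0, 1]")
        self.ratio = ratio
        # Trace IDs whose low 64 bits fall below this bound are sampled.
        self._bound = int(ratio * (1 << 64))

    def should_sample(self, trace_id: int) -> SamplingResult:
        low_bits = trace_id & ((1 << 64) - 1)
        if low_bits >= self._bound:
            return SamplingResult(sampled=False)
        # Attach Jaeger-style tags so the backend can tell which sampler
        # made the decision and with what parameter.
        return SamplingResult(
            sampled=True,
            attributes={
                "sampler.type": "probabilistic",
                "sampler.param": self.ratio,
            },
        )
```

With tags like these on the root span, the backend would not need to infer the sampling ratio from traffic alone.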


yurishkuro commented 1 year ago

I can add a bit more color to this.

  1. Having this information is just exceedingly helpful in debugging. We recently started getting a lot of questions in Jaeger about "why wasn't remote sampling strategy applied", but we can't tell from the trace what strategy was used to begin with.
  2. When using Jaeger's adaptive sampling (the backend observes how many traces arrive for a particular endpoint and adjusts the sampling probability for that endpoint to meet a predefined rate), the backend needs to know if any given trace was sampled with the adaptive policy or with some other non-probabilistic mechanism. The latter are excluded from the calculations.
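
The second point can be illustrated with a rough sketch of the backend side. It assumes root spans arrive as plain dictionaries of tags and that the probability is rescaled proportionally toward the target rate. The function name `adjusted_probability` and its parameters are hypothetical, and this is a simplification, not Jaeger's actual adaptive sampling calculation.

```python
def adjusted_probability(root_spans, current_probability,
                         target_per_second, window_seconds):
    """Return a new sampling probability for one endpoint.

    root_spans: iterable of dicts holding the tags of each root span
    observed for the endpoint during the window (assumed shape).
    """
    # Count only traces sampled by the probabilistic policy; traces
    # sampled by other mechanisms are excluded from the calculation.
    counted = sum(
        1 for span in root_spans
        if span.get("sampler.type") == "probabilistic"
    )
    if counted == 0:
        # Nothing to base an adjustment on; keep the current value.
        return current_probability

    observed_per_second = counted / window_seconds
    # Scale the probability so the observed rate moves toward the target.
    proposed = current_probability * target_per_second / observed_per_second
    # Clamp to a valid probability.
    return max(0.0, min(1.0, proposed))
```

Without a sampler.type tag on the span, the filter in the first step has nothing to key on, which is the gap this issue describes.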

kokikathir commented 1 year ago

It would be better to add the same for opentelemetry-go as well. Without that, we won't be able to know the sampling rate that was applied.