Open yurishkuro opened 4 years ago
Thanks, I will turn this into individual issues to be addressed by the OpenTelemetry implementations.
One quick question from what I read: what is the interaction between baggage and traces?
There isn't much interaction. We usually log the baggage to the span when it is set, but aside from that baggage is a runtime thing.
One missing piece of functionality is the ability to configure the client via environment variables (https://www.jaegertracing.io/docs/1.13/client-features/). This has proven useful for cloud/containerized deployments.
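As a rough illustration of the environment-variable configuration mentioned above, here is a minimal sketch. The variable names follow the Jaeger client-features page linked above; the `ClientConfig` type, the `config_from_env` helper, and all default values are hypothetical and only serve the example.

```python
import os
from dataclasses import dataclass


@dataclass
class ClientConfig:
    """Hypothetical container for a few client settings."""
    service_name: str
    sampler_type: str
    sampler_param: float
    agent_host: str
    agent_port: int


def config_from_env(env=None) -> ClientConfig:
    """Build a config from environment variables.

    The defaults are assumptions made for this sketch,
    not the defaults of any particular Jaeger client.
    """
    env = os.environ if env is None else env
    return ClientConfig(
        service_name=env.get("JAEGER_SERVICE_NAME", "unknown-service"),
        sampler_type=env.get("JAEGER_SAMPLER_TYPE", "remote"),
        sampler_param=float(env.get("JAEGER_SAMPLER_PARAM", "0.001")),
        agent_host=env.get("JAEGER_AGENT_HOST", "localhost"),
        agent_port=int(env.get("JAEGER_AGENT_PORT", "6831")),
    )
```

In a containerized deployment the same image can then be reconfigured per environment without code changes, e.g. by setting `JAEGER_SERVICE_NAME` in the pod spec.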
@yurishkuro for the:
Baggage restrictions
This one is a bit iffy in terms of usefulness, but Jaeger clients also support a remotely configurable way to restrict which services can set which baggage keys, as well as key/value lengths, etc.
Does the length apply to incoming baggage key/value or only the ones set by the process? For the baggage keys configuration what is the behavior if the code tries to add a key that is not allowed by the config?
@pavolloffay the configuration can live in the Jaeger "exporter". I think "exporter" is the wrong name for it; maybe we should rename it to Jaeger "client".
The idea is that we have the SDK that allows:
Then the Jaeger "client" will depend on the SDK and:
@bogdandrutu I have also created a generic issue for SDK configuration https://github.com/open-telemetry/opentelemetry-specification/issues/232. Some configuration is indeed jaeger specific, however some properties apply to the whole SDK: specify the resource (service name...), reporter to use, propagation...
Starts a new timer that every 60 seconds reads the sampling config from the Jaeger backend and, if anything has changed, updates the SDK trace config;
That would be nice, we didn't have config watchers in Jaeger.
@bogdandrutu
Does the length apply to incoming baggage key/value or only the ones set by the process?
We've only implemented restrictions of baggage items set by the process, at the time they are set. No restrictions on propagated baggage.
For the baggage keys configuration what is the behavior if the code tries to add a key that is not allowed by the config?
It is not set, and a log entry is added to the span (log entry is added in all cases, btw).
Due to the age of this issue, the GC is interested in how many of these topics are still relevant, what your current requirements are (if they have changed), and whether this issue could be split into smaller issues.
Remote sampling configuration is still important, others are nice to have but I did not hear much demand for them, including from Uber folks who migrated to OTEL SDKs (cc @vprithvi).
Would the remote sampling ask be handled by OpAMP support in SDKs/collector?
It would be great if in the future we could decommission Jaeger client libraries, which take non-trivial effort to support in all languages, and replace them with OpenTelemetry SDKs. @bogdandrutu asked me to enumerate additional features supported by Jaeger clients that are not currently supported by OpenTelemetry, to inform future roadmap after v1.
Remotely configurable sampling
Jaeger clients usually consult the Jaeger backend for the sampling strategies to use. This is implemented as polling along the path client -> agent -> collector, usually once a minute. The sampling can be statically configured on the backend, or automatically calculated to meet certain throughput goals. The sampling is controlled at the granularity of service + operation (aka span name), so that services (like API gateways) with endpoints that have vastly different QPS can sample different endpoints appropriately.
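To make the polling and per-operation granularity concrete, here is a minimal sketch. It is not the Jaeger client implementation: `PerOperationSampler`, `start_polling`, the shape of the strategy dictionary, and the `fetch_strategy` callable are all hypothetical names chosen for illustration.

```python
import random
import threading


class PerOperationSampler:
    """Probabilistic sampler with one rate per operation (span name)."""

    def __init__(self, default_rate, per_operation=None):
        self._lock = threading.Lock()
        self._default_rate = default_rate
        self._per_operation = dict(per_operation or {})

    def update(self, strategy):
        """Replace the rates with a freshly fetched strategy (assumed dict shape)."""
        with self._lock:
            self._default_rate = strategy.get("default_rate", self._default_rate)
            self._per_operation = dict(strategy.get("per_operation", {}))

    def rate_for(self, operation):
        with self._lock:
            return self._per_operation.get(operation, self._default_rate)

    def is_sampled(self, operation, rng=random.random):
        # rng is injectable so the decision can be tested deterministically.
        return rng() < self.rate_for(operation)


def start_polling(sampler, fetch_strategy, interval_seconds=60.0):
    """Poll the backend in a daemon thread; set the returned Event to stop."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, i.e. once per interval.
        while not stop.wait(interval_seconds):
            try:
                sampler.update(fetch_strategy())
            except Exception:
                pass  # keep the last known strategy if the backend is unreachable

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

With this shape, an API gateway could sample a high-QPS endpoint such as a health check at a very low rate while sampling a rare endpoint at a much higher one, and the backend can change either rate without a redeploy.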
Firehose mode
Jaeger trace state contains a flag that indicates a firehose mode, in which traces are written to cheap storage and only accessible by trace ID, without indexing. This is useful when there are other upstream means of locating traces (e.g. trace ID is logged as part of an integration test), and allows higher throughput in the storage layer compared to fully indexed traces.
Setting debug flag
Jaeger trace state has a debug flag that tells the backend to try its best to sample the trace. For example, if the backend implements additional consistent downsampling (for capacity control), the traces with debug flag will avoid this downsampling.
From the API endpoint this is done by setting `sampling.priority=1` tag on the root span. In addition, the debug flag can be set by the user even before the trace is created, by including a special header `jaeger-debug-id: anything`. When Jaeger sees this header in the incoming request, it's equivalent to setting `sampling.priority=1` and `jaeger-debug-id=$value` tags on the span. Storing the header value as a correlation ID allows finding the trace later. E.g. I can send a curl request with `jaeger-debug-id: yuri-test-1`.
Setting baggage
Similar to debug flag, there is a header `jaeger-baggage: k=v,k=v` that can be set by user before the trace even exists.
Baggage restrictions
This one is a bit iffy in terms of usefulness, but Jaeger clients also support a remotely configurable way to restrict which services can set which baggage keys, as well as key/value lengths, etc.
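A minimal sketch of how such restrictions might be enforced at the time baggage is set by the process. Per the discussion in this thread, a disallowed key is not set and a log entry is added to the span in every case. The `Span` and `BaggageRestrictions` types are hypothetical, and truncating over-long values (rather than rejecting them) is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a span: baggage items plus log entries."""
    baggage: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)


@dataclass
class BaggageRestrictions:
    """Maps each allowed baggage key to its maximum value length."""
    allowed_keys: dict

    def set_baggage(self, span, key, value):
        max_len = self.allowed_keys.get(key)
        if max_len is None:
            # Key not allowed for this service: do not set it, but still log.
            span.logs.append({"event": "baggage", "key": key,
                              "value": value, "invalid": True})
            return False
        truncated = len(value) > max_len
        value = value[:max_len]  # assumption: over-long values are truncated
        span.baggage[key] = value
        span.logs.append({"event": "baggage", "key": key,
                          "value": value, "truncated": truncated})
        return True
```

In a real client the `allowed_keys` mapping would be refreshed from the backend per service; here it is passed in directly to keep the example self-contained.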
Ad-hoc sampling policies
This is currently a work in progress that I mentioned on the Sampling RFC. It's similar to Facebook's feature where users can centrally configure ad-hoc sampling policies to collect data exhibiting certain patterns, e.g. a specific tag, a header, or a combination of them. Note that this is not after-the-fact sampling like "sample if there is an error or unusual latency"; our ad-hoc sampling is still mostly upfront. The main reason I mention it, even though it doesn't exist yet in Jaeger, is that it requires certain changes to the Sampler API in the SDK so that it can take into account various pieces of the span data like tags, etc.
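The kind of Sampler API change this implies can be sketched as follows: the sampling decision receives the span's tags in addition to the operation name, so an upfront policy can match on them. Since the feature does not exist yet, everything here (`AdHocPolicy`, `should_sample`, the exact-match semantics) is a hypothetical illustration rather than a proposed design.

```python
from dataclasses import dataclass


@dataclass
class AdHocPolicy:
    """Matches a span when every required tag is present with the given value."""
    required_tags: dict

    def matches(self, tags):
        return all(tags.get(k) == v for k, v in self.required_tags.items())


def should_sample(operation, tags, policies, fallback):
    """Upfront decision: sample if any ad-hoc policy matches the span's tags,
    otherwise defer to the regular sampler (fallback takes the operation name)."""
    if any(policy.matches(tags) for policy in policies):
        return True
    return fallback(operation)
```

The key point is the signature: a sampler that only sees the operation name cannot express a policy such as "sample all POST requests for tenant acme", whereas one that also receives tags can.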