cartermp opened this issue 2 years ago
The problem with the solution you sketched out is a scenario like this:
Someone has a web request that shoves a message into Kafka for handling by a consumer. They turn the web request into a trace and end the trace when the request responds with "message received".
Their consumer is given the trace ID of the initial request, but doesn't use it as a parent ID, because they don't need the entire thing as one big trace. When processing begins, they start a new trace and add a link to the original trace.
The result would be that every trace in the system gets sampled.
One idea might be that we keep a longer-lived (maybe redis-based) record of traces sampled, and another record of traces referenced by links in sampled traces.
The sampling decision for a trace then becomes:
Does this sound plausible?
I think so, but couldn't this still result in the same scenario described, where a trace gets sampled but a trace that it links to isn't sampled? Or is the thinking that it would make that scenario less common?
If both the linking and linked traces occur near each other in time, this would basically ensure that the sampling decisions are largely consistent between them.
If you use a definitive cache, the only time they'd disagree is if one of them fell out of the cache; a central cache stored in Redis with a capacity of millions of trace IDs would mean that hours or days would have to pass before you could make an incorrect decision. For large installations, though, you might need a pretty big cache to get adequate retention.
Another option would be to use a ~~HyperLogLog~~ Bloom filter model, which would mean that a small number of links could get inaccurate decisions, but that the cache could be orders of magnitude smaller (fast and close to right).
I'm embarrassed to say that I had a thinko there -- I don't mean HyperLogLog, but some variant of a Bloom filter.
Framing: Consider spans A, B1, B2, etc.; with A earlier than the others (in terms of creation timestamp). The B spans link to span A, and may be created arbitrarily later in time.
There's an analogy to be made like "child spans : their parent span :: linking spans : the spans they link to". Both are one-way references to other spans, and it's nice when references are valid. Adapting trace completeness's description: "for any ~~non-root span~~ span that links to other spans, the ~~trace~~ saga (not a standardized term in OTel) is definitely incomplete if ~~the span's parent span~~ any of the linked spans was not collected." See also some relevant terminology: long-lived transaction, saga, session.
@cartermp is saying e.g. "Sampling A is a prerequisite to sampling B1", or equivalently "B1 is sampled => A is sampled".
I'd modify @kentquirk's proposed algorithm slightly:
- `ParentBased` Samplers promote complete traces; this behavior would promote complete sagas.
- A `Sampler` that keeps a new span when any of its link targets was possibly sampled, assuming that all of the `SpanLink`s to be added to a new span are known at span creation time. How? Recall that within the data characterizing a `SpanLink` is the `sampled` trace flag of the span that's the link target. The added "possibly" is a consequence of the W3C Trace Context spec, which defines `sampled` as having a Bloom filter-like character: the flag being set denotes "maybe sampled"; unset denotes "not sampled".
- A `Sampler` potentially using the r-value present on a given `SpanLink`, to propagate a basis for consistent probability sampling through a tree of linked spans. TBD what such a `Sampler` should do if given multiple `SpanLink`s, each bearing an r-value. (In other words, since a span can link to many others, and since links can only be added during span creation, linked spans can, in the most general case, form a DAG, not just a tree.)

@spencerwilson -- this came up again today in Pollinators.
You're correct that span links refer to spans, not traces, but a span link includes the linked span's trace ID. In a world where entire traces are either sampled or not (which is the current state of Refinery), we don't need to track the individual spans. We just need to know whether the trace was sampled or not. This, I think, means that the Bloom filter approach is much more tractable. Am I missing something?
If a refinery node were given an endpoint to ask "did you decide on this one yet, bro?", and we use the `TraceTimeout` in the configuration to ensure the linked trace isn't decided upon before the parent trace (rather than sending on root), we should be able to get a pretty high certainty that most linked traces will make it through if their directly-linked preceding trace was sampled.
Assumptions like:
To implement this, we'd need a new endpoint that says "yes" if a trace_id is in the "recently-seen and sampled" buffer. Then a code path that handles the linked root spans. Also a configuration to start caring about it since it'll be a big functionality change. This configuration should probably be at the environment level so folks can decide based on API key/team relationship rather than whole-refinery or per-rule.
Today, you can end up in an unfortunate scenario where:
This came up in a discussion with a prospect who was just doing deterministic head sampling, and it makes sense that they'd see this with head sampling, but I realized we don't have any special handling of links in refinery either.
One thing I considered is:
However, that could only work if the linked trace arrived after the trace that links to it. And OTel doesn't give you a way to tell whether a given trace is being linked from - you can only tell which trace is being linked to. So this can still result in a broken experience like the one described above.
Is there another way to approach this, or perhaps with some clever configuration?