honeycombio / refinery

Refinery is a trace-aware tail-based sampling proxy. It examines whole traces and intelligently applies sampling decisions (whether to keep or discard) to each trace.
Apache License 2.0

How should we handle linked traces? #516

Open cartermp opened 2 years ago

cartermp commented 2 years ago

Today, you can end up in an unfortunate scenario where a trace gets sampled (kept), but a trace it links to isn't sampled, so following the link leads to data that was never stored.

This came up in a discussion with a prospect who was just doing deterministic head sampling. It makes sense that they'd see this with head sampling, but I realized we don't have any special handling of links in Refinery either.

One thing I considered is keeping track of the trace IDs referenced by links in traces we decide to keep, and forcing a keep decision for those traces.

However, that could only work if the linked trace arrived after the trace that links to it. And OTel doesn't give you a way to tell that a given trace is linked to by some other trace - you can only tell, from the linking trace, which traces it links to. So this can still result in the broken experience described above.

Is there another way to approach this, or perhaps with some clever configuration?

kentquirk commented 2 years ago

The problem with the solution you sketched out is a scenario like this:

Someone has a web request that shoves a message into Kafka for handling by a consumer. They turn the web request into a trace and end the trace when the request responds with a "message received" acknowledgment.

Their consumer is given the trace ID of the initial request, but doesn't use it as a parent ID, because they don't need the entire thing as one big trace. When processing begins, they start a new trace and add a link to the original trace.

The result would be that every trace in the system gets sampled.

One idea might be that we keep a longer-lived (maybe redis-based) record of traces sampled, and another record of traces referenced by links in sampled traces.

The sampling decision for a trace then becomes: if its trace ID is in the set of trace IDs referenced by links in sampled traces, sample it; otherwise, make the normal sampling decision.

Does this sound plausible?
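
For concreteness, here's a minimal sketch of what those two records might look like, assuming a Redis-backed store via go-redis; the key names, TTL, and function shapes are illustrative, not anything Refinery implements today:

```go
// Illustrative only: key names, TTLs, and function shapes are assumptions,
// not Refinery's actual implementation.
package linkcache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

const (
	sampledTracesKey    = "refinery:sampled-traces"    // trace IDs we decided to keep
	referencedTracesKey = "refinery:linked-trace-refs" // trace IDs referenced by links in kept traces
	recordTTL           = 24 * time.Hour
)

// RecordKeptTrace remembers that traceID was kept, along with every trace ID
// that its spans linked to.
func RecordKeptTrace(ctx context.Context, rdb *redis.Client, traceID string, linkedTraceIDs []string) error {
	if err := rdb.SAdd(ctx, sampledTracesKey, traceID).Err(); err != nil {
		return err
	}
	for _, linked := range linkedTraceIDs {
		if err := rdb.SAdd(ctx, referencedTracesKey, linked).Err(); err != nil {
			return err
		}
	}
	// Bound how long the records live; a real implementation would probably
	// expire individual entries rather than the whole sets.
	rdb.Expire(ctx, sampledTracesKey, recordTTL)
	rdb.Expire(ctx, referencedTracesKey, recordTTL)
	return nil
}

// IsReferenced reports whether some already-kept trace linked to traceID.
func IsReferenced(ctx context.Context, rdb *redis.Client, traceID string) (bool, error) {
	return rdb.SIsMember(ctx, referencedTracesKey, traceID).Result()
}
```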

cartermp commented 2 years ago

I think so, but couldn't this still result in the same scenario described, where a trace gets sampled but a trace that it links to isn't sampled? Or is the thinking that it would make that scenario less common?

kentquirk commented 2 years ago

If both the linking and linked traces occur near each other in time, this would basically ensure that the sampling decisions are largely consistent between them.

If you use a definitive cache, the only time they'd disagree is if one of them fell out of the cache; a central cache stored in Redis with a capacity of millions of trace IDs would mean that hours or days would have to pass before an incorrect decision could be made. For large installations, though, you might need a pretty big cache to hold enough trace IDs.

Another option would be to use a ~~HyperLogLog~~ Bloom filter model, which would mean that a small number of links could get inaccurate decisions but that the cache could be orders of magnitude smaller (fast and close to right).

kentquirk commented 2 years ago

I'm embarrassed to say that I had a thinko there -- I don't mean HyperLogLog, but some variant of a Bloom filter.
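
To make the tradeoff concrete, here's a toy Bloom filter over trace IDs (the sizing constants and hash scheme are arbitrary choices for the example, not a proposal for Refinery). It uses a fixed, small amount of memory and never reports a false negative; it can report false positives, which here would mean occasionally keeping a trace that nothing actually linked to.

```go
package linkcache

import "hash/fnv"

const (
	bloomBits   = 1 << 24 // 16M bits, i.e. ~2 MiB
	bloomHashes = 7
)

type bloomFilter struct {
	bits [bloomBits / 64]uint64
}

// indexes derives bloomHashes bit positions from the trace ID via double hashing.
func (b *bloomFilter) indexes(traceID string) [bloomHashes]uint32 {
	h := fnv.New64a()
	h.Write([]byte(traceID))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	var idx [bloomHashes]uint32
	for i := range idx {
		idx[i] = (h1 + uint32(i)*h2) % bloomBits
	}
	return idx
}

// Add marks traceID as present.
func (b *bloomFilter) Add(traceID string) {
	for _, i := range b.indexes(traceID) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MightContain can return true for IDs that were never added (a false
// positive), but never returns false for an ID that was added.
func (b *bloomFilter) MightContain(traceID string) bool {
	for _, i := range b.indexes(traceID) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}
```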

spencerwilson commented 2 years ago

Framing: Consider spans A, B1, B2, etc.; with A earlier than the others (in terms of creation timestamp). The B spans link to span A, and may be created arbitrarily later in time.

There's an analogy to be made like "child spans : their parent span :: linking spans : the spans they link to". Both are one-way references to other spans, and it's nice when references are valid. Adapting trace completeness's description: "for any ~~non-root span~~ span that links to other spans, the ~~trace~~ saga (not a standardized term in OTel) is definitely incomplete if ~~the span's parent span~~ any of the linked spans was not collected." See also some relevant terminology: long-lived transaction, saga, session.

@cartermp is saying e.g. "Sampling A is a prerequisite to sampling B1", or equivalently "B1 is sampled => A is sampled".

I'd modify @kentquirk's proposed algorithm slightly (a rough code sketch follows the list):

  1. If this span ID is in the set of referenced IDs, sample.
    1. This addresses @cartermp's issue.
    2. Necessary since spans are linked, not traces. This unfortunately increases the cardinality of the set of tracked IDs by potentially several orders of magnitude (I've seen traces with 1,000 spans in them, e.g.). I'm not sure if that's a fatal blow to the set/Bloom filter idea.
  2. If this span links to spans known to have been sampled, sample.
    1. This is an orthogonal concern, but potentially nice to have. Just as ParentBased Samplers promote complete traces, this behavior would promote complete sagas.
    2. A weaker version of this step—"If this span links to spans known to have possibly been sampled, sample"—could be evaluated in SDKs by a Sampler, assuming that all of the SpanLinks to be added to a new span are known at span creation time. How? Recall that within the data characterizing a SpanLink is the sampled trace flag of the span that's the link target. The added "possibly" is a consequence of the W3C Trace Context spec, which defines sampled as having a Bloom filter-like character: the flag being set denotes "maybe sampled"; unset denotes "not sampled".
  3. Otherwise, make a normal sampling decision.
    1. Folks raised the possibility of an SDK Sampler potentially using the r-value present on a given SpanLink, to propagate a basis for consistent probability sampling through a tree of linked spans. TBD what such a Sampler should do if given multiple SpanLinks, each bearing an r-value. (In other words, since a span can link to many others, and since links can only be added during span creation, linked spans can, in the most general case, form a DAG, not just a tree.)
kentquirk commented 1 year ago

@spencerwilson -- this came up again today in Pollinators.

You're correct that span links refer to spans, not traces, but a span link includes the linked trace's ID. In a world where entire traces are either sampled or not (which is the current state of Refinery), we don't need to track the individual spans; we just need to know whether the trace was sampled. This, I think, means that the Bloom filter approach is much more tractable. Am I missing something?
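
If decisions really are per trace, the check collapses to something like the sketch below, where each span link only contributes the trace ID it carries; the `traceLink` type and `keptTraceIndex` interface are made up for illustration, and the index could be the Bloom filter sketched earlier or an exact set.

```go
package linkcache

// traceLink is the only piece of a span link we need at trace granularity.
type traceLink struct {
	TraceID string // trace ID carried by the span link
	SpanID  string // ignored here; decisions are made per trace
}

// keptTraceIndex is whatever (possibly probabilistic) set tracks kept trace IDs.
type keptTraceIndex interface {
	MightContain(traceID string) bool
}

// linkedTraceWasKept answers "does this span link to any trace we kept?"
// using only trace-level state.
func linkedTraceWasKept(links []traceLink, kept keptTraceIndex) bool {
	for _, l := range links {
		if kept.MightContain(l.TraceID) {
			return true
		}
	}
	return false
}
```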

mterhar commented 1 year ago

If a Refinery node had an endpoint to ask "you decided on this one yet, bro?", and we used the `TraceTimeout` in the configuration to ensure the linked trace isn't decided upon before the trace it links to (rather than deciding as soon as the root span arrives), we should be able to get pretty high certainty that most linked traces will make it through if their directly-linked preceding trace was sampled.

Assumptions like:

  1. If the root span has a link, do this instead of the usual flow; if the link is on another span within the trace, it probably won't help.
  2. Ignore the root span and wait for the timeout.
  3. At timeout time, ask the other Refinery node that got the preceding chunk of the saga whether it was sampled or dropped; if it's on the same shard by chance, no API call is needed.
  4. If it was sampled, override the regular rules.
  5. If it was not sampled, apply the rules as usual.

To implement this, we'd need a new endpoint that says "yes" if a trace_id is in the "recently-seen and sampled" buffer, then a code path that handles linked root spans, and a configuration option to opt into this behavior, since it'll be a big functionality change. This option should probably be at the environment level so folks can decide based on the API key/team relationship rather than per-Refinery or per-rule.
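
A very rough sketch of what that peer-query endpoint could look like, using only the Go standard library; the `/1/trace-decision` path, the `trace_id` query parameter, and the in-memory cache are all invented for illustration and are not an existing Refinery API:

```go
package main

import (
	"net/http"
	"sync"
)

// keptTraceCache is a stand-in for the "recently-seen and sampled" buffer.
type keptTraceCache struct {
	mu   sync.RWMutex
	kept map[string]bool // trace ID -> was it kept?
}

func (c *keptTraceCache) lookup(traceID string) (kept, known bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	kept, known = c.kept[traceID]
	return kept, known
}

func main() {
	cache := &keptTraceCache{kept: map[string]bool{}}

	// GET /1/trace-decision?trace_id=... responds "kept", "dropped",
	// or 404 if this node hasn't decided on that trace yet.
	http.HandleFunc("/1/trace-decision", func(w http.ResponseWriter, r *http.Request) {
		traceID := r.URL.Query().Get("trace_id")
		kept, known := cache.lookup(traceID)
		switch {
		case !known:
			http.Error(w, "no decision yet", http.StatusNotFound)
		case kept:
			w.Write([]byte("kept"))
		default:
			w.Write([]byte("dropped"))
		}
	})

	// Port choice is arbitrary for the sketch.
	http.ListenAndServe(":8088", nil)
}
```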