open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Multi-policy sampling #2113

Open yurishkuro opened 3 years ago

yurishkuro commented 3 years ago

The recent OTEPs from @jmacd showed how consistent sampling can be applied across multiple nodes using different sampling rates. I have another use case that is similar, but not the same. Our system supports sampling policies that can express not only the desired sampling rate, but also the desired verbosity, e.g. one policy may request the trace to collect CPU metrics, while another may ask for function-level profiling (lots of internal spans). And those policies can sometimes be attached to the same node in the architecture. Right now we're evaluating them independently (with independent coin flips).

This raises two problems:

Problem 1. The adjusted_count or "weight" is not a property of the whole span, but of a specific profiler. E.g. the same span can represent basic metrics like latency/QPS/errors sampled 1-in-10, and also CPU metrics sampled 1-in-100. So the weights for these two sets of data will be 10 and 100 respectively.

Problem 2. How do we use the consistent sampling approach and assign the correct weights to each profiler? For example, assume we have two policies active at the same node that activate different profilers with different probabilities:

The weight for the CPU profiler is unambiguous, w = 100, but what about the weight for the basic profiler? Is it w = 1/(0.1 + 0.01 - 0.1*0.01), i.e. one over the probability that at least one of the policies fires?
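For reference, a small sketch of the arithmetic in question, assuming (for illustration only) that policy A at p = 0.1 turns on just the basic profiler while policy B at p = 0.01 turns on the basic and CPU profilers, with the two policies evaluated by independent coin flips as described above:

```python
# Hypothetical policies evaluated with independent coin flips:
#   policy A: p_a = 0.1,  activates the basic profiler
#   policy B: p_b = 0.01, activates the basic and CPU profilers
p_a, p_b = 0.1, 0.01

# The CPU profiler is only activated by policy B, so its weight is unambiguous.
w_cpu = 1 / p_b  # 100

# The basic profiler is activated when at least one of the two policies fires;
# for independent coin flips this is inclusion-exclusion.
p_basic = p_a + p_b - p_a * p_b   # 0.109
w_basic = 1 / p_basic             # ~9.17

print(w_cpu, w_basic)
```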

@jmacd @oertl - curious to hear your thoughts

oertl commented 3 years ago

If you restrict yourself to power-of-two sampling rates, like p = 1/8 = 1/2**3 for policy A and p = 1/64 = 1/2**6 for policy B, the R-value can be used to decide which policies are active. If we activate policy A whenever R >= 3, it will be active with a probability of 1/8. Similarly, policy B is activated whenever R >= 6 to achieve a probability of 1/64.
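A minimal sketch of that scheme, assuming only that the R-value is drawn such that P(R >= k) = 2**-k (here simply the number of leading zero bits of a uniform random value; the exact encoding in the consistent-sampling OTEPs may differ):

```python
import random

BITS = 62  # width of the random value the R-value is derived from

def draw_r(bits: int = BITS) -> int:
    # Number of leading zero bits of a uniform random value,
    # which gives P(R >= k) = 2**-k for k <= bits.
    x = random.getrandbits(bits)
    return bits - x.bit_length()

# Power-of-two policies expressed as exponents: activation probability 2**-k.
policies = {"A": 3, "B": 6}  # A: 1/8, B: 1/64

r = draw_r()
active = {name for name, k in policies.items() if r >= k}
print(r, active)
```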

The probability that some profiler is active is given by the maximum activation probability of any policy that contains the given profiler. Therefore, the basic profiler is activated with a probability of max(1/8, 1/64) = 1/8 and the CPU profiler is activated with a probability of 1/64. The reciprocals of those probabilities would simply be the corresponding adjusted counts that need to be applied as weights to the basic and CPU data, respectively.
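Continuing the sketch, the adjusted count per profiler is then the reciprocal of the maximum activation probability over the policies containing it (the policy-to-profiler mapping is again assumed for illustration):

```python
# Hypothetical mapping: policy -> (activation probability, profilers it turns on).
policies = {
    "A": (1 / 8,  {"basic"}),
    "B": (1 / 64, {"basic", "cpu"}),
}

def adjusted_count(profiler: str) -> float:
    # Activation probability of a profiler = max over the policies that contain it;
    # the adjusted count (weight) is its reciprocal.
    p = max(prob for prob, profilers in policies.values() if profiler in profilers)
    return 1 / p

print(adjusted_count("basic"))  # 8.0  = 1 / max(1/8, 1/64)
print(adjusted_count("cpu"))    # 64.0 = 1 / (1/64)
```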

yurishkuro commented 2 years ago

@oertl can I pick your brain on another somewhat similar scenario? Suppose I want to have different sampling rates per endpoint of a service. Since the endpoint partitions all possible requests into disjoint sets, there is no issue with using w = 1/p for each trace, using the respective p assigned to that endpoint. But what if the dimension in question can have multiple values for a given trace, e.g. a user-flow-id passed down from the app? I can assign different probabilities to each value of user-flow-id, but when a trace arrives with a set of those values, how do I compute the weight? I initially thought it could be handled similarly to your example above with multiple policies (i.e. w = 1/(max p_i)), but that doesn't seem to be correct if I later want to extrapolate some metric (e.g. request count) using the weight for just one of the values of that dimension. The only correct way I can think of is to consider the space partitioning induced by all possible combinations of values of this dimension, which would drastically increase the number of independent probabilities to keep track of.

oertl commented 2 years ago

@yurishkuro If I understand correctly, you are thinking of a case where the individual spans of a trace do not all have the same sampling probability. In this case, the trace may be incomplete (or partially sampled), and a weight for the whole trace cannot simply be determined. The weighting of the trace ultimately depends on the quantity of interest extracted from the trace. This is exactly what my paper is about: https://arxiv.org/abs/2107.07703. The nice thing about consistent sampling is that it avoids the combinatorial explosion you mentioned.

yurishkuro commented 2 years ago

@oertl sorry I didn't explain well. I was still thinking of traditional sampling that happens in one place (at the root) and is then propagated with the trace, so it is consistent. But I want the probability of that decision to be based on attributes of the trace. The classic example of this is different sampling rates by endpoint. Or by user type: free user vs. premium user. Basically, when the values of the attribute are mutually exclusive, there is no issue. But consider a user-flow-id attribute, say it can take the values catalog, checkout, and prime (like Amazon Prime). The attribute itself is a set<string>, and checkout can appear standalone or in combination with prime. The users of tracing want to set only 3 probabilities, one for each flow, not one for every possible combination of flows. The system could take the max probability when a combination of flows is present in the trace, but I don't think the math would then work out at query time, e.g. `select sum(weight) from trace_roots where user_flow = X` will not give an accurate extrapolated count of requests containing flow X.
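For concreteness, a sketch of the max-probability head decision under discussion (the per-flow probabilities and the helper are illustrative, not an existing API):

```python
import random

# Per-flow sampling probabilities, as set by the users of tracing.
flow_probability = {"catalog": 1 / 2, "checkout": 1 / 4, "prime": 1 / 8}

def head_sampling_decision(flows):
    # Take the maximum probability over the flows present on the trace;
    # a sampled trace is weighted by the inverse of that probability.
    p = max(flow_probability[f] for f in flows)
    if random.random() < p:
        return True, 1 / p
    return False, 0.0

sampled, weight = head_sampling_decision({"checkout", "prime"})
print(sampled, weight)  # if sampled: weight 4, since p = max(1/4, 1/8) = 1/4
```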

oertl commented 2 years ago

@yurishkuro I do not see a problem with taking the max probability. As long as each trace is finally weighted by the inverse of its chosen sampling probability, the estimate should be unbiased.

Here is an example for clarification:

Assume sampling probabilities p_catalog = 1/2, p_checkout = 1/4, and p_prime = 1/8, and that the following traces with the corresponding attributes have been sampled:

The estimate of the number of traces with attributes {checkout} would be 2 + 4 + 4 = 10
The estimate of the number of traces with attributes {prime} would be 4 + 8 + 2 = 14
The estimate of the number of traces with attributes {prime, checkout} would be 4
The estimate of the number of traces with attributes {prime, catalog, checkout} would be 0
...
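The sampled traces themselves are not reproduced above; the sketch below assumes a set of traces consistent with the weights implied by the estimates (weight = 1 / max of the flow probabilities present on the trace) and reproduces the numbers:

```python
p = {"catalog": 1 / 2, "checkout": 1 / 4, "prime": 1 / 8}

# Assumed sampled traces, chosen to be consistent with the weights 2, 4, 4, 8, 2
# implied by the estimates above.
sampled_traces = [
    {"catalog", "checkout"},  # weight 2
    {"checkout"},             # weight 4
    {"checkout", "prime"},    # weight 4
    {"prime"},                # weight 8
    {"prime", "catalog"},     # weight 2
]

def weight(flows):
    return 1 / max(p[f] for f in flows)

def estimate(query):
    # Horvitz-Thompson style estimate of the number of traces whose
    # attribute set contains all of the queried flows.
    return sum(weight(t) for t in sampled_traces if query <= t)

print(estimate({"checkout"}))                      # 2 + 4 + 4 = 10
print(estimate({"prime"}))                         # 4 + 8 + 2 = 14
print(estimate({"prime", "checkout"}))             # 4
print(estimate({"prime", "catalog", "checkout"}))  # 0
```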