open-telemetry / opentelemetry-specification


Leaky-bucket rate limiting sampler #1769

Open · anuraaga opened this issue 3 years ago

anuraaga commented 3 years ago

Many current tracing solutions provide a rate-limiting sampler backed by a leaky bucket (reservoir). Such a sampler serves requests up to a fixed number per second (the reservoir size); once that is exhausted, it rate-limits any additional requests in that second. As requests die down, the reservoir refills.
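For concreteness, here is a minimal sketch of that reservoir idea (illustrative names, not the SDK's implementation): credits refill at the configured rate, and each sampled request spends one credit.

```java
import java.util.concurrent.TimeUnit;

// Illustrative leaky-bucket sampler: a reservoir of credits refills at a fixed
// rate and each sampled request spends one credit. Not the SDK implementation.
final class LeakyBucketSampler {
  private final double creditsPerSecond; // target sampled requests per second
  private final double maxBalance;       // reservoir capacity (allows small bursts)
  private double balance;                // currently available credits
  private long lastNanos;                // time of the last refill

  LeakyBucketSampler(double creditsPerSecond) {
    this.creditsPerSecond = creditsPerSecond;
    this.maxBalance = Math.max(creditsPerSecond, 1.0);
    this.balance = maxBalance;           // start with a full reservoir
    this.lastNanos = System.nanoTime();
  }

  synchronized boolean shouldSample() {
    long now = System.nanoTime();
    balance = Math.min(maxBalance,
        balance + (now - lastNanos) * creditsPerSecond / TimeUnit.SECONDS.toNanos(1));
    lastNanos = now;
    if (balance >= 1.0) {                // credit available: sample this request
      balance -= 1.0;
      return true;
    }
    return false;                        // reservoir empty: rate limit kicks in
  }
}
```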

Java SDK currently has an implementation for Jaeger remote sampler support

https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java

I think it's a good algorithm to have in general and relatively simple. Is it possible to add it to the spec as a possible sampler type?

oertl commented 3 years ago

I do not support adding this sampler. This sampler makes extrapolation difficult, as the sampling probability cannot be determined. Furthermore, this "first come, first served" approach is not fair (in contrast to reservoir sampling). As an example, consider the following sequence of events on the time axis:

                       E1 E2                                       E3 E4                                       E5 E6
 ______________________|__|________________________________________|__|________________________________________|__|______

Obviously, events E1, E3, E5 will have a higher probability of being sampled than E2, E4, E6, because the available balance (https://github.com/open-telemetry/opentelemetry-java/blob/4844b82a59c5172c6b1d7206a2b7a3a3ea5d748f/sdk/common/src/main/java/io/opentelemetry/sdk/internal/RateLimiter.java#L53) tends to be higher when making the sampling decision for E1, E3, E5.

carlosalberto commented 3 years ago

cc @jmacd

anuraaga commented 3 years ago

Here's the implementation from Brave. If we need some logic to distribute decisions across the second, we could try to adapt it; I think improving the fairness is an implementation detail of a rate-limiting sampler.

https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/sampler/RateLimitingSampler.java#L120

For Zipkin users, we've found the rate-limiting sampler to be very popular. Often it corresponds to a rate limit a tracing team gives you quota for, or you're paying by the trace, so mapping a rate-limiting sampler to these restrictions is extremely intuitive. In addition, it removes the need to tune a random sampler against peak load, which can be an exercise in trial and error, and which also leads to undersampling and missed errors during low load. Traffic patterns changing for a service is very common, especially for medium-scale services.

I figured extrapolating the sampling rate could be done by keeping track of the effective sampling rate (sampled vs. total) with some windowing for reasonable precision. Even if that's not possible, it seems like we should leave it up to the user whether they need such functionality or whether they need a rate-limiting sampler. I think there are users that do need a rate-limiting sampler and wouldn't mind losing functionality they may not use, such as converting spans into metrics.
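A possible sketch of that windowing idea (hypothetical names, not tied to any SDK): count sampled vs. total decisions per window and report the previous window's ratio as the effective sampling rate.

```java
// Sketch of tracking the effective sampling rate (sampled vs. total) with
// windowing, as suggested above. Names are illustrative, not from any SDK.
final class WindowedSamplingRate {
  private long sampled;            // sampled decisions in the current window
  private long total;              // all decisions in the current window
  private double lastRatio = 1.0;  // effective rate of the last completed window

  synchronized void record(boolean wasSampled) {
    total++;
    if (wasSampled) sampled++;
  }

  // Invoke on a timer at every window boundary (e.g., once per second).
  synchronized void rollWindow() {
    lastRatio = (total == 0) ? 1.0 : (double) sampled / total;
    sampled = 0;
    total = 0;
  }

  synchronized double effectiveRate() {
    return lastRatio;
  }
}
```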

jmacd commented 3 years ago

FYI https://github.com/open-telemetry/oteps/pull/148 proposes the use of probabilistic samplers, and mentions two algorithms that offer rate-limited probability sampling. One is named "Priority Sampling", the other is named "Varopt". (Note! I am trying to make an open-source release of Varopt to help.)

oertl commented 3 years ago

If the sampling probability p is dynamically chosen as

p = min(1, time_delta * rate_limit),

where time_delta is the temporal distance between two subsequent spans and rate_limit defines the desired rate limit, the number of sampled spans will be limited while also knowing the sampling probability for every span. time_delta could also be an exponentially smoothed estimate of the average temporal distance of subsequent spans, which would make the sampling more fair but also less adaptive.
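A sketch of this scheme (illustrative names): each decision computes p from the gap since the previous span and then flips a coin, so the probability is known immediately for every span.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of p = min(1, time_delta * rate_limit), as described above. The
// probability is known for every decision, so adjusted counts (1/p) work.
final class DynamicProbabilitySampler {
  record Decision(boolean sampled, double probability) {}

  private final double rateLimit; // desired sampled spans per second
  private long lastNanos = System.nanoTime();

  DynamicProbabilitySampler(double rateLimit) {
    this.rateLimit = rateLimit;
  }

  synchronized Decision shouldSample() {
    long now = System.nanoTime();
    double timeDelta = (now - lastNanos) / 1e9; // seconds since previous span
    // timeDelta could instead be an exponentially smoothed estimate, as noted
    // above, trading adaptivity for fairness.
    lastNanos = now;
    double p = Math.min(1.0, timeDelta * rateLimit);
    return new Decision(ThreadLocalRandom.current().nextDouble() < p, p);
  }
}
```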

A different approach would be reservoir sampling or similar techniques that use a buffer. They have in common that they collect sample candidates over a certain time period. At the end of each period they report all elements remaining in the buffer as samples, and the buffer is reset. However, these kinds of approaches are not compatible with the current Sampler interface, which requires an immediate sampling decision.

anuraaga commented 3 years ago

Just checking in: is there a way forward with this issue? Do we need to wait for Varopt to be able to add an initial RateLimitingSampler implementation? I feel it affects further work such as @pavolloffay's, which adapts Jaeger's remote sampling to OTel: https://github.com/open-telemetry/oteps/pull/167/files#

jmacd commented 3 years ago

@anuraaga If we are considering specifying samplers that will be included by default in every SDK, I think both the Token Bucket (Leaky Bucket) and the Varopt approaches set the bar too high. These approaches are both needed only when combining spans produced by other samplers, so they feel like they should be optional.

If we are looking for a simple rate-limiting sampler, we could opt for https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R. This can be implemented in around five lines of code. The process is: (1) select a maximum rate, (2) choose a short buffering interval, (3) compute the sample size, (4) apply simple reservoir sampling, (5) output the sample each period. These spans can be labeled with adjusted-count attributes to convey the sampling rate, which would make me happy.
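A sketch of that recipe (illustrative names, not SDK API), with k = maximum rate × buffering interval; each span emitted at the end of a period would carry an adjusted count of seen/k:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of Algorithm R applied to steps (1)-(5) above. The i-th offered span
// replaces a random slot with probability k/i, which yields a uniform sample
// of the period's spans. Illustrative, not SDK code.
final class ReservoirR<T> {
  private final int k;             // sample size per buffering interval
  private final List<T> sample;
  private long seen;               // spans offered during the current interval

  ReservoirR(int k) {
    this.k = k;
    this.sample = new ArrayList<>(k);
  }

  synchronized void offer(T span) {
    seen++;
    if (sample.size() < k) {
      sample.add(span);                                    // reservoir filling
    } else {
      long j = ThreadLocalRandom.current().nextLong(seen); // uniform in [0, seen)
      if (j < k) {
        sample.set((int) j, span);                         // replace w.p. k/seen
      }
    }
  }

  // Flush at the end of each interval; each emitted span represents seen/k
  // spans, i.e., its adjusted count is max(1, seen / (double) k).
  synchronized List<T> drainAndReset() {
    List<T> out = new ArrayList<>(sample);
    sample.clear();
    seen = 0;
    return out;
  }
}
```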

jmacd commented 3 years ago

The comment above leaves out an important point, and I don't want to mislead you. Reservoir sampling can't be used to make a head sampling decision, it can only make a tail sampling decision. If a uniform head sampling decision is made and then passed to a simple reservoir tail sampler, we can have rate limiting and probabilistic span counting, but we will also have trace incompleteness because there will be traces that start but are not recorded because of rate limits. This may lead into a discussion about how we ever know if a trace is complete, which is not limited to reservoir sampling and tail sampling.

I don't believe there is a rate-limited head sampling technique that is also unbiased and can record adjusted counts correctly. Speaking for Lightstep, we would prefer to see probabilistic rate-limited tail sampling applied and deal with incomplete traces than to see non-probabilistic rate-limited tail sampling applied without an incomplete-traces problem. Counting spans is more important to us than having complete traces.

anuraaga commented 3 years ago

I've gotten a bit confused as we talk about many different algorithms and also bring in tail sampling. Let me try to take a step back: I think it's a common, and reasonable, request to have a sampler that limits traces per second.

My understanding of this discussion is that OpenTelemetry cannot provide a direct solution for this if we make the ability to record adjusted counts a hard requirement, and the best we could offer is suggesting probabilistic sampling, which would naturally need to be tuned to the peak traffic of a service to avoid going over the limit. If so, though, I don't know how to meet this user need.

jmacd commented 3 years ago

@anuraaga thanks for reminding us of the bigger picture. I am not saying we have to make the ability to record adjusted counts a hard requirement, and yet I'm trying to avoid standardizing Samplers that do not have this ability.

For me, the bigger picture is that we do not have a way to rate-limit non-root spans without breaking traces. You are talking about rate-limiting trace roots, which only gets us so far in limiting the impact of tracing on dollars spent. A child span cannot use the rate-limiter to help with this situation, so fan-out may lead to excessive tracing and we are left adjusting sampling probabilities to account for peak fan-out in the leaves.

I've mentioned an algorithm (named "R") that performs uniform random reservoir tail sampling in a few lines of code. There is one random number generated per span to consider whether it replaces an item in the current sample. This Sampler can perform rate-limited tail sampling with correct adjusted counts, technical details aside, provided that the inputs have equal probability (which they do from a TraceIDRatio Sampler). This ensures that the number of Spans written by an individual RateLimit(TraceIDRatio) Sampler has a hard limit.

The problem that remains is to dynamically adjust the TraceIDRatio Sampler rate that is used, as suggested by @oertl above. I am explicitly suggesting a combination of two Samplers to solve the problem here: a TraceIDRatio Sampler that ensures the correct average number of traces is produced with equal probabilities over a short time period, and a rate-limited resampler (i.e., "R") that makes an unbiased selection and adjusts counts correctly when the limit is exceeded.

This ensures that during peak load, when the average span rate is exceeded, there will be spans that are dropped because of the rate limit while preserving correct adjusted counts. This will only have incomplete traces when there are sudden bursts in the span rate causing the first Sampler to produce too many spans.
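A rough sketch of that two-stage combination (reusing the hypothetical ReservoirR sketch above; none of these names are SDK API):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the two-stage combination described above: TraceIDRatio-style head
// sampling at probability p, then the reservoir "R" stage enforcing a hard cap
// of k spans per interval.
final class TwoStageRateLimitedSampler {
  private final double p;                  // stage 1: equal-probability head sampling
  private final ReservoirR<String> stage2; // stage 2: hard limit of k per interval

  TwoStageRateLimitedSampler(double p, int k) {
    this.p = p;
    this.stage2 = new ReservoirR<>(k);
  }

  void onSpan(String spanId) {
    // Stage 1 guarantees the reservoir's inputs have equal probability.
    if (ThreadLocalRandom.current().nextDouble() < p) {
      stage2.offer(spanId);
    }
  }

  // At each interval boundary: every emitted span's adjusted count multiplies
  // the two stages' inverses, (1 / p) * max(1, spans offered to stage 2 / k),
  // so counts remain unbiased even when the hard limit drops spans.
  List<String> flush() {
    return stage2.drainAndReset();
  }
}
```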

jmacd commented 3 years ago

(See also https://github.com/open-telemetry/oteps/pull/148#issuecomment-879583919)

Lightstep has open-sourced its implementation of the VarOpt sampling algorithm, one of the algorithms listed in OTEP 148.

https://github.com/lightstep/varopt

VarOpt is a useful tool for building second-stage tail samplers because it (1) accepts unequal adjusted-count inputs and (2) produces fixed-size output. We wish to promote this as an alternative to the leaky-bucket rate-limited sampler discussed in this issue. I believe VarOpt and the leaky-bucket samplers will have similar code complexity, but VarOpt is able to output unbiased adjusted counts, making it possible to probabilistically count the spans that make it through the Sampler.

tigrannajaryan commented 3 years ago

@jmacd @anuraaga since I have no expertise in sampling, can I reassign this issue to one of you?

jmacd commented 3 years ago

The specification draft in https://github.com/open-telemetry/oteps/pull/170 would permit arbitrary samplers that may or may not know the adjusted count. I am hopeful that the set of samplers built in to the SDK and specification all support adjusted count, as discussed in https://github.com/open-telemetry/oteps/pull/168.

Whether OTel specifies a Sampler that performs rate limiting in a non-probabilistic way is the question here. We've offered several ways that a Sampler can be built with a strict rate limit and also known adjusted counts.

The first is described above and involves adjusting a fixed probability periodically to maintain an average rate below the threshold. The same technique can be extended to enforce a hard rate limit using a second-stage uniform-probability reservoir sampler.
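A sketch of this first approach (illustrative names): once per interval, recompute the fixed probability from the observed span rate.

```java
// Sketch of the first approach: periodically recompute a fixed probability so
// the average sampled rate stays below the threshold. Illustrative names only.
final class AdaptiveProbability {
  private final double targetPerSecond; // desired average sampled-span rate
  private volatile double p = 1.0;      // fixed probability until the next adjustment

  AdaptiveProbability(double targetPerSecond) {
    this.targetPerSecond = targetPerSecond;
  }

  // Call once per adjustment interval with the observed rate of offered spans.
  void adjust(double offeredPerSecond) {
    p = (offeredPerSecond <= targetPerSecond)
        ? 1.0
        : targetPerSecond / offeredPerSecond;
  }

  double probability() {
    return p; // every span sampled in the interval gets adjusted count 1/p
  }
}
```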

The second recommended approach builds on varopt, which Lightstep has open-sourced to help with this discussion. Using a weighted sampling algorithm such as varopt allows implementing a hard rate limit without giving up control over what goes into the sample.

We should compare and contrast the complexity of these proposals with the complexity of implementing a leaky bucket. Leaky bucket algorithms are not simple pieces of code, and for all that complexity we lose the ability to approximately count sampled spans. The probability sampling approaches outlined above are not more complicated than the algorithms behind a leaky bucket sampler. I will follow up with a demonstration of rate-limited probability sampling.

anuraaga commented 3 years ago

Thanks @jmacd a demo of rate-limited probability sampling would be great!

jmacd commented 3 years ago

I will discuss this "approximately" rate-limited Sampler in tomorrow's Sampling SIG. https://github.com/open-telemetry/opentelemetry-go/pull/2180

This builds on the prototype for OTEPs 168 and 170, https://github.com/open-telemetry/opentelemetry-go/pull/2177.

The rate limiter logic is here: https://github.com/open-telemetry/opentelemetry-go/pull/2180/files#diff-8fc7ac8eaaeb9d58ee7eb261b0ffd7b61dbbcae2ce49103b1c02648acf61ce39

willarmiros commented 3 years ago

Related, we've just open-sourced our design of the X-Ray remote sampler in OpenTelemetry which uses a leaky-bucket sampler: https://docs.google.com/document/d/11V1CDr6eLoMq_3cK1bUm0GzYEyhEiI_83QEJMUFfu9g/edit?usp=sharing

yurishkuro commented 3 years ago

@willarmiros nice! Looks very similar to Jaeger's remote sampler. It would be great to come up with OTEL-standard language for describing sampling policies, so that we could converge on a single solution.

jmacd commented 3 years ago

@willarmiros Please review OTEPs https://github.com/open-telemetry/oteps/pull/168 and https://github.com/open-telemetry/oteps/pull/170 for a discussion about probabilistic sampling. I am enthusiastic about remotely configured sampling policies, but continue to recommend against non-probability sampling schemes such as Leaky Bucket, as they prevent span-to-metrics pipelines from being built. Simply said, with Leaky Bucket sampling you cannot accurately estimate the number of spans that are taking place, which means you must fall back to a metrics solution for monitoring request rates and precludes many kinds of trace analysis.

The comment above links to a probabilistic rate-limited sampler for making head sampling decisions. A number of reservoir sampling techniques are available for making rate-limited tail sampling decisions, so it is my opinion that we should not standardize a Leaky Bucket sampler in OpenTelemetry.

willarmiros commented 3 years ago

@yurishkuro @jmacd thank you for the initial feedback! I didn't fully understand the concerns with adding a leaky-bucket sampler to the spec until this explanation, so thank you for that. I would be happy to discuss this further at the sampling SIG, I can't attend on 8/26 but will try for 9/2.

One of the main motivations behind sharing was to define more of a spec/protocol for the remote aspect of the sampler, perhaps so that different backends could be plug-and-play for remote sampling requests, much like they are for receiving telemetry data today.

spencerwilson commented 2 years ago

This sampler makes extrapolation difficult as the sampling probability cannot be determined.

non-probability sampling schemes such as Leaky Bucket

@oertl or @jmacd Could either of you elaborate on this? Specifically, I don't see why token bucket doesn't satisfy the definition of probability sampler: "A probability Sampler is a Sampler that knows immediately, for each of its decisions, the probability that the span had of being selected" (source).

If it helps us align, I've written an alternate definition of probability sampler that reflects my understanding:

A probability sampler is a sampler that performs Poisson sampling. When considering whether to sample a span, first determine the span's inclusion probability, then conduct a Bernoulli trial to determine whether to include the span in the sample. The algorithm to determine inclusion probabilities may be arbitrary, and even nondeterministic.
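For illustration, a minimal sketch of this reading (hypothetical names; nothing here is OTel API): an arbitrary, possibly stateful function supplies the inclusion probability, and an independent Bernoulli trial decides inclusion.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.ToDoubleFunction;

// Sketch of the Poisson-sampling reading above: an arbitrary (possibly
// nondeterministic) function assigns each span an inclusion probability,
// then a Bernoulli trial decides inclusion. Hypothetical names throughout.
final class PoissonSampler<S> {
  private final ToDoubleFunction<S> inclusionProbability; // arbitrary logic allowed

  PoissonSampler(ToDoubleFunction<S> inclusionProbability) {
    this.inclusionProbability = inclusionProbability;
  }

  boolean sample(S span) {
    double p = inclusionProbability.applyAsDouble(span);  // determined first
    return ThreadLocalRandom.current().nextDouble() < p;  // Bernoulli trial
  }
}
```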

Under this definition, many algorithms to determine the inclusion probability would be viable.

If I've misunderstood the definition of probability sampler, perhaps revising the definition in the SDK spec is warranted.

[Token bucket and VarOpt] are both needed only when combining spans produced by other samplers.

@jmacd Could you elaborate? I don't follow.

spencerwilson commented 2 years ago

Ah, for my first question, perhaps this other sentence from the spec is the issue: "Sampling probability is defined as a number less than or equal to 1 and greater than 0 (i.e., 0 < probability <= 1). The case of 0 probability is treated as a special, non-probabilistic case."

This is interesting. I would amend my definition, then, to read

A probability sampler is a sampler that performs Poisson sampling, with the additional constraint that every span has a nonzero inclusion probability. When considering whether to sample a span, first determine the span's inclusion probability, then conduct a Bernoulli trial to determine whether to include the span in the sample. The algorithm to determine inclusion probabilities may be arbitrary, and even nondeterministic.

Is that correct? Additionally, is there some place I can read to better understand what's meant by "0 probability is treated as a special, non-probabilistic case"? My best guess as to why this is: fundamentally, if a system can in certain situations totally decline to report telemetry, then that compromises the sample it creates in such a way that statistics from that sample may not be valid estimators. Which, as you've said, reduces the utility of span data considerably.

In other words, pausing telemetry reporting (hard rate limiting, AlwaysOff sampling, etc.) for any nonzero amount of time in a given time interval renders the sample statistics during that interval not valid estimators. Is that the crux of it?

I've mentioned an algorithm (named "R") that performs uniform random reservoir tail sampling in a few lines of code. There is one random number generated per span to consider whether it replaces an item in the current sample. This Sampler can perform rate-limited tail sampling with correct adjusted counts, technical details aside, provided that the inputs have equal probability (which they do from a TraceIDRatio Sampler). This ensures that the number of Spans written by an individual RateLimit(TraceIDRatio) Sampler has a hard limit.

@jmacd are you referring to a true hard limit here, or just "the expected value of span rate is equal to the limit"? If you actually meant hard limit, I don't follow.

A non-probability sampler is a Sampler that makes its decisions not based on chance, but instead uses arbitrary logic and internal state. The adjusted count of spans sampled by a non-probability sampler is unknown.

"Arbitrary logic" is pretty fuzzy, and seemingly doesn't capture the desired qualities, as most of the "inclusion probability determination" algorithms in my previous message still seem to me to be probability samplers, despite having quite sophisticated logic (looking at you, PID controller 😼). Perhaps the only definition needed for non-probability sampler is "A sampler that is not a probability sampler"?

PeterF778 commented 2 years ago

I like your description of a probability sampler a lot; this is how it is supposed to work. So far we have not focused on determining the inclusion probability, as it is not critical for reasoning about the properties of probability sampling. But I'm not sure that accepting this description as the definition would be the right step. Perhaps it goes too deeply into the mechanics of implementation, and for many readers it might prove too intimidating.

jmacd commented 2 years ago

The term "inclusion probability" was used heavily in the OTEP, https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md. I found that this concept does not resonate with many end users, especially because they're likely to be familiar with fixed-probability sampling, where sampling probability is the same as inclusion probability. When discussing unequal-probability sampling or reservoir sampling techniques, inclusion probability is certainly a good term to be using, but for the end user, it's still the inverse that matters, so in the final specification document I emphasized "adjusted count" (i.e., the inverse of inclusion probability).

jmacd commented 2 years ago

@spencerwilson For your specific question about zero probability, you're correct that "statistics from that sample may not be valid estimators". If inclusion probability is zero, then no spans are being collected, so it's not clear what we're estimating from in that case. We use the value of zero for adjusted count to mix probability and non-probability samplers, because zero is the amount to reflect in the span count, without introducing bias, for an item that was not selected.

With respect to your question about "hard rate limits", we know about samplers that can achieve a fixed rate limit, but in those cases the inclusion probability is not known at the time the span starts; it's known when the sample is complete. (E.g., algorithms "R" and "S" are the names Knuth gave these in TAOCP Vol. 2, 3.4.2.)

The problem with these is that you risk having incomplete traces in order to have a hard rate limit. The leaky-bucket rate limiter does have a hard rate limit, but the inclusion probability and adjusted counts from this technique aren't directly known. Instead, we can design samplers that dynamically adjust their sampling probability to achieve a "reasonable confidence" rate limit. Is this something you're interested in?

spencerwilson commented 2 years ago

Thanks for the confirmation! And oops, "traces must be complete" was indeed a requirement I had in my head (from your comment, in fact) but I forgot to restate; thanks for the correction.

we can design samplers that dynamically adjust their sampling probability to achieve a "reasonable confidence" rate limit. Is this something you're interested in?

Yeah, as a user that's totally adequate. The one tracing storage system I have significant experience with is subject to a monthly span-count limit.

If my system created spans at a constant rate of 20e6/30.4/24/60/60 = 7.6 span/s, then I'd be just hitting my storage system's limits. But say I happen to know that my system's span creation rate is quite variable: it creates 23 span/s 30% of the time and 1 span/s 70% of the time. In that case, I'd probably set my targetSpansPerSecondLimit to 23 span/s. While this does theoretically allow me to exceed my limits, I'm banking on my workload averaging 7.6 span/s over the month.
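(Checking that assumption with the stated rates: 0.3 × 23 + 0.7 × 1 = 7.6 span/s, exactly the monthly-average budget of 20e6/30.4/24/60/60 ≈ 7.6 span/s, so the plan holds only as long as the 30/70 traffic split holds over the month.)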

Abstracting a bit, maybe my tuning rule of thumb would be something like:

  1. Knowing nothing about my system's span creation rate, initialize targetSpansPerSecondLimit to my storage system's "maximum constant rate" (7.6 in the preceding example).
  2. Monitor daily span collection "actuals", and adjust up or down as needed based on how you're tracking toward the monthly limit.

Re: what sort of rate-limiting logic to require all OTel SDKs to provide, I'm supportive of placing a high value on the "sample statistics are valid estimators" criterion. Beyond that, I'm going to share some notes later in #otel-sampling, but (spoiler alert) of all the designs I've seen mentioned (plus one or two of my own), I've nearly convinced myself that @oertl's design in https://github.com/open-telemetry/opentelemetry-java-contrib/pull/226 strikes the best balance of simplicity and effectiveness.