Refinery is a trace-aware tail-based sampling proxy. It examines whole traces and intelligently applies sampling decisions (whether to keep or discard) to each trace.
Find a way to combine rules sampler with throughput sampler #525
Is your feature request related to a problem? Please describe.
A customer is using a rules-based sampler, but is running into bursty situations where traffic overwhelms their desired input volume and Honeycomb then rate-limits them. Raising rate limits helps, but there's value in allowing the sample rates in the rules to move in response to actual throughput.
A feature like this could be part of allowing the throughput sampler to operate based on cluster throughput rather than individual instances.
Describe the solution you'd like
An idea is to allow the rules-based sampler to have a multiplier value (we'll call it throttle) that is normally 1, but could be increased to a larger value by a throughput sampler. If throughput exceeds the defined maximum, the throttle would be increased to the ratio between the actual throughput and the desired throughput. The throttle is applied as a multiplier to the values in the rules -- provided those values are already greater than 1. (A value of 1 means the rule says "keep every one of these", so any trace matching that rule should always be kept.)
Example: suppose you're sampling http.status 200s at a sample rate of 1000, 400s at 10, and 500s at 1 -- and then this knob gets turned up to 1.333 -- you'd now be sampling at 1333, 13, and 1.
The system should have some hysteresis to avoid fiddling with the throttle all the time.
Also, if Honeycomb returns a 429 (rate limit), the throttle should immediately be increased.
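A minimal sketch of the proposed mechanics, in Python for illustration. Everything here is hypothetical -- `apply_throttle`, `update_throttle`, the hysteresis `band`, and the 429 back-off factor are not existing Refinery configuration, just one way the description above could work:

```python
def apply_throttle(rate: int, throttle: float) -> int:
    """Scale a rule's sample rate by the throttle multiplier.

    A rate of 1 means "keep every one of these", so it is never scaled.
    """
    if rate <= 1:
        return rate
    return int(rate * throttle)


def update_throttle(observed_eps: float, target_eps: float,
                    current: float = 1.0, band: float = 0.1,
                    got_429: bool = False) -> float:
    """Recompute the throttle from observed vs. desired throughput.

    Hysteresis: small deviations from the current throttle are ignored so
    the knob isn't fiddled with constantly. A 429 from Honeycomb forces an
    immediate increase (the factor 2.0 is purely illustrative).
    """
    if got_429:
        return max(current * 2.0, 2.0)
    ratio = observed_eps / target_eps
    if ratio <= 1.0:
        return 1.0  # at or under target: no throttling needed
    if current > 1.0 and abs(ratio - current) / current <= band:
        return current  # within the hysteresis band: leave it alone
    return ratio


# The example from above: a throttle of 4/3 turns rates 1000/10/1 into 1333/13/1.
print([apply_throttle(r, 4 / 3) for r in (1000, 10, 1)])  # [1333, 13, 1]
```

Note the design choice that a rate of 1 is exempt from scaling, so "always keep" rules keep their meaning even under heavy throttling.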
Describe alternatives you've considered
It's possible to increase the rate limit on a per-customer basis, but this solution would allow customers to more easily stay within their existing rate limits. Done right, this could be a recommended feature of rules-based sampling.