Open linjmeyer opened 4 years ago
Unfortunately, the schema dictates an int:
I see, I didn't know that Thrift repo existed. Could we convert this to a feature request or should I open a new one?
Sorry if this is obvious, but why do you need to sample less than 1 trace per second? Do you have a huge number of endpoints?
We used <<1.0 rate quite a lot, because some API services have 1000s of instances, especially with event loop based languages where it's common to run one process per core. However, we used it for defaultLowerBoundTracesPerSecond, which is already a double.
We use Jaeger in almost all of our deployments, something like 500-800 containers running Jaeger. The higher being on the weekends typically when we see more traffic. I think (up to) 800 traces per second is too high. Even ignoring the scaling/storage costs, we just don't need that many as we don't have the head count to look at all that data.
why don't you use probabilistic sampling?
Sorry for the confusion, we are using probabilistic. These are REST APIs (ASP.NET), so 0.6
ends up with ~60% of incoming HTTP requests being sampled for example.
The per operation sampler you linked sounds perfect! Being able to set upper/lower bounds would be great, 60% during peak times is a lot, while theres room to spare during lower traffic periods.
Upper-bound is not supported in the per-operation sampler because it breaks the ability to extrapolations full traffic measurements from the samples. But since Jaeger doesn't make use of those at the moment anyway, and if you don't care, then implementing the rate limiting upper bound sounds like a useful feature. For people who care about extrapolations, the upper bound can be an optional feature.
Got it, that's pretty much the only use-case I could think of.
In any case, is networking an issue for you? You can always down-sample (--downsampling.ratio
) at the collector if your concern is only storage.
I'd rather avoid dropping spans on the collector if possible. We log the Trace ID on each service if it is sampled. Our devs know if they see an error or log message in Loki with a TraceID, they can go to Jaeger and look at the exact request. If we are downsampling we lose the ability to accurately do that as the client doesn't know the collector dropped the trace.
Describe the bug Rate limiting sampler strategy param is an integer. Setting a decimal param value does not work, forcing a minimum rate of 1 trace per second.
To Reproduce Steps to reproduce the behavior:
{"strategyType":"RATE_LIMITING","rateLimitingSampling":{"maxTracesPerSecond":0}}
Expected behavior I would expect to be able to specify a rate using a decimal, for example 0.5 traces per second
Screenshots If applicable, add screenshots to help explain your problem.
Version (please complete the following information):
Additional context The problematic code seems to be here, where Param is cast to an int16.
I have filed as a bug report because I have checked several clients and they seem to support decimal values:
double
double
double
float64