jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Dynamic configuration store #355

Open yurishkuro opened 7 years ago

yurishkuro commented 7 years ago

We need a dynamic configuration solution that comes in handy in various scenarios:

Specifically, we need to define the following:

trtg commented 7 years ago

I just want to confirm that I was correct in thinking that currently, when using the probabilistic sampler, there is no way to change the sampling rate on the fly. Is that true?

yurishkuro commented 7 years ago

That is correct: the open-sourced implementation always returns the same sampling strategy to all agents. We're in the process of open sourcing the adaptive sampling solution, but for the config-driven approach we rely on an internal dynamic config store that is not open source.

trtg commented 7 years ago

I see. When we were previously using Zipkin, it was fairly handy that the span decorators and context managers took a sample rate parameter, which we read out of a multilevel cache (in memory, periodically refreshed from Redis), so we could just change the value of a Redis key to adjust the sample rate dynamically. I remember being confused that with Jaeger the sample rate parameter was only passed on initialization/construction of the tracer, with no methods to set it later on.
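For illustration, the cached-rate pattern described above could look roughly like the sketch below. It is not taken from Zipkin or Jaeger: `CachedSampleRate`, `fetch`, and `ttl_seconds` are hypothetical names, and `fetch` stands in for whatever reads the value from the backing store (for example a Redis GET).

```python
import time
from typing import Callable


class CachedSampleRate:
    """In-memory sample rate that is re-read from a backing store once stale."""

    def __init__(
        self,
        fetch: Callable[[], float],   # hypothetical: wraps the store lookup
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = fetch()         # initial load at construction time
        self._loaded_at = clock()

    def get(self) -> float:
        """Return the cached rate, refreshing it if the TTL has expired."""
        now = self._clock()
        if now - self._loaded_at >= self._ttl:
            try:
                self._value = self._fetch()
            except Exception:
                # Store unreachable: keep serving the last known value.
                pass
            self._loaded_at = now
        return self._value
```

Each span decorator or context manager would call `get()` instead of holding a fixed rate, so changing the stored value takes effect within one TTL.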

yurishkuro commented 7 years ago

That's an interesting observation. Fundamentally I think the service (and service owner) is ill-suited to set the correct sampling probability in production; that is why our samplers are implemented to be controlled from the central tier, either via manual config or via an automatic control loop. However, there are use cases where it is useful to control sampling per service instance, e.g. if you're debugging something on a specific host. Our current mechanism is not directly useful for that, since the pull for sampling rates done by the client only specifies the service name, not the identity of the instance. If it did, you could achieve the same result as in your Zipkin experience while still keeping sampling control in the central tier. This may be a good use case to keep in mind.
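A minimal sketch of what such an instance-aware lookup in the central tier might look like, assuming the client's pull carried an instance identifier in addition to the service name. This is not Jaeger's API: `StrategyStore`, `rate_for`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StrategyStore:
    """Central-tier store of sampling rates (hypothetical, for illustration)."""

    default_rate: float = 0.001
    per_service: Dict[str, float] = field(default_factory=dict)
    # Keyed by (service, instance); used e.g. while debugging a single host.
    per_instance: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def rate_for(self, service: str, instance: Optional[str] = None) -> float:
        """Most specific match wins: instance, then service, then default."""
        if instance is not None:
            override = self.per_instance.get((service, instance))
            if override is not None:
                return override
        return self.per_service.get(service, self.default_rate)
```

With a lookup like this, raising the rate for one host is a change to the central store only, and every other instance of the service keeps the service-level rate.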

stammana commented 7 years ago

Just wanted to check whether the agent is mandatory for adopting the adaptive sampling feature. We are trying to replace the Jaeger agent with a fluentd pipeline for tracing, and I am not sure we would be able to use adaptive sampling with this approach. What is the control point for adaptive sampling at the collector, and how do the changes get triggered? Also, does the Jaeger client generate a trace for every request, with the collector discarding the non-sampled ones? Can you please confirm?

black-adder commented 7 years ago

The agent is only a proxy for the collector when it comes to adaptive sampling. The collector actively calculates new sampling rates as the spans come in, and the client fetches the sampling rates via the agent, which fetches them from the collectors.

If you want to remove the agent completely, then we'd need to build a mechanism for the clients to fetch sampling rates directly from the collector which should be easily doable (but we'd have to bloat the client with all the service discovery mechanisms, etc.). As you can probably tell, there's some design work that's needed before we can do this.

Only sampled traces are sent to the collectors. The client discards the unsampled ones.
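The flow described in this comment can be sketched as follows. It is a simplified model, not Jaeger's code: the class and method names are hypothetical, the collector's rate calculation is left out, and the sampling decision is made here by comparing the trace ID against a boundary derived from the rate.

```python
from typing import Callable, Dict

MAX_TRACE_ID = (1 << 63) - 1  # assumed ID range for this sketch


class Collector:
    """Central tier: holds the current sampling rate per service."""

    def __init__(self, default_rate: float) -> None:
        self.default_rate = default_rate
        self.rates: Dict[str, float] = {}  # updated as spans come in (not shown)

    def sampling_rate(self, service: str) -> float:
        return self.rates.get(service, self.default_rate)


class Agent:
    """Pure proxy: forwards sampling-rate queries to the collector."""

    def __init__(self, collector: Collector) -> None:
        self._collector = collector

    def sampling_rate(self, service: str) -> float:
        return self._collector.sampling_rate(service)


class Client:
    """Polls the rate via the agent and drops unsampled traces locally."""

    def __init__(self, service: str, agent: Agent,
                 send: Callable[[int], None]) -> None:
        self._service = service
        self._agent = agent
        self._send = send
        self._boundary = 0
        self.poll()

    def poll(self) -> None:
        """Refresh the sampling boundary; a real client would do this on a timer."""
        rate = self._agent.sampling_rate(self._service)
        self._boundary = int(rate * MAX_TRACE_ID)

    def maybe_report(self, trace_id: int) -> bool:
        """Send the trace only if sampled; unsampled traces never leave the client."""
        if trace_id < self._boundary:
            self._send(trace_id)
            return True
        return False
```

Removing the agent from this picture would mean the client calls the collector directly, which is where the service discovery concern mentioned above comes in.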

annanay25 commented 5 years ago

A library like https://github.com/micro/go-config could provide some helpful features, such as watching for and notifying on config changes and reading from multiple sources (etcd + env vars, for example).

jpkrohling commented 5 years ago

I have nothing against go-config, but those features are also provided by the lib we are currently using (Viper).