open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

To achieve different sampling rates for different applications and integrate them with the OTel collectors #31562

Open zendesk-shweta opened 7 months ago

zendesk-shweta commented 7 months ago

Component(s)

No response

Describe the issue you're reporting

How can we set up different sampling rates for different applications and integrate them with the OTel Collector, so that the sampling rates are controlled centrally from the Collector config? What are the different approaches to achieve this on the OTel side?

crobert-1 commented 7 months ago

Usually sampling rates for applications are determined by settings in the configured receivers. To choose different sampling rates for different applications, you'd want to check the configuration options for each receiver you're interested in using, and go from there.

Is that generally what you're wondering, or did I misunderstand?

zendesk-shweta commented 7 months ago

Let's say I have two services running on the same cluster as the OTel Collector, and each service sends its traces to the Collector. Our requirement is to set a different sampling rate for each service in the Collector config file. Can I define the sampling rates like this?

    extensions:
      pprof:
        endpoint: :1888
      zpages:
        endpoint: :55679

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
      probabilistic_sampler/tracing:
        sampling_percentage: 5
        rules:

    exporters:
      logging:
        loglevel: debug
      debug:
        verbosity: detailed
      datadog:
        api:
          site: "datadoghq.com"
          key: ${env:DD_API_KEY}
        tls:
          insecure_skip_verify: true
        sending_queue:
          enabled: true
          queue_size: 200
          num_consumers: 100
        timeout: 1s
        retry_on_failure:
          enabled: false
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 5m

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, probabilistic_sampler/tracing]
          exporters: [debug, datadog, debug] # Change to datadog
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging, datadog, debug] # Change to datadog

      extensions: [pprof, zpages]

Or is there a better way to achieve this? We may have a large number of services sending traces to the OTel Collector, and I am wondering how we would add all of those services under processors.

crobert-1 commented 7 months ago

My apologies @zendesk-shweta, for some reason I misinterpreted your question thinking you were asking about how often to scrape endpoints in a receiver, not the sampling rate in the probabilistic sampler 👍

I don't think this is possible in a single definition of this processor. You'd likely have to define entirely separate receivers and processors, and then have a pipeline in the collector for each service. I suggest this because I don't think this processor filters based on attributes, so all data it receives is sampled at the same rate. To sample two sets of data at different rates, the data sets would need to be received and processed separately, to my understanding. The code owners would have a definitive answer, though; I'm not very familiar with this component and may be missing something here.
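As a rough illustration of the per-service pipeline approach described above, a config might look like the sketch below. This is only a sketch under assumptions: the names `service_a` and `service_b`, the second gRPC port `4319`, and the percentages `5` and `50` are hypothetical and do not come from this thread. It also assumes each service can be pointed at its own receiver endpoint.

```yaml
receivers:
  # One OTLP receiver per service (hypothetical names and ports).
  otlp/service_a:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  otlp/service_b:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319

processors:
  batch:
  # One named sampler instance per service, each with its own rate.
  probabilistic_sampler/service_a:
    sampling_percentage: 5
  probabilistic_sampler/service_b:
    sampling_percentage: 50

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    # Each pipeline wires one receiver to the matching sampler.
    traces/service_a:
      receivers: [otlp/service_a]
      processors: [probabilistic_sampler/service_a, batch]
      exporters: [debug]
    traces/service_b:
      receivers: [otlp/service_b]
      processors: [probabilistic_sampler/service_b, batch]
      exporters: [debug]
```

With this layout, every additional service needs its own receiver, sampler instance, and pipeline, so the config grows linearly with the number of services.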

I'll mark this as an enhancement request.

github-actions[bot] commented 7 months ago

Pinging code owners for processor/probabilisticsampler: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

zendesk-shweta commented 7 months ago

/label processor/sampler help-wanted

jmacd commented 7 months ago

This is a reasonable request. See https://github.com/open-telemetry/oteps/pull/250. I expect this functionality will eventually emerge in the probabilisticsamplerprocessor, and that the OpAMP protocol will be used to distribute sampling configurations via an OTel sampling configuration, but there is a lot of work to do.

github-actions[bot] commented 5 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
