Document clarity re: sampling_priority configuration

pierzapin commented 8 months ago

Component(s)

processor/probabilisticsampler

Describe the issue you're reporting

It's not very clear how the sampling_priority configuration in the probabilistic_sampler is intended to work. The readme bullet point for this processor just above the Hashing heading states:

sampling_priority (default = null, optional): The optional name of a log record attribute used to set a different sampling priority from the sampling_percentage setting. 0 means to never sample the log record, and >= 100 means to always sample the log record.

which is somewhat ambiguous - is this meant to be a string (i.e. an attribute name as stated) or a int between 0 and 100?

My initial expectation based on the readme and the example config here was that this setting works in tandem with the from_attribute to provide some sort of override mechanism to the blanket sampling rate. i.e:

processors:
  probabilistic_sampler/logs:
    sampling_percentage: 15
    attribute_source: record 
    from_attribute: foo
    sampling_priority: 100

Which would suggest that if the data included the attribute "foo" that it'd be sampled at 100%

However testdata/config.yaml#L43 implies that another attribute name is used to drive the sampling_priority. I have no idea how this would work if the attribute value is itself a string?

I'd be happy to contribute wording updates based on your advise about which way this is intended to function.

github-actions[bot] commented 8 months ago

Pinging code owners:

processor/probabilisticsampler: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 commented 8 months ago

The sampling_priority configuration option will hold the string name of a log attribute. The processor will get the log attribute's value and then use it as the sampling rate, overriding the value of sampling_percentage. Here's the code for reference.

from_attribute will determine which log attribute's value will be used as the hashing value, it's unrelated to the sampling rate. The processor will get the attribute value and then compute a hash on it, then use the sampling priority (either set by sampling_percentage or the log attribute sampling_priority's value) to determine if the log itself is sampled or not.

I agree this could be made more clear in the README, a PR would be welcomed!

jpkrohling commented 8 months ago

Should this be closed?

crobert-1 commented 8 months ago

Should this be closed?

I don't think so, my PR was mostly unrelated to this. We still need to update the README to close this, from my understanding.

github-actions[bot] commented 6 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

processor/probabilisticsampler: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jmacd commented 4 months ago

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/31946#discussion_r1597799522

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

processor/probabilisticsampler: @jpkrohling @jmacd

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling commented 2 months ago

@kentquirk , would you be able to transform your comment in documentation at the readme for the component?

github-actions[bot] commented 1 week ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

processor/probabilisticsampler: @jpkrohling @jmacd

See Adding Labels via Comments if you do not have permissions to add labels yourself.

open-telemetry / opentelemetry-collector-contrib

Document clarity re: sampling_priority configuration #30410

Component(s)

Describe the issue you're reporting