open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.88k stars 2.25k forks source link

[processor/tailsampling] invert_match precedence #34085

Open hyang023 opened 1 month ago

hyang023 commented 1 month ago

Component(s)

processor/tailsampling

What happened?

Description

a recent fix went in to address https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33656 and align with expected behavior documented at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md When there's an "inverted not sample" decision, the trace is not sampled

however, in the same README, the tail_sampling.policies[] example policy list includes this backwards compatibility policy

      {
        # Rule 1: use always_sample policy for services that don't belong to team_a and are not ready to use tail sampling
        name: backwards-compatibility-policy,
        type: and,
        and:
          {
            and_sub_policy:
              [
                {
                  name: services-using-tail_sampling-policy,
                  type: string_attribute,
                  string_attribute:
                    {
                      key: service.name,
                      values:
                        [
                          list,
                          of,
                          services,
                          using,
                          tail_sampling,
                        ],
                      invert_match: true,
                    },
                },
                { name: sample-all-policy, type: always_sample },
              ],
          },
      },

and using a collector with this fix means that service.name values [list, of, services, using, tail_sampling] no longer can have other sampling policies applied to them since their InvertNotSampled condition will override any applicable Sampled or InvertSampled

Is this change in behavior intended? And if so, what is the new recommended way to define a backwards compatibility policy?

Steps to Reproduce

  1. use the tail sampling processor
  2. define a set of tail sampling policies including a backwards compatibility policy

Expected Result

Expect to be able to define sampling policies for service.name values [list, of, services, using, tail_sampling] and always_sample for any other service.name not in this list (or as specified in other policies)

Actual Result

no traces for service.name values [list, of, services, using, tail_sampling] can be exported since all are excluded as a result of InvertNotSampled

Collector version

0.104.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04") Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners:

savar commented 1 month ago

Another (possible) issue with invert_match precedence is (or I am just not able to do that), to do something like:

  1. probabilistic-ally catch http.route == /health for 0.1%
  2. probabilistic-ally catch http.route != /health for 20%

The 1st one works easily, but the second one would need to not apply to /health and therefore an invert_match seems to be required, which, even in an and type will create a do not sample decision which will override the 1st rule.

I think the issue is the same as the backwards compatibility example.