Distinguish Tail Sampling Policies in Metrics by Prefixing Processor Name

EOjeah commented 1 month ago

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

Currently, the otelcol_processor_tail_sampling_sampling_decision_latency_bucket metric does not distinguish between different tail sampling policies if they share the same name. This is problematic because it makes it difficult to differentiate between metrics emitted by different policies, especially when they are defined in separate tail sampling processors. For example, both tail_sampling/match and tail_sampling/string processors have a policy named default-policy, leading to identical metric labels and causing confusion in metric analysis.

  tail_sampling/match:
    policies: [
      {
        name: default-policy,
        type: ottl_condition,
        ottl_condition: {
          span: [
            'IsMatch(attributes["app.force_sample"], "true")',
          ]
        }
      }
    ]

  tail_sampling/string:
    policies: [
      {
        name: default-policy,
        type: ottl_condition,
        ottl_condition: {
          span: [
            'IsString(attributes["app.force_sample"])',
          ]
        }
      }
    ]

Describe the solution you'd like

To resolve this issue, I propose prefixing the policy label in the metric with the name of the tail sampling processor. This way, the metrics will be uniquely identifiable based on both the processor and the policy name. For instance:

The metric for the default-policy in the tail_sampling/match processor would be labeled as tail_sampling/match-default-policy. (or remove the tail_sampling if it's too obvious and just use the name after the / if any)

The metric for the default-policy in the tail_sampling/string processor would be labeled as tail_sampling/string-default-policy.

Describe alternatives you've considered

Unique Names for Each Policy: One alternative is to ensure that each policy has a unique name across all processors. However, this approach is not scalable, especially when policies are defined across multiple files. It would require more coordination and management to avoid name collisions.

Additional context

Only raised the issue for otelcol_processor_tail_sampling_sampling_decision_latency_bucket but it's likely to be the case for the other metrics from the tail sampling processor unique to each policy definition.

github-actions[bot] commented 1 month ago

Pinging code owners:

processor/tailsampling: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling commented 1 month ago

Seems like an easy one to tackle. Do you want to give it a try? You can get the "componentID" from processor.Settings that the Factory receives. You'd need to propagate it down to the place we initialize the telemetry.

@codeboten, perhaps the component ID should be a parameter of the telemetry builder? I believe the same problem would occur to every other component.

EOjeah commented 1 month ago

sure, I'll give it a go, can assign the issue to me

open-telemetry / opentelemetry-collector-contrib