open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[pkg/ottl] OpenTelemetry Transformation Language and Transform Processor Roadmap #18643

Open TylerHelmuth opened 1 year ago

TylerHelmuth commented 1 year ago

Component(s)

pkg/ottl

Describe the issue you're reporting

Background

The OpenTelemetry Transformation Language (OTTL) is a language for transforming OpenTelemetry data. Its primary use case is the transformprocessor, but it can be used in any component.

Although OTTL's primary goal is to facilitate transforming telemetry, its Condition logic is also useful in isolation. Since OTTL Conditions also have access to functions and telemetry fields, they provide an all-encompassing solution for making decisions based on telemetry field values.

Since its inception, OTTL has started being used in more components in Contrib. As of writing this issue it is used in transformprocessor, routingprocessor, and filterprocessor. Currently both transformprocessor and routingprocessor take advantage of full OTTL statements, whereas filterprocessor only utilizes conditions.
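To make the statement/condition distinction concrete, here is a minimal configuration sketch. The key names (`trace_statements`, `context`, `traces.span`, `error_mode`) follow the component READMEs as I understand them and may differ between Collector versions; the attribute names and values are made up for illustration.

```yaml
processors:
  # transformprocessor: full OTTL statements (a function call plus an optional where clause)
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # hypothetical attribute names, for illustration only
          - set(attributes["deployment.tier"], "critical") where attributes["http.response.status_code"] >= 500

  # filterprocessor: bare OTTL conditions; a span matching any condition is dropped
  filter:
    error_mode: ignore
    traces:
      span:
        - attributes["http.route"] == "/healthz"
```

The condition after `where` in the transform statement is the same kind of expression the filter processor accepts on its own.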

When discussing the roadmap for OTTL in Contrib, there are 2 main focuses: which components can be replaced with the transformprocessor, and how OTTL conditions can be used to standardize internal/filter and the components that use those packages.

OTTL and the Transform Processor

The transform processor with OTTL provides an open opportunity for most stateless transformations of data. There is opportunity for, and already a lot of, overlap with other components. How should the Contrib repository handle these overlaps?

I propose utilizing the transform processor to reduce the number of components we need in Contrib, standardizing how data is transformed in the Collector. If users want to transform their data in the collector then I propose the transformprocessor as the "one-stop-shop" for those needs. That said, there are some guardrails I think we should follow:

  1. The goal of transformprocessor is to transform telemetry.
  2. The transformprocessor is stateless; it should not handle any stateful transformations.
  3. The transformprocessor should not rely on any external source for its transformations. This means it should not need to call out to any APIs or databases.
  4. The transformprocessor is not a replacement for hyper-targeted processors (unless they want it to be). Processors like filterprocessor, vendor-specific processors, and samplers feel like bad candidates for the transformprocessor. (This guardrail is kinda "feely" and definitely needs to be discussed.)

With those guardrails in place, I see these components as candidates for replacement by the transformprocessor. (list is in alphabetical order only)

Before any of these processors can be replaced, work needs to be done to ensure the transformprocessor has complete feature parity. I believe a declarative syntax option will also be necessary.
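As a rough illustration of what feature parity could look like, the sketch below pairs an attributes processor `insert` action with an OTTL statement intended to behave the same way for spans. It assumes the attributes processor is among the candidates; the attribute key and value are hypothetical, and the equivalence is my reading of the two components, not a verified mapping.

```yaml
processors:
  # Existing component: insert the attribute only if the key is not already present
  attributes/example:
    actions:
      - key: environment
        value: production
        action: insert

  # Possible transformprocessor counterpart (spans only in this sketch)
  transform/example:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # the where clause mirrors "insert": only set when the key is absent
          - set(attributes["environment"], "production") where attributes["environment"] == nil
```

The attributes processor applies one action list across signals, while the statement above would need to be repeated per signal, which is one place a declarative syntax option might help.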

OTTL as a Generic Condition Solution

A major strength of OTTL is that it has conditions built into its grammar. These conditions have access to every field in the OTLP proto for each signal, which means that users have no restrictions on what they choose to use in their conditions. By using Converter functions, users are able to create complex conditions.

Thanks to https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/16413 OTTL has the capabilities of all other filter options in internal/filter. I propose we standardize conditions in Contrib on OTTL, updating all components that use internal/filter to use OTTL.

By unifying on OTTL we'll have a solution that has access to all fields on all signals. Users no longer need to worry about whether or not a field for their signal is available to use and maintainers no longer need to worry about adding more fields to filter on in the future (OTLP changes excluded). Due to OTTL's functions, adding more features to enable complex conditions is simpler as the functions encapsulate the logic and can be added without modifying the underlying libraries or configuration. On top of its field access and functions, OTTL's grammar also provides more robust conditions, allowing users to use inequalities, nil, and arithmetic.
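The three grammar features mentioned here (inequalities, nil, and arithmetic) might look like the following in a transformprocessor configuration. This is a sketch under assumptions: the attribute names are invented, the config keys follow the transformprocessor README as I understand it, and the exact behavior of each comparison should be checked against the OTTL documentation for the Collector version in use.

```yaml
processors:
  transform/conditions:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # inequality on an attribute value
          - set(attributes["example.server_error"], true) where attributes["http.response.status_code"] >= 500
          # nil check: the attribute is not set on the span
          - set(attributes["example.tenant"], "unknown") where attributes["example.tenant"] == nil
          # arithmetic on span fields: duration longer than one second, in nanoseconds
          - set(attributes["example.slow"], true) where end_time_unix_nano - start_time_unix_nano > 1000000000
```

Each `where` clause is a plain OTTL condition, so the same expressions could in principle be reused by any component that standardizes on OTTL conditions.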

For maintainers, this will allow us to reduce the amount of code we need to maintain.

Components that use internal/filter (that aren't listed in the above replacement candidate list): (list is in alphabetical order only)

Before non-OTTL packages in internal/filter can be replaced, we should consider a Condition-specific parser and a reusable configuration for defining conditions.

Related beta issue for OTTL: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/28892

Related beta issue for the transform processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/28644

github-actions[bot] commented 1 year ago

Pinging code owners for processor/metricsgeneration: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/logstransform: @djaglowski @dehaansa. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/attributes: @boostchicken. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/metricstransform: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/resource: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/span: @boostchicken. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/redaction: @leonsp-ai @dmitryax @mx-psi. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth commented 2 weeks ago

Over the last year we've seen increased interest and activity with the redaction processor. We've also pointed to it directly when answering questions about how the collector handles keeping data safe. It is clearer now that having a targeted processor specifically for the use case of redacting data is useful from a governance and practicality standpoint. For this reason the redaction processor falls under guardrail 4 and should therefore not be replaced by the transformprocessor. I've removed it from the list.

Instead, the redactionprocessor could utilize OTTL if it ever needs more complex conditionals for when to redact.