open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.08k stars 2.37k forks source link

[processor/filter] Standardize configuration on OTTL #18642

Open TylerHelmuth opened 1 year ago

TylerHelmuth commented 1 year ago

Is your feature request related to a problem? Please describe.

With https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16369, OTTL was added to the filterprocessor. This allows us to standardize the way the filterprocessor configures conditions for when data should be dropped.

For spans, the existing non-ottl configuration options are:

For metric, the existing non-ottl configuration options are:

For logs, the existing non-ottl configuration options are:

With the completion of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/16413, OTTL can now handle each of these use cases, and more, in a uniform library.

In the past we only had what was in internal/filter and as different components needed different condition features those packages grew. As users wanted access to different fields in the telemetry payload the signal-specific filters would change to reflect those needs. The filterprocessor grew with it, providing important value but with increased maintenance burden and configuration complexity. Reading through the filterprocessor's readme highlights the difficulty in maintaining and understanding the different features and fields available.

By unifying on OTTL we'll have a solution that has access to all fields on all signals. Users no longer need to worry about whether or not a field for their signal is available to use and maintainers no longer need to worry about adding more fields to filter on in the future (OTLP changes excluded). Due to OTTL's functions, adding more features to enable complex conditions is simpler as the functions encapsulate the logic and can be added without modifying the underlying libraries or configuration. On top of its field access and functions, OTTL's grammar also provides more robust conditions, allowing users to use inequalities, nil, and arithmetic.

Steps to completion

TylerHelmuth commented 1 year ago

Based on the SIG meeting today, we have decided to take a hands-on approach with deprecating the non-OTTL configuration to help users with the transition.

Since the non-OTTL filter structs/features are translatable to OTTL, we will start out the deprecation process by adding a translator to the filterprocessor/internal/filter that can convert the non-OTTL structs into the equivalent OTTL statement. The filterprocessor will be updated to use this translator so that it is always using OTTL behind the scenes and users are unaffected.

We can then deprecate the non-OTTL config and start logging warnings and the equivalent OTTL statement so that users know exactly what to change in their config. Eventually we can use a feature gate to fail on startup if using the non-OTTL config, but let users enable the feature gate to keep using them. Finally, we completely remove the old config and the feature gate.

The timeline from logging deprecation warnings and equivalent OTTL statements -> featuregate and config removal will be large, probably about 3 months. This timeline will not start until the converter is implemented.

TylerHelmuth commented 1 year ago

The filterprocessor (and only the filterprocessor) uses internal/filter/filtermetric to allow limited filtering using the expr package. I propose that converting between this language and OTTL should be out-of-scope. Instead, if the filter.filtermetric.useOTTLBridge feature gate is enabled and the configuration contains expr matching, the collector will fail to start, including an error about using OTTL instead.

Hronom commented 1 year ago

@TylerHelmuth please can you extend set of examples in README.md.

It's also seems like with OTTL not possible to replace this:

      filter/allowed_metrics:
        metrics:
          include:
            match_type: expr
            expressions:
              # kube-state-metrics
              - MetricName == "kube_pod_container_resource_requests" && Label("namespace") == "test"
              - MetricName == "kube_pod_container_resource_limits" && Label("namespace") == "test"
              # kubernetes-nodes-cadvisor
              - MetricName == "container_cpu_usage_seconds" && Label("namespace") == "test"
              - MetricName == "container_cpu_cfs_throttled_periods" && Label("namespace") == "test"
              - MetricName == "container_cpu_cfs_periods" && Label("namespace") == "test"
              - MetricName == "container_memory_working_set_bytes" && Label("namespace") == "test"
              # Istio, source https://istio.io/latest/docs/reference/config/metrics/#metrics
              - MetricName == "istio_requests" && Label("namespace") == "test"
              - MetricName == "istio_request_duration_milliseconds" && Label("namespace") == "test"
              - MetricName == "istio_request_bytes" && Label("namespace") == "test"
              - MetricName == "istio_response_bytes" && Label("namespace") == "test"
              # prometheus-node-exporter
              - MetricName == "node_cpu_seconds"
              - MetricName == "node_memory_MemTotal_bytes"
              - MetricName == "node_memory_MemFree_bytes"
              - MetricName == "node_memory_Buffers_bytes"
              - MetricName == "node_memory_Cached_bytes"

so basically here I'm allow only specific set of metrics based on conditions. It's seems like this case not possible with OTTL as it works like exclude filter. But what is needed it's an include functionality, please can you provide an example how to convert example above to the OTTL?

TylerHelmuth commented 1 year ago

@Hronom the fastest way to convert your config into OTTL syntax would be to take advantage of the not operator. I believe the following config is exactly equivalent to your config:

  filter:
    error_mode: ignore
    metrics:
      metric:
        - |
          not (
            name == "kube_pod_container_resource_requests" and HasAttrOnDatapoint("namespace", "test") or
            name == "kube_pod_container_resource_limits" and HasAttrOnDatapoint("namespace", "test") or

            name == "container_cpu_usage_seconds" and HasAttrOnDatapoint("namespace", "test") or
            name == "container_cpu_cfs_throttled_periods" and HasAttrOnDatapoint("namespace", "test") or
            name == "container_cpu_cfs_periods" and HasAttrOnDatapoint("namespace", "test") or
            name == "container_memory_working_set_bytes" and HasAttrOnDatapoint("namespace", "test") or

            name == "istio_requests" and HasAttrOnDatapoint("namespace", "test") or
            name == "istio_request_duration_milliseconds" and HasAttrOnDatapoint("namespace", "test") or
            name == "istio_request_bytes" and HasAttrOnDatapoint("namespace", "test") or
            name == "istio_response_bytes" and HasAttrOnDatapoint("namespace", "test") or

            name == "node_cpu_seconds" or
            name == "node_memory_MemTotal_bytes" or
            name == "node_memory_MemFree_bytes" or
            name == "node_memory_Buffers_bytes" or
            name == "node_memory_Cached_bytes"
          )

I am pretty sure you could also do some boolean arithmetic if you don't want to use not, but I think not is the clearest approach.

Hronom commented 1 year ago

@TylerHelmuth thanks a lot! Was didn't aware of not Yes this is the most simple way and I think the most easy to understand.

I think it makes sense to add such example in README.md. The use case behind this conditions is next: when you use managed solutions for storing metrics(e.g. Datadog, AWS, Grafana Labs etc) in order to reduce the price - you need to filter out metrics that being send there.

I think this is pretty common use case.

And seems like filter process is the most reasonable processor for doing such things.

TylerHelmuth commented 1 year ago

filterspan, filtermetric, and filterlog are now all able to convert their configuration to OTTL syntax and execute equivalent boolean expressions. The next step is logging these statements so that users have an easier time updating their config.

It turns out formatting a friendly, config-ready log to help users recreate their config in OTTL syntax is difficult. This is the best log I have so far:

{"level":"info","msg":"You are using an outdated configuration format.  Please use the traces.span configuration option with the attached statements.","statements":"not ((resource.attributes[\"service.name\"] == \"keep\" or resource.attributes[\"service.name\"] == \"also keep\") and (kind.deprecated_string == \"keep\" or kind.deprecated_string == \"also keep\")), (resource.attributes[\"service.name\"] == \"keep\" or resource.attributes[\"service.name\"] == \"also keep\") and (kind.deprecated_string == \"keep\" or kind.deprecated_string == \"also keep\")"}

Unless you look closely it isn't clear that there are 2 statements (separated by ,) in the log. @mx-psi do you have idea good ideas on how to help move this configuration transition forward?

mx-psi commented 1 year ago

Something like open-telemetry/opentelemetry-collector/issues/7631 would definitely help here, but I haven't had time to look into it further and I am not planning to work on it at least for the next month.

I guess linking to a docs page with further info (could be an issue) would be great. That way we can keep an example there, and have a common place of discussion were people to find a problem with the configuration migration.

Lastly, having multiline logs is supported by some formats, so maybe statements should have each statement separated by a newline?

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.