Open TylerHelmuth opened 1 year ago
Based on the SIG meeting today, we have decided to take a hands-on approach with deprecating the non-OTTL configuration to help users with the transition.
Since the non-OTTL filter structs/features are translatable to OTTL, we will start out the deprecation process by adding a translator to the filterprocessor
/internal/filter
that can convert the non-OTTL structs into the equivalent OTTL statement. The filterprocessor
will be updated to use this translator so that it is always using OTTL behind the scenes and users are unaffected.
We can then deprecate the non-OTTL config and start logging warnings and the equivalent OTTL statement so that users know exactly what to change in their config. Eventually we can use a feature gate to fail on startup if using the non-OTTL config, but let users enable the feature gate to keep using them. Finally, we completely remove the old config and the feature gate.
The timeline from logging deprecation warnings and equivalent OTTL statements -> featuregate and config removal will be large, probably about 3 months. This timeline will not start until the converter is implemented.
The filterprocessor (and only the filterprocessor) uses internal/filter/filtermetric
to allow limited filtering using the expr package. I propose that converting between this language and OTTL should be out-of-scope. Instead, if the filter.filtermetric.useOTTLBridge
feature gate is enabled and the configuration contains expr
matching, the collector will fail to start, including an error about using OTTL instead.
@TylerHelmuth please can you extend set of examples in README.md.
It's also seems like with OTTL not possible to replace this:
filter/allowed_metrics:
metrics:
include:
match_type: expr
expressions:
# kube-state-metrics
- MetricName == "kube_pod_container_resource_requests" && Label("namespace") == "test"
- MetricName == "kube_pod_container_resource_limits" && Label("namespace") == "test"
# kubernetes-nodes-cadvisor
- MetricName == "container_cpu_usage_seconds" && Label("namespace") == "test"
- MetricName == "container_cpu_cfs_throttled_periods" && Label("namespace") == "test"
- MetricName == "container_cpu_cfs_periods" && Label("namespace") == "test"
- MetricName == "container_memory_working_set_bytes" && Label("namespace") == "test"
# Istio, source https://istio.io/latest/docs/reference/config/metrics/#metrics
- MetricName == "istio_requests" && Label("namespace") == "test"
- MetricName == "istio_request_duration_milliseconds" && Label("namespace") == "test"
- MetricName == "istio_request_bytes" && Label("namespace") == "test"
- MetricName == "istio_response_bytes" && Label("namespace") == "test"
# prometheus-node-exporter
- MetricName == "node_cpu_seconds"
- MetricName == "node_memory_MemTotal_bytes"
- MetricName == "node_memory_MemFree_bytes"
- MetricName == "node_memory_Buffers_bytes"
- MetricName == "node_memory_Cached_bytes"
so basically here I'm allow only specific set of metrics based on conditions.
It's seems like this case not possible with OTTL as it works like exclude
filter.
But what is needed it's an include
functionality, please can you provide an example how to convert example above to the OTTL?
@Hronom the fastest way to convert your config into OTTL syntax would be to take advantage of the not
operator. I believe the following config is exactly equivalent to your config:
filter:
error_mode: ignore
metrics:
metric:
- |
not (
name == "kube_pod_container_resource_requests" and HasAttrOnDatapoint("namespace", "test") or
name == "kube_pod_container_resource_limits" and HasAttrOnDatapoint("namespace", "test") or
name == "container_cpu_usage_seconds" and HasAttrOnDatapoint("namespace", "test") or
name == "container_cpu_cfs_throttled_periods" and HasAttrOnDatapoint("namespace", "test") or
name == "container_cpu_cfs_periods" and HasAttrOnDatapoint("namespace", "test") or
name == "container_memory_working_set_bytes" and HasAttrOnDatapoint("namespace", "test") or
name == "istio_requests" and HasAttrOnDatapoint("namespace", "test") or
name == "istio_request_duration_milliseconds" and HasAttrOnDatapoint("namespace", "test") or
name == "istio_request_bytes" and HasAttrOnDatapoint("namespace", "test") or
name == "istio_response_bytes" and HasAttrOnDatapoint("namespace", "test") or
name == "node_cpu_seconds" or
name == "node_memory_MemTotal_bytes" or
name == "node_memory_MemFree_bytes" or
name == "node_memory_Buffers_bytes" or
name == "node_memory_Cached_bytes"
)
I am pretty sure you could also do some boolean arithmetic if you don't want to use not
, but I think not
is the clearest approach.
@TylerHelmuth thanks a lot! Was didn't aware of not
Yes this is the most simple way and I think the most easy to understand.
I think it makes sense to add such example in README.md
.
The use case behind this conditions is next: when you use managed solutions for storing metrics(e.g. Datadog, AWS, Grafana Labs etc) in order to reduce the price - you need to filter out metrics that being send there.
I think this is pretty common use case.
And seems like filter process
is the most reasonable processor for doing such things.
filterspan, filtermetric, and filterlog are now all able to convert their configuration to OTTL syntax and execute equivalent boolean expressions. The next step is logging these statements so that users have an easier time updating their config.
It turns out formatting a friendly, config-ready log to help users recreate their config in OTTL syntax is difficult. This is the best log I have so far:
{"level":"info","msg":"You are using an outdated configuration format. Please use the traces.span configuration option with the attached statements.","statements":"not ((resource.attributes[\"service.name\"] == \"keep\" or resource.attributes[\"service.name\"] == \"also keep\") and (kind.deprecated_string == \"keep\" or kind.deprecated_string == \"also keep\")), (resource.attributes[\"service.name\"] == \"keep\" or resource.attributes[\"service.name\"] == \"also keep\") and (kind.deprecated_string == \"keep\" or kind.deprecated_string == \"also keep\")"}
Unless you look closely it isn't clear that there are 2 statements (separated by ,
) in the log. @mx-psi do you have idea good ideas on how to help move this configuration transition forward?
Something like open-telemetry/opentelemetry-collector/issues/7631 would definitely help here, but I haven't had time to look into it further and I am not planning to work on it at least for the next month.
I guess linking to a docs page with further info (could be an issue) would be great. That way we can keep an example there, and have a common place of discussion were people to find a problem with the configuration migration.
Lastly, having multiline logs is supported by some formats, so maybe statements
should have each statement separated by a newline?
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Is your feature request related to a problem? Please describe.
With https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16369, OTTL was added to the filterprocessor. This allows us to standardize the way the filterprocessor configures conditions for when data should be dropped.
For spans, the existing non-ottl configuration options are:
For metric, the existing non-ottl configuration options are:
For logs, the existing non-ottl configuration options are:
With the completion of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/16413, OTTL can now handle each of these use cases, and more, in a uniform library.
In the past we only had what was in
internal/filter
and as different components needed different condition features those packages grew. As users wanted access to different fields in the telemetry payload the signal-specific filters would change to reflect those needs. The filterprocessor grew with it, providing important value but with increased maintenance burden and configuration complexity. Reading through the filterprocessor's readme highlights the difficulty in maintaining and understanding the different features and fields available.By unifying on OTTL we'll have a solution that has access to all fields on all signals. Users no longer need to worry about whether or not a field for their signal is available to use and maintainers no longer need to worry about adding more fields to filter on in the future (OTLP changes excluded). Due to OTTL's functions, adding more features to enable complex conditions is simpler as the functions encapsulate the logic and can be added without modifying the underlying libraries or configuration. On top of its field access and functions, OTTL's grammar also provides more robust conditions, allowing users to use inequalities,
nil
, and arithmetic.Steps to completion