open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0
1.18k stars 420 forks

Proposal: allow setting default configurations for all the OpenTelemetryCollector instances #2942

Open iblancasa opened 4 months ago

iblancasa commented 4 months ago

Component(s)

collector

Describe the issue you're reporting

The problem

The OpenTelemetry Operator does not give vendors a simple way to set default configurations for the OpenTelemetry Collector. As a result, configuration mistakes become more likely and the quality of the collected data can suffer.

This proposal aims to improve the OpenTelemetry Operator for Kubernetes by letting vendors specify default values for OpenTelemetry Collector configurations. This would simplify deployment, help ensure that telemetry data is collected according to best practices, and let vendors ship optimized default settings.

Proposed solution

The default configuration could be provided, for instance, through a parameter of the operator CLI. In the future, we can think about integrating this with the Distributed collector configuration proposal.

--default-collector-config                          Default configuration for OpenTelemetry Collectors

That file can contain some configurations like enabling some processors by default:

    processors:
      batch:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 50
        spike_limit_percentage: 30

We can merge the configuration at the operator level (default-provided configuration + user-provided configuration). After that, we provide the configuration to the OpenTelemetry Collector as we do today (as a file passed with the --config parameter).
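To make the merge step concrete, here is a minimal sketch of one possible "defaults overlaid by user values" merge. It is an assumption about the semantics, not the operator's actual implementation: the function name `mergeConfig` is hypothetical, configurations are modeled as plain `map[string]any` values (as a YAML decoder would typically produce), nested maps are merged key by key, and any other value from the user config replaces the default.

```go
package main

import "fmt"

// mergeConfig (hypothetical helper) returns a new map holding the defaults
// overlaid with the user's values. When both sides hold a nested map under the
// same key, the two maps are merged recursively; otherwise the user's value
// (scalar, list, or map) replaces the default. Neither input map is modified,
// although untouched nested values are shared by reference with the inputs.
func mergeConfig(defaults, user map[string]any) map[string]any {
	out := make(map[string]any, len(defaults)+len(user))
	for k, v := range defaults {
		out[k] = v
	}
	for k, uv := range user {
		dm, dIsMap := out[k].(map[string]any)
		um, uIsMap := uv.(map[string]any)
		if dIsMap && uIsMap {
			out[k] = mergeConfig(dm, um)
			continue
		}
		out[k] = uv
	}
	return out
}

func main() {
	// Vendor defaults, mirroring the example processors in the proposal.
	defaults := map[string]any{
		"processors": map[string]any{
			"batch": nil,
			"memory_limiter": map[string]any{
				"check_interval":   "1s",
				"limit_percentage": 50,
			},
		},
	}
	// The user only overrides a single setting (the value 75 is illustrative).
	user := map[string]any{
		"processors": map[string]any{
			"memory_limiter": map[string]any{"limit_percentage": 75},
		},
	}
	fmt.Println(mergeConfig(defaults, user))
}
```

With these semantics, a user who sets only `limit_percentage` still inherits the default `check_interval` and the `batch` processor. Whether lists such as pipeline processor lists should be replaced or concatenated is a design choice this sketch does not settle.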

How about updates? It is important to avoid changing the configuration of OpenTelemetryCollector instances already running in the cluster. So, when upgrading from a version without this feature to one where default values are applied, we can use the upgrade routines to add an annotation to the running OpenTelemetryCollector instances that skips the configuration merge. The same mechanism can be used for any OpenTelemetryCollector instance where users don't want the default config applied.

For instance, the annotation can be opentelemetry-collector-skip-default: true.
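A minimal sketch of how such an opt-out check could look, assuming the annotation key suggested above and treating only the literal value `true` as an opt-out. The function name `shouldApplyDefaults` is hypothetical, and the annotations are modeled as the plain `map[string]string` that Kubernetes object metadata uses.

```go
package main

import "fmt"

// skipDefaultAnnotation is the example annotation key from the proposal.
const skipDefaultAnnotation = "opentelemetry-collector-skip-default"

// shouldApplyDefaults (hypothetical helper) reports whether the default
// configuration should be merged into an instance with the given annotations.
// Assumption: only the exact value "true" opts the instance out; a missing
// annotation or any other value keeps the defaults enabled.
func shouldApplyDefaults(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields the empty string.
	return annotations[skipDefaultAnnotation] != "true"
}

func main() {
	// An instance annotated by the upgrade routine keeps its existing config.
	upgraded := map[string]string{skipDefaultAnnotation: "true"}
	// A newly created instance has no such annotation.
	fresh := map[string]string{}

	fmt.Println("upgraded instance applies defaults:", shouldApplyDefaults(upgraded))
	fmt.Println("fresh instance applies defaults:", shouldApplyDefaults(fresh))
}
```

Under this assumption, the upgrade routine only needs to set the annotation once on each existing instance, and users can remove it later to opt in to the defaults.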

jaronoff97 commented 4 months ago

Discussed at SIG: the decision is a yes! We will allow users to mount a ConfigMap and then supply the path to the mounted file. The operator will unmarshal it as a Config object, as we do with the collector. At this point we will just merge with override.