open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Extend file confmap.Provider to support globs #7177

Open tigrannajaryan opened 1 year ago

tigrannajaryan commented 1 year ago

Is your feature request related to a problem? Please describe.

It is difficult to decompose a Collector config into separate files, one per component. Complex configs that use dozens of components either end up as one very large Collector config or require multiple ${file} directives to include the pieces, which is also difficult to manage since adding a new component requires changes in 2 places.

Describe the solution you'd like

I would like to be able to use glob patterns for file URIs. When combined with the newly proposed receivergroup component, I can then, for example, do this:

receivers:
  # This will read all *.yaml files from all sub-directories of the "receivers" directory, merge them into one config,
  # and then set the result as the config of the receivergroup/all component.
  receivergroup/all: ${file:receivers/*/*.yaml}

service:
  pipelines:
    metrics:
      receivers: receivergroup/all
      exporters: ...

The suggestion is that when multiple files match the glob pattern, the content of all those files is merged using the same merging rules we already use for confmap.Resolver.

This example allows placing each receiver config in its own file; when you need to add a new receiver, you just create a new YAML file in the receivers directory.

bogdandrutu commented 1 year ago

A few questions I thought about:

This will read all *.yaml files from all sub-directories of the "receivers" directory and will merge them into one config and then set the result as the config of the receivergroup/all component.

What is the order since configs from one file may overwrite other configs from a different file? Should it fail if any overlap happens?

I understand the "equivalence" with fluentd-like systems, but could we have a different provider for this use case? One that maybe does something like:

receivers: ${**files/groups/other_good_name**:receivers/*/*.yaml}

And this provider reads all the files as a "map".
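One way to picture the "fail on overlap" option is a strict merge that collects the top-level keys of every file into a single map and rejects duplicates. The Go sketch below is only an illustration under assumptions: `mergeStrict` is a hypothetical name, the inputs stand in for already-parsed YAML files, and the placeholder values are not real receiver settings.

```go
package main

import "fmt"

// mergeStrict (hypothetical) copies every top-level key of each source into
// one map and returns an error as soon as two sources define the same key.
func mergeStrict(sources ...map[string]any) (map[string]any, error) {
	out := make(map[string]any)
	for i, src := range sources {
		for k, v := range src {
			if _, exists := out[k]; exists {
				return nil, fmt.Errorf("source %d redefines key %q", i, k)
			}
			out[k] = v
		}
	}
	return out, nil
}

func main() {
	// Stand-ins for two parsed receiver files; the values are placeholders.
	otlp := map[string]any{"otlp": "placeholder settings"}
	jaeger := map[string]any{"jaeger": "placeholder settings"}

	merged, err := mergeStrict(otlp, jaeger)
	fmt.Println(len(merged), err)

	// The same key in two sources is reported instead of silently overridden.
	_, err = mergeStrict(otlp, otlp)
	fmt.Println(err)
}
```

With this behavior the order in which files are read no longer affects the result, at the cost of not being able to split one component's settings across files.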

tigrannajaryan commented 1 year ago

What is the order since configs from one file may overwrite other configs from a different file? Should it fail if any overlap happens?

We can use alphabetical order. I think we should override when overlap happens (not fail). This should allow deep merging of settings. For example, I can have 2 files that each declare their own pipelines, and they should be merged even though there is overlap in the key name (both included files start with the pipelines key):

# file1.yaml
pipelines:
  logs/one: ...
  metrics:
    receivers: [otlp]
# file2.yaml
pipelines:
  logs/two: ...
  metrics:
    receivers: [jaeger]
# main config file
service: ${file:file*.yaml}

This should result in merged config:

service:
  pipelines:
    logs/one: ...
    logs/two: ...
    metrics:
      receivers: [jaeger] # note: file2.yaml wins here because it is lexicographically after file1.yaml.

Open question: should we try to merge sequences (e.g. by concatenating) or fully override? I think we should fully override like I have in the example above for service.pipelines.metrics.receivers key.
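The override semantics described above can be sketched in Go as follows. This is a minimal illustration, not the Collector's actual merge code: `deepMerge` is a hypothetical helper, the maps stand in for the parsed contents of file1.yaml and file2.yaml, and the elided `logs/one` and `logs/two` bodies are represented by empty maps.

```go
package main

import "fmt"

// deepMerge (hypothetical) merges src into dst in place. Nested maps are
// merged key by key; any other value, scalars and sequences alike, replaces
// whatever dst already holds for that key.
func deepMerge(dst, src map[string]any) {
	for k, sv := range src {
		sm, srcIsMap := sv.(map[string]any)
		if !srcIsMap {
			dst[k] = sv // later value wins outright, sequences are not concatenated
			continue
		}
		dm, dstIsMap := dst[k].(map[string]any)
		if !dstIsMap {
			// Start from a fresh map so the source maps are never mutated.
			dm = make(map[string]any, len(sm))
			dst[k] = dm
		}
		deepMerge(dm, sm)
	}
}

func main() {
	// Stand-ins for the parsed files, already in lexicographic file-name order.
	file1 := map[string]any{"pipelines": map[string]any{
		"logs/one": map[string]any{},
		"metrics":  map[string]any{"receivers": []any{"otlp"}},
	}}
	file2 := map[string]any{"pipelines": map[string]any{
		"logs/two": map[string]any{},
		"metrics":  map[string]any{"receivers": []any{"jaeger"}},
	}}

	service := map[string]any{}
	for _, f := range []map[string]any{file1, file2} {
		deepMerge(service, f)
	}

	pipelines := service["pipelines"].(map[string]any)
	metrics := pipelines["metrics"].(map[string]any)
	fmt.Println(len(pipelines))       // prints 3
	fmt.Println(metrics["receivers"]) // prints [jaeger]
}
```

Switching to concatenation for sequences would only change the non-map branch, which is why the choice can be treated as a separate question from the map-merging rule.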

I understand the "equivalence" with fluentd-like systems, but could we have a different provider for this use case? One that maybe does something like:

receivers: ${**files/groups/other_good_name**:receivers/*/*.yaml}

And this provider reads all the files as a "map".

Yes, this is also a possible option.