open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Support Loading Collector Configuration Files from a Directory #9596

Open woody1872 opened 7 months ago

woody1872 commented 7 months ago

Is your feature request related to a problem? Please describe.

A real challenge that I, and others, currently face with the collector arises when things start to scale and the config grows, and also when the config needs to be dynamic to support certain roles/profiles, as in Puppet- or Ansible-managed environments.

The collector is currently great if you have a single config.yaml file, but as the config grows it becomes hard to reason about, so the natural evolution is to split it into several individual YAML files. This works well when you have 2 or 3 files to load. Beyond that, though, the collector becomes cumbersome and frustrating to use. I tend to have anywhere between 5 and 15 config files, each with a specific "job", so that none of them gets too big and each has a single responsibility. That means passing 5-15 separate --config=/big/long/path/to/otelcol/config/files/<file>.yaml flags, which is not great.

As an example, a component I use a lot is the hostmetrics receiver. At the time of writing it has 9 scrapers, so for me that immediately means 9 different config files, because I typically separate my config files by scraper so I can quickly and easily see how each one is scraped, transformed/processed, exported, and so on. These files might be named cpu.yaml, memory.yaml, etc., and each of them can individually get quite large: it's not uncommon to apply custom transformations, filters, and resourcedetection, to add or remove certain attributes, to collect some metrics within a scraper at different intervals from others, to send to multiple backends, and so on. I've seen 600-700 lines of config (combined) just to collect metrics in a particular way to meet business requirements.

But in my example above, it's actually 10 config files, because I always like to maintain a base.yaml or default.yaml file for things that are default/important/fundamental, i.e. things people should think twice about before touching. Examples include the default batch and memory_limiter processors, default exporter configurations, the health_check extension (so that certain external health monitors don't break), authentication extensions, and more.

This problem is difficult enough on its own at a smaller scale, but combined with large-scale deployment (thousands or tens of thousands of VMs, containers, etc.) and dynamic configuration requirements, it becomes very challenging. By "dynamic configuration requirements" I mean that groups of servers/applications/databases/message queues/clusters may need different configuration from others. It's very common to have "roles" and "profiles" that need to be applied to infrastructure, and the current limitation of --config=.... --config=... makes that quite challenging to handle. See the Puppet roles and profiles pattern as an example of a common way people manage infrastructure configuration.

Describe the solution you'd like

Currently, we only have the --config option. I would really like to see a --config-dir option as well, to make it drastically simpler to split up and apply configuration. With that option in place, you'd just drop all your configuration into a directory, and the collector would load every *.yaml or *.yml file in it, as if you had passed --config=/path/to/given/dir/a.yaml --config=/path/to/given/dir/b.yaml --config=.... This would be a considerable quality-of-life improvement for end users of the collector.

Describe alternatives you've considered

Another option, which isn't quite the same and wouldn't be nearly as simple, is to adapt the current --config option to accept wildcards (glob patterns), so you could write --config=/path/to/config/dir/*.yaml instead of listing each file individually. This would definitely be an improvement, but still not ideal.

Additional context

I've done a bit of research into how some other metric and log collection agents handle loading configuration. This is somewhat difficult because I haven't used many of them, but hopefully the comparison below is useful.

Telegraf: Can use the --config-directory option - the default configuration directory is /etc/telegraf/telegraf.d/


Datadog Agent: Can load config files from /etc/datadog-agent/conf.d/


The Dynatrace and Instana agents appear to support loading config from directories. I could see references to directory names and config files, but I've never used them, and nothing in their docs was very explicit about it, only snippets and hints.

I looked at some other metric and log collection agents, and from what I can see, the ones that don't support loading config from a directory only support reading a single configuration file at a time, which is even more limiting than the collector's current multi-file support. I'm not naming them here to avoid disparaging them, because it could be that I'm just not familiar enough with them.

woody1872 commented 7 months ago

Two things I didn't want to add to the "main" proposal, but I think are worth discussion:

1 - I think --config-dir should be recursive, to provide maximum flexibility when structuring the configuration directory

2 - The current docs mention that a default config file, /etc/otelcol-contrib/config.yaml, can be used (assuming you're using the otelcol-contrib distro). I think it would make sense to also have a default configuration directory, either /etc/otelcol-contrib/config.d/ or /etc/otelcol-contrib/otelcol-contrib.d/ (again assuming otelcol-contrib).

TylerHelmuth commented 6 months ago

@woody1872 one problem to solve with a directory option is the order the config files are resolved.

woody1872 commented 6 months ago

I think lexicographical order would be sensible. It seems like it would be testable, repeatable, and reliable (in the sense that modifying a config file would not change the resulting load order).

TylerHelmuth commented 6 months ago

As a feature I think this idea makes sense and has precedent in other agents. We are working on stabilizing the confmap module, and it is likely that this feature could be added in the future as a non-breaking addition, so it may not get immediate attention.

woody1872 commented 6 months ago

Makes sense, thanks Tyler. I guess this issue should stay open to track it?

TylerHelmuth commented 6 months ago

Yes, thank you!

sinkingpoint commented 6 months ago

I just hit this as well. I think it makes sense to bake this into confmap, either in the file URI handler or in a separate directory one.

swiatekm commented 5 months ago

We discussed this during the SIG meeting on 10.04.2024. The conclusions were: