influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf

New plugin type: discovery plugins #13569

Open redbaron opened 1 year ago

redbaron commented 1 year ago

Use Case

When a certain input plugin is required for each X of something, this currently has to be done in a configuration management tool. That can be too limiting when running in a more dynamic environment such as a cloud or Kubernetes.

For instance, there is a postgresql input plugin which requires a database connection string. On AWS, RDS databases can come and go, and if the task is to monitor all of them, it is not straightforward to achieve.

Expected behavior

Config option to instantiate plugins dynamically based on the output of some discovery process.

Actual behavior

-

Additional info

I haven't given much thought to how exactly to achieve it, but as a rough idea:

Every discovery plugin returns the equivalent of a JSON array of maps. Each map contains configuration values to be used by plugins:

[[discovery.exec]]
alias = "example"
# simple discovery plugin which just prints a static discovery result
command = ["/bin/sh", "-c", "echo '[{\"host\": \"xyz\", \"interface\": \"eth0\"}, {\"host\": \"abc\", \"interface\": \"eth1\"}]'"]

Then plugins can be wired to the discovery plugin. When wired, they can use map values from the discovery result:

[[inputs.ping]]
discovery = "discovery.exec.example"   # identifier of the discovery plugin used to instantiate this plugin
alias = "${discovery.host}"
urls = ["${discovery.host}"]
interface = "${discovery.interface}"

Should "spawn" 2 ping plugins dynamically.

The discovery plugin runs on an interval like any other plugin, and the plugins it spawns should ideally stay up unless the discovery plugin's output changes.
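
To make the wiring concrete, here is a rough, hypothetical sketch (not existing Telegraf code) of how the agent could expand the ${discovery.*} placeholders against each map returned by the discovery plugin; the function and data are made up for illustration:

package main

import (
    "fmt"
    "strings"
)

// expandDiscovery substitutes ${discovery.<key>} placeholders in a raw plugin
// configuration block with the values of one discovery result.
func expandDiscovery(rawConfig string, result map[string]string) string {
    expanded := rawConfig
    for key, value := range result {
        expanded = strings.ReplaceAll(expanded, "${discovery."+key+"}", value)
    }
    return expanded
}

func main() {
    // One [[inputs.ping]] block as written in the static configuration above.
    raw := `[[inputs.ping]]
  alias = "${discovery.host}"
  urls = ["${discovery.host}"]
  interface = "${discovery.interface}"`

    // Two maps as returned by the discovery.exec example.
    results := []map[string]string{
        {"host": "xyz", "interface": "eth0"},
        {"host": "abc", "interface": "eth1"},
    }

    // The agent would instantiate one ping plugin per discovery result.
    for _, r := range results {
        fmt.Println(expandDiscovery(raw, r))
        fmt.Println()
    }
}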

redbaron commented 1 year ago

Hm, --config-directory + --watch-config enables this without any changes to Telegraf. Too bad --watch-config doesn't discover new files (it only watches those present at startup), but that is not too big of a limitation.

srebhan commented 1 year ago

@redbaron thanks for bringing this up. I have been thinking about "templating" (I think that's what this is) for quite some time but never got to it. My idea is to approach it from the configuration-source side, i.e. I was thinking about something like

[[configurationsources.sql]]
  # Template in golang-text-template style 
  template = "/etc/telegraf/my-templates/ping.tmpl"
  # Plugin specific settings to populate the template data
  dsn = "mysql://..."
  query = 'SELECT host,interface FROM all_hosts'
  # Refresh settings
  refresh = "10m"

  # Additional mappings for the template
  [configurationsources.sql.mapping]
    hostname = "host"

  # Default values to use for the template if not in the data
  [configurationsources.sql.defaults]
    interface = "eth0"
    type = "ICMP"

or, for your example:

[[configurationsources.exec]]
  template = "/etc/telegraf/my-templates/ping.tmpl"
  command = ["/bin/sh", "-c", "echo '[{host: \"xyz\", interface: \"eth0\"}, {host: \"abc\", interface: \"eth1\"}]'"]
  format = "json"
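
How the mapping and defaults sections could act on the queried or parsed rows before templating might look roughly like this; the merge semantics and the helper name are assumptions for illustration, not an existing implementation:

package main

import "fmt"

// applyMappingAndDefaults builds the data handed to the template: mapping
// entries pull values from differently named columns, defaults fill in keys
// the data source did not deliver.
func applyMappingAndDefaults(row, mapping, defaults map[string]string) map[string]interface{} {
    out := make(map[string]interface{}, len(row)+len(defaults))
    for key, value := range row {
        out[key] = value
    }
    for templateKey, column := range mapping {
        if value, ok := row[column]; ok {
            out[templateKey] = value
        }
    }
    for key, value := range defaults {
        if _, ok := out[key]; !ok {
            out[key] = value
        }
    }
    return out
}

func main() {
    // A row as returned by: SELECT host,interface FROM all_hosts
    row := map[string]string{"host": "xyz", "interface": "eth1"}
    mapping := map[string]string{"hostname": "host"}
    defaults := map[string]string{"interface": "eth0", "type": "ICMP"}

    fmt.Println(applyMappingAndDefaults(row, mapping, defaults))
    // map[host:xyz hostname:xyz interface:eth1 type:ICMP]
}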

Telegraf would then use a template engine (TBD, but https://pkg.go.dev/text/template might be an option) with the data filled in by the plugin. The data should be a list of dictionary-like objects. For each list element Telegraf would execute the template and thus create one or more "configuration files". Those might be kept in memory...
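
A minimal sketch of that rendering step, assuming the data arrives as []map[string]interface{} and that the template is an ordinary text/template file (both are assumptions for illustration):

package main

import (
    "os"
    "text/template"
)

func main() {
    // Content of a hypothetical /etc/telegraf/my-templates/ping.tmpl.
    const pingTmpl = `[[inputs.ping]]
  alias = "{{ .host }}"
  urls = ["{{ .host }}"]
  interface = "{{ .interface }}"
`
    tmpl := template.Must(template.New("ping").Parse(pingTmpl))

    // Data as a configuration source could deliver it (SQL query, exec, ...).
    data := []map[string]interface{}{
        {"host": "xyz", "interface": "eth0"},
        {"host": "abc", "interface": "eth1"},
    }

    // Execute the template once per element, yielding one in-memory
    // configuration snippet per plugin instance.
    for _, row := range data {
        if err := tmpl.Execute(os.Stdout, row); err != nil {
            panic(err)
        }
    }
}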

This way we might also be able to connect to asset management software to get device information...

Does that make sense?

redbaron commented 1 year ago

Glad that I am not the only one wanting this :)

I proposed a pull model, where plugins are defined explicitly and "pull" values for themselves from a discovery plugin.

Your idea is a push model, where a configuration source creates, or "pushes", full plugin instances into the list of plugins.

Both achieve the same goal, but I have a slight preference for the pull model for the following reasons (quoted in the reply below):

srebhan commented 1 year ago

@redbaron always glad to discuss progress. ;-) Let me respond to your comment...

> The pull model better represents the final structure of the Telegraf configuration: all plugins which can potentially be created are listed explicitly in the config, as they are now. The push model is more opaque: the template file defines the plugins, and that template file is not part of the Telegraf configuration; for instance, it is not watched by --watch-config.

While I agree that your way is more concrete, there are also severe drawbacks. Let me outline some of them:

  1. With your configuration we need to touch every plugin to support templating!
  2. Implementing different configuration sources is especially hard, as we will end up with a list of settings per configuration source. Checking for complete, concise and conflict-free options will be a nightmare in each plugin.
  3. Imagine the situation where you have 100 identical hosts running the same 10 services, only differing in the IP address. In my configuration you create one template representing one host with all its services and one configurationsources instance to fill in the IPs... In your scheme you cannot easily see this "all hosts are the same" property.

Regarding template watching, we can certainly also incorporate watching the templates. That is a good point I was not aware of.

> In the pull model the discovery plugin has a narrower scope: all it has to do is return a simple data structure, and Telegraf itself is responsible for managing the plugin instances wired to it. With the push model the configurationsources plugin is responsible for generating the full configuration, not just fetching values from external sources for it. Sure, there can be utility functions to help with code reuse, but conceptually push plugins have to do more.

I disagree that the push model has to do more. Please remember that, as of now, each configured plugin instance is managed by the agent. That is, the agent holds a configuration and the corresponding list of plugins (for each category). The agent then uses this list to trigger gathering, organize the data flow, etc. This being said, my model has to:

  1. generate the plugin instance configurations from the template and data delivered by the data-source
  2. notify the agent about config changes

The agent will then compare the new configuration with the currently running one and stop plugins that are no longer present, start new ones, or restart modified plugin instances. I agree that for item 1 the configurationsources type of plugin has more work to do, but my idea is that each of those sources generates a "configuration" data structure like []map[string]interface{}, where each element in the slice corresponds to a plugin instance and the map represents its settings. Telegraf can then provide a framework/function that uses this structure and the template to generate the configuration. All of this can be done without touching any plugin, it is a nice separation of concerns, and it boils down the task of the configurationsources plugins to filling the []map[string]interface{} structure.
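
A minimal sketch of that reconciliation step, under the assumption that each generated plugin instance can be reduced to an identifier plus a fingerprint of its rendered settings (types and names are invented for illustration):

package main

import "fmt"

// instanceConfig is a simplified view of one generated plugin instance.
type instanceConfig struct {
    ID       string // e.g. "inputs.ping/xyz"
    Settings string // rendered configuration snippet (or a hash of it)
}

// reconcile compares the currently running instances with the newly generated
// ones and reports what the agent would stop, start, or restart.
func reconcile(running, generated map[string]instanceConfig) (stop, start, restart []string) {
    for id := range running {
        if _, ok := generated[id]; !ok {
            stop = append(stop, id) // no longer present in the new configuration
        }
    }
    for id, next := range generated {
        current, ok := running[id]
        switch {
        case !ok:
            start = append(start, id) // newly appeared instance
        case current.Settings != next.Settings:
            restart = append(restart, id) // same instance, modified settings
        }
    }
    return stop, start, restart
}

func main() {
    running := map[string]instanceConfig{
        "inputs.ping/xyz": {ID: "inputs.ping/xyz", Settings: "interface=eth0"},
        "inputs.ping/old": {ID: "inputs.ping/old", Settings: "interface=eth0"},
    }
    generated := map[string]instanceConfig{
        "inputs.ping/xyz": {ID: "inputs.ping/xyz", Settings: "interface=eth1"},
        "inputs.ping/new": {ID: "inputs.ping/new", Settings: "interface=eth0"},
    }
    stop, start, restart := reconcile(running, generated)
    fmt.Println("stop:", stop, "start:", start, "restart:", restart)
}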

In your model the whole process is much more complicated. The plugins now need to be touched (see my comment above). Each of the plugins now needs to inform the agent about new, removed or modified instances. Currently, a plugin knows nothing about the agent, nor about its own identity, nor about its own setup state (initialized? connected? etc.). We would need to develop ways for a plugin to inform the agent, and this is a massive change in Telegraf's architecture!

> The pull model configuration syntax remains TOML and looks and feels similar to existing configurations; users won't need to learn anything new.

Sorry man, but this is NOT an argument. :-) Do you really think this has a steep learning curve?

[[inputs.ping]]
  urls = {{ URLS }}

Even if we need some more sophisticated functionality like urls = {{ URLS | toList }} or similar, the effort required is minimal... As an advantage you get the ability to do more complex configurations with complex structures for free, for example for Modbus devices... In your model you would need to code all possible complexities into the plugin...
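
For reference, a helper like the hypothetical toList could be wired in via text/template's FuncMap; this sketch only shows the mechanism (the helper itself is invented, and the leading dot follows text/template field syntax):

package main

import (
    "fmt"
    "os"
    "strings"
    "text/template"
)

func main() {
    // toList renders a slice of strings as a TOML array, e.g. ["a", "b"].
    funcs := template.FuncMap{
        "toList": func(values []string) string {
            quoted := make([]string, 0, len(values))
            for _, v := range values {
                quoted = append(quoted, fmt.Sprintf("%q", v))
            }
            return "[" + strings.Join(quoted, ", ") + "]"
        },
    }

    const tmpl = `[[inputs.ping]]
  urls = {{ .URLS | toList }}
`
    t := template.Must(template.New("ping").Funcs(funcs).Parse(tmpl))

    data := map[string]interface{}{"URLS": []string{"xyz", "abc"}}
    if err := t.Execute(os.Stdout, data); err != nil {
        panic(err)
    }
    // Output:
    // [[inputs.ping]]
    //   urls = ["xyz", "abc"]
}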

> I still have scars from Kubernetes Helm, where structured documents (YAML) are templated as text with Go templates. The templating engine has no idea what it is producing; it doesn't manipulate elements of a structure like lists or maps, it just shovels bytes blindly, and that is a source of many errors and friction when authoring templates. It won't be as bad with TOML, because whitespace is just decorative, but I still expect generating TOML through templating to be error prone.

That is indeed a valid point, and I cannot deny that templating will always be a source of errors, especially because you cannot immediately see what is generated. However, by providing a way to output and check the generated configuration I think this can be mitigated.

The alternative would be to inject the configuration directly without a template, but this has the drawback that you need to specify all non-default settings for all plugin instances... And it might buy you nothing, because what if the plugin expects a string and you give it a list? Or the plugin wants a number but you provide a string?
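
The type-mismatch concern can be illustrated with a tiny sketch of direct value injection (the setting name and validation helper are invented):

package main

import "fmt"

// setStringOption assigns an externally provided value to a plugin setting
// that expects a string, rejecting anything else.
func setStringOption(name string, value interface{}) (string, error) {
    s, ok := value.(string)
    if !ok {
        // e.g. the source delivered a list or a number where a string is expected
        return "", fmt.Errorf("option %q: expected string, got %T", name, value)
    }
    return s, nil
}

func main() {
    // The plugin wants a single interface name, but the source delivers a list.
    if _, err := setStringOption("interface", []string{"eth0", "eth1"}); err != nil {
        fmt.Println(err) // option "interface": expected string, got []string
    }
}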

Happy to get your thoughts on my points!

redbaron commented 1 year ago

OK, push model it is then. As I said, it was just a slight preference.