influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

TOML parsed wrong if options in example configuration are enabled / Option ordering in example configuration is wrong #13781

Closed UweSauter closed 1 year ago

UweSauter commented 1 year ago

Relevant telegraf.conf

[global_tags]
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "300s"
  flush_jitter = "0s"
  precision = "0s"
  log_with_timezone = "Europe/Berlin"
  hostname = ""
  omit_hostname = false
[[outputs.kafka]]
  brokers = ["admin:9092"]
  topic = "telegraf"
  [outputs.kafka.topic_suffix]
    method = "measurement"
    separator = "_"
  data_format = "json"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
  core_tags = false
[[inputs.mem]]

Logs from Telegraf

Logs are not relevant, as the parsed configuration is not shown.

System info

Telegraf v1.27.3, Rocky Linux 8.8

Docker

No response

Steps to reproduce

  1. Use above configuration.
  2. Watch the data format that is sent to Kafka, e.g. by using tcpdump -i $INTERFACE -Annvv tcp dst port 9092
  3. See that the InfluxDB line protocol is used.

Expected behavior

Instead of the default data format (InfluxDB line protocol) the configured data format should be used.

Actual behavior

The default data format is used. The configured data format is ignored.

Additional info

As far as I can tell, the issue is simply a question of how the options are ordered in the configuration file.

The configuration file uses the TOML format. That means the file is segmented into sections, each starting with a [section name] header. (This statement is a bit simplified but sufficient to make my point.) Every key-value pair following a header belongs to that section; leading whitespace is ignored.

Take a look at the outputs.kafka section of the configuration file created by telegraf config. The only changes made here are that the section has been enabled, comments have been stripped, and data_format has been set.

[[outputs.kafka]]
  brokers = ["localhost:9092"]
  topic = "telegraf"
  # topic_tag = ""
  # exclude_topic_tag = false
  # client_id = "Telegraf"
  # version = ""

  # [outputs.kafka.topic_suffix]
  #   method = "measurement"
  #   separator = "_"

  # [outputs.kafka.topic_suffix]
  #   method = "tags"
  #   keys = ["foo"]
  #   separator = "__"

  # [outputs.kafka.topic_suffix]
  #   method = "tags"
  #   keys = ["foo", "bar"]
  #   separator = "_"

  routing_tag = "host"
  # routing_key = ""
  # compression_codec = 0
  # idempotent_writes = false
  # required_acks = -1
  # max_retry = 3
  # max_message_bytes = 1000000
  # enable_tls = false
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  # insecure_skip_verify = false
  # keep_alive_period = "15s"
  # socks5_enabled = true
  # socks5_address = "127.0.0.1:1080"
  # socks5_username = "alice"
  # socks5_password = "pass123"
  # sasl_username = "kafka"
  # sasl_password = "secret"
  # sasl_mechanism = ""
  # sasl_gssapi_service_name = ""
  # sasl_gssapi_auth_type = "KRB5_USER_AUTH"
  # sasl_gssapi_kerberos_config_path = "/"
  # sasl_gssapi_realm = "realm"
  # sasl_gssapi_key_tab_path = ""
  # sasl_gssapi_disable_pafxfast = false
  # sasl_access_token = ""
  # sasl_version = 1
  # metadata_full = false
  data_format = "json"

If this section is used as-is, then data_format belongs to [[outputs.kafka]]. But if one of the [outputs.kafka.topic_suffix] sub-sections is enabled, then data_format (in fact, everything below the sub-section header) belongs to that sub-section.

This causes problems because settings configured in the file are not honored.

The same applies to almost every other plugin section in the configuration file generated by Telegraf.

To resolve the issue, Telegraf / the plugins would need to re-order the sub-sections of each plugin configuration so that settings belonging to the plugin section appear above the sub-sections.

(My current workaround is to put data_format right after brokers, but I think an example configuration should not require this.)

powersj commented 1 year ago

As far as I can tell, the issue is simply a question of how the options are ordered in the configuration file.

This is correct. You put a kafka configuration option under the topic_suffix section.

(My current workaround is to put data_format right after brokers, but I think an example configuration should not require this.)

Not a workaround, this is the correct thing to do. There are times when using the inline table syntax can also help avoid these types of situations.

The same applies to almost every other plugin section in the configuration file generated by Telegraf.

Can you please point out which other plugins? I think saying almost every other plugin is a little extreme. I went through all 60 outputs and only 4 of them had tables that were not already at the bottom of the plugin documentation:

The others that had tables were already at the bottom of a config:

UweSauter commented 1 year ago

I admit that I didn't check every plugin but got that impression from grepping the file.

I have now taken the time to go through the configuration file created by telegraf config manually. The following lines might need attention, although I'm not proficient enough to really assess this.

[[outputs.http]] line 1216 ff.

#   ## Additional HTTP headers
#   # [outputs.http.headers]
#   #   # Should be set manually to "application/json" for json data_format
#   #   Content-Type = "text/plain; charset=utf-8"
#
#   ## MaxIdleConns controls the maximum number of idle (keep-alive)
#   ## connections across all hosts. Zero means no limit.
#   # max_idle_conn = 0

[[outputs.kafka]] line 1440 ff.

#   ## Suffix equals to "_" + measurement name
#   # [outputs.kafka.topic_suffix]
#   #   method = "measurement"
#   #   separator = "_"
#
#   ## Suffix equals to "__" + measurement's "foo" tag value.
#   ##   If there's no such a tag, suffix equals to an empty string
#   # [outputs.kafka.topic_suffix]
#   #   method = "tags"
#   #   keys = ["foo"]
#   #   separator = "__"
#
#   ## Suffix equals to "_" + measurement's "foo" and "bar"
#   ##   tag values, separated by "_". If there is no such tags,
#   ##   their values treated as empty strings.
#   # [outputs.kafka.topic_suffix]
#   #   method = "tags"
#   #   keys = ["foo", "bar"]
#   #   separator = "_"

[[outputs.kinesis]] line 1600 ff.

#   ## The partition key can be calculated using one of several methods:
#   ##
#   ## Use a static value for all writes:
#   #  [outputs.kinesis.partition]
#   #    method = "static"
#   #    key = "howdy"
#   #
#   ## Use a random partition key on each write:
#   #  [outputs.kinesis.partition]
#   #    method = "random"
#   #
#   ## Use the measurement name as the partition key:
#   #  [outputs.kinesis.partition]
#   #    method = "measurement"
#   #
#   ## Use the value of a tag for all writes, if the tag is not set the empty
#   ## default option will be used. When no default, defaults to "telegraf"
#   #  [outputs.kinesis.partition]
#   #    method = "tag"
#   #    key = "host"
#   #    default = "mykey"

[[outputs.sql]] line 2373 ff.

#   ## Metric type to SQL type conversion
#   ## The values on the left are the data types Telegraf has and the values on
#   ## the right are the data types Telegraf will use when sending to a database.
#   ##
#   ## The database values used must be data types the destination database
#   ## understands. It is up to the user to ensure that the selected data type is
#   ## available in the database they are using. Refer to your database
#   ## documentation for what data types are available and supported.
#   #[outputs.sql.convert]
#   #  integer              = "INT"
#   #  real                 = "DOUBLE"
#   #  text                 = "TEXT"
#   #  timestamp            = "TIMESTAMP"
#   #  defaultvalue         = "TEXT"
#   #  unsigned             = "UNSIGNED"
#   #  bool                 = "BOOL"

The indentation level of configuration options below [[inputs.aliyuncms.metrics]] line 8918 suggests that the options don't belong to that sub-section. (Nit-picking, I know. This might also be the case elsewhere but I noticed first below line 8918.)

One other thing I noticed: the [[inputs.win_perf_counters]] section starting at line 8558 has not just one but two columns of # to deactivate. Is this intentional?

Thanks.

powersj commented 1 year ago

Thanks for finding those. I'll put up a PR with them moved down or switched to in-line tables.

The indentation level of configuration options below [[inputs.aliyuncms.metrics]] line 8918 suggests that the options don't belong to that sub-section

I'll look at this as well.

One other thing I noticed: the [[inputs.win_perf_counters]] section starting at line 8558 has not only one but two columns of # to deactivate. Is this wanted?

I have seen this before and wanted to change it but haven't gotten around to it. I can take a look.

edit: I should note that there are probably some places in processors and inputs where that note could also get added. I'm happy to add it to those as well, but I would like to get this reviewed for all the outputs first and landed and then we can go through the additional plugins.

powersj commented 1 year ago

@UweSauter

I have posted #13792 which first and foremost adds a note to the plugins you called out about tables:

 ## NOTE: Due to the way TOML is parsed, tables must be at the END of the
 ## plugin definition, otherwise additional config options are read as part of
 ## the table

I've also moved the tables for these plugins to the end of the sample configurations.

I took a look at inputs.aliyuncms and I agree that the indentation was not correct; fixed up.

I also re-formatted the entire inputs.win_perf_counters section so it reads better.

UweSauter commented 1 year ago

That note is a very good hint and would have saved me some time debugging this.

Great work, thanks.