Closed mhoyer closed 1 year ago
Additional info: I just tried setting the two config parameters again inside the outputs.influxdb_v2 settings, and there they do apply as expected. Still, I wonder why it does not work when they are configured in the [agent] section only.
Hi,
Config files are loaded one at a time, which means that top-level [agent] settings that affect plugins, like inputs or outputs, need to be specified first.
Take the following example config files:
config.toml
[[outputs.file]]
alias = "first"
[[inputs.exec]]
commands = ["echo metric,tag=1 field=42"]
data_format = "influx"
zconfig.toml
[agent]
metric_buffer_limit = 100
debug = true
[[inputs.exec]]
commands = ["echo other,tag=1 field=42"]
data_format = "influx"
[[outputs.file]]
alias = "second"
2023-04-21T13:43:17Z I! Starting Telegraf 1.25.3
2023-04-21T13:43:17Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-04-21T13:43:17Z I! Loaded inputs: exec (2x)
2023-04-21T13:43:17Z I! Loaded aggregators:
2023-04-21T13:43:17Z I! Loaded processors:
2023-04-21T13:43:17Z I! Loaded secretstores:
2023-04-21T13:43:17Z I! Loaded outputs: file (2x)
2023-04-21T13:43:17Z I! Tags enabled: host=ryzen
2023-04-21T13:43:17Z D! [agent] Initializing plugins
2023-04-21T13:43:17Z D! [agent] Connecting outputs
2023-04-21T13:43:17Z D! [agent] Attempting connection to [outputs.file::first]
2023-04-21T13:43:17Z D! [agent] Successfully connected to outputs.file::first
2023-04-21T13:43:17Z D! [agent] Attempting connection to [outputs.file::second]
2023-04-21T13:43:17Z D! [agent] Successfully connected to outputs.file::second
2023-04-21T13:43:17Z D! [agent] Starting service inputs
2023-04-21T13:43:17Z D! [agent] Stopping service inputs
2023-04-21T13:43:17Z D! [agent] Input channel closed
2023-04-21T13:43:17Z I! [agent] Hang on, flushing any cached metrics before shutdown
metric,host=ryzen,tag=1 field=42 1682084598000000000
other,host=ryzen,tag=1 field=42 1682084598000000000
metric,host=ryzen,tag=1 field=42 1682084598000000000
other,host=ryzen,tag=1 field=42 1682084598000000000
2023-04-21T13:43:17Z D! [outputs.file::second] Wrote batch of 2 metrics in 15.2µs
2023-04-21T13:43:17Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
2023-04-21T13:43:17Z D! [outputs.file::first] Wrote batch of 2 metrics in 13.1µs
2023-04-21T13:43:17Z D! [outputs.file::first] Buffer fullness: 0 / 10000 metrics
2023-04-21T13:43:17Z I! [agent] Stopping running outputs
2023-04-21T13:43:17Z D! [agent] Stopped Successfully
Note how the second file output gets the agent's metric_buffer_limit of 100 as specified by the config (see its "Buffer fullness: 0 / 100 metrics" line), but the first does not and keeps the default of 10000. That is because when the first file is loaded, the agent settings are not yet known.
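The effect described here can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Telegraf's actual loader: the function `load_configs` and the tuple layout are hypothetical, and it assumes that each output copies whatever agent buffer limit is known at the moment its own file is parsed.

```python
# Toy model, NOT Telegraf's real code: each output snapshots the agent's
# buffer limit that is known when the file containing it is parsed.
DEFAULT_BUFFER_LIMIT = 10000  # default value visible in the logs


def load_configs(files):
    """files: ordered list of (name, agent_limit_or_None, [output aliases])."""
    agent_limit = DEFAULT_BUFFER_LIMIT
    outputs = {}
    for _name, limit, aliases in files:
        if limit is not None:
            agent_limit = limit  # this file has an [agent] section
        for alias in aliases:
            outputs[alias] = agent_limit  # snapshot taken at load time
    return outputs


config = ("config.toml", None, ["first"])
zconfig = ("zconfig.toml", 100, ["second"])

# [agent] arrives too late for "first":
print(load_configs([config, zconfig]))  # {'first': 10000, 'second': 100}
# [agent] is read before any output is created:
print(load_configs([zconfig, config]))  # {'second': 100, 'first': 100}
```

In this model, swapping the order of the two files is enough to make both outputs pick up the limit of 100.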
Also note that the above is with the version you said worked. I do see the same behavior all the way back to v1.19.0:
../telegraf-builds/telegraf-v1.19.0 --config config.toml --config zconfig.toml --once
2023-04-21T13:51:40Z I! Starting Telegraf 1.19.0
2023-04-21T13:51:40Z D! [agent] Initializing plugins
2023-04-21T13:51:40Z D! [agent] Connecting outputs
2023-04-21T13:51:40Z D! [agent] Attempting connection to [outputs.file::first]
2023-04-21T13:51:40Z D! [agent] Successfully connected to outputs.file::first
2023-04-21T13:51:40Z D! [agent] Attempting connection to [outputs.file::second]
2023-04-21T13:51:40Z D! [agent] Successfully connected to outputs.file::second
2023-04-21T13:51:40Z D! [agent] Starting service inputs
2023-04-21T13:51:40Z D! [agent] Stopping service inputs
2023-04-21T13:51:40Z D! [agent] Input channel closed
2023-04-21T13:51:40Z I! [agent] Hang on, flushing any cached metrics before shutdown
other,host=ryzen,tag=1 field=42 1682085101000000000
metric,host=ryzen,tag=1 field=42 1682085101000000000
2023-04-21T13:51:40Z D! [outputs.file::first] Wrote batch of 2 metrics in 17.94µs
2023-04-21T13:51:40Z D! [outputs.file::first] Buffer fullness: 0 / 10000 metrics
other,host=ryzen,tag=1 field=42 1682085101000000000
metric,host=ryzen,tag=1 field=42 1682085101000000000
2023-04-21T13:51:40Z D! [outputs.file::second] Wrote batch of 2 metrics in 4.47µs
2023-04-21T13:51:40Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
2023-04-21T13:51:40Z I! [agent] Stopping running outputs
2023-04-21T13:51:40Z D! [agent] Stopped Successfully
I'm inclined to say this is working as expected for now, but I do think we can and should better document this limitation.
Thanks @powersj, I guess the order in which the config blocks are read was the right hint. The missing piece (which led to the behavior I saw) was probably the way we start Telegraf in our k8s pod:
telegraf --config-directory /etc/telegraf
And this is now quite reproducible for me with a small setup using docker-compose:
version: "3"
services:
  telegraf:
    image: telegraf:1.26.1-alpine
    # image: telegraf:1.25.3-alpine
    container_name: telegraf
    volumes:
      - ./conf:/etc/telegraf
    entrypoint: telegraf --config-directory /etc/telegraf
Besides the docker-compose.yaml from above, just place these two files into a ./conf folder:
inputs.conf (same file as your config.toml):
[[outputs.file]]
alias = "first"
[[inputs.exec]]
commands = ["echo metric,tag=1 field=42"]
data_format = "influx"
telegraf.conf (same file as your zconfig.toml):
[agent]
metric_buffer_limit = 100
debug = true
[[inputs.exec]]
commands = ["echo other,tag=1 field=42"]
data_format = "influx"
[[outputs.file]]
alias = "second"
Now if I run that docker-compose.yaml with the different Telegraf versions, I can reproduce it:
with 1.25.3:
telegraf | 2023-04-21T20:24:42Z I! Using config file: /etc/telegraf/telegraf.conf
telegraf | 2023-04-21T20:24:42Z I! Starting Telegraf 1.25.3
telegraf | 2023-04-21T20:24:42Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
telegraf | 2023-04-21T20:24:42Z I! Loaded inputs: exec (3x)
telegraf | 2023-04-21T20:24:42Z I! Loaded aggregators:
telegraf | 2023-04-21T20:24:42Z I! Loaded processors:
telegraf | 2023-04-21T20:24:42Z I! Loaded secretstores:
telegraf | 2023-04-21T20:24:42Z I! Loaded outputs: file (3x)
telegraf | 2023-04-21T20:24:42Z I! Tags enabled: host=5eb112530210
telegraf | 2023-04-21T20:24:42Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"5eb112530210", Flush Interval:10s
telegraf | 2023-04-21T20:24:42Z D! [agent] Initializing plugins
telegraf | 2023-04-21T20:24:42Z D! [agent] Connecting outputs
telegraf | 2023-04-21T20:24:42Z D! [agent] Attempting connection to [outputs.file::second]
telegraf | 2023-04-21T20:24:42Z D! [agent] Successfully connected to outputs.file::second
telegraf | 2023-04-21T20:24:42Z D! [agent] Attempting connection to [outputs.file::first]
telegraf | 2023-04-21T20:24:42Z D! [agent] Successfully connected to outputs.file::first
telegraf | 2023-04-21T20:24:42Z D! [agent] Attempting connection to [outputs.file::second]
telegraf | 2023-04-21T20:24:42Z D! [agent] Successfully connected to outputs.file::second
telegraf | 2023-04-21T20:24:42Z D! [agent] Starting service inputs
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::second] Wrote batch of 3 metrics in 76.2µs
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::second] Wrote batch of 3 metrics in 13.6µs
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::first] Wrote batch of 3 metrics in 13.4µs
telegraf | 2023-04-21T20:24:52Z D! [outputs.file::first] Buffer fullness: 0 / 100 metrics
with 1.26.1:
telegraf | 2023-04-21T20:25:45Z I! Loading config file: /etc/telegraf/inputs.conf
telegraf | 2023-04-21T20:25:45Z I! Loading config file: /etc/telegraf/telegraf.conf
telegraf | 2023-04-21T20:25:45Z I! Starting Telegraf 1.26.1
telegraf | 2023-04-21T20:25:45Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
telegraf | 2023-04-21T20:25:45Z I! Loaded inputs: exec (2x)
telegraf | 2023-04-21T20:25:45Z I! Loaded aggregators:
telegraf | 2023-04-21T20:25:45Z I! Loaded processors:
telegraf | 2023-04-21T20:25:45Z I! Loaded secretstores:
telegraf | 2023-04-21T20:25:45Z I! Loaded outputs: file (2x)
telegraf | 2023-04-21T20:25:45Z I! Tags enabled: host=dd6d242d10d3
telegraf | 2023-04-21T20:25:45Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"dd6d242d10d3", Flush Interval:10s
telegraf | 2023-04-21T20:25:45Z D! [agent] Initializing plugins
telegraf | 2023-04-21T20:25:45Z D! [agent] Connecting outputs
telegraf | 2023-04-21T20:25:45Z D! [agent] Attempting connection to [outputs.file::first]
telegraf | 2023-04-21T20:25:45Z D! [agent] Successfully connected to outputs.file::first
telegraf | 2023-04-21T20:25:45Z D! [agent] Attempting connection to [outputs.file::second]
telegraf | 2023-04-21T20:25:45Z D! [agent] Successfully connected to outputs.file::second
telegraf | 2023-04-21T20:25:45Z D! [agent] Starting service inputs
telegraf | other,host=dd6d242d10d3,tag=1 field=42 1682108750000000000
telegraf | metric,host=dd6d242d10d3,tag=1 field=42 1682108750000000000
telegraf | other,host=dd6d242d10d3,tag=1 field=42 1682108750000000000
telegraf | metric,host=dd6d242d10d3,tag=1 field=42 1682108750000000000
telegraf | 2023-04-21T20:25:55Z D! [outputs.file::second] Wrote batch of 2 metrics in 70.9µs
telegraf | 2023-04-21T20:25:55Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
telegraf | 2023-04-21T20:25:55Z D! [outputs.file::first] Wrote batch of 2 metrics in 16.5µs
telegraf | 2023-04-21T20:25:55Z D! [outputs.file::first] Buffer fullness: 0 / 10000 metrics
So to me it seems to be related to the use of the --config-directory option when starting Telegraf, together with a change in the order in which the config files are loaded.
Does this make sense to you?
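To compare runs like the two above without reading every line, the per-output limits can be pulled out of the debug log. The following is a minimal sketch, assuming the "Buffer fullness: X / Y metrics" line format shown in these logs; the helper name `buffer_limits` is hypothetical.

```python
import re

# Matches debug lines such as:
#   D! [outputs.file::first] Buffer fullness: 0 / 10000 metrics
FULLNESS = re.compile(
    r"\[(outputs\.[^\]]+)\] Buffer fullness: (\d+) / (\d+) metrics"
)


def buffer_limits(log_text):
    """Return {output name: buffer limit} as reported in the debug log."""
    limits = {}
    for match in FULLNESS.finditer(log_text):
        limits[match.group(1)] = int(match.group(3))
    return limits


sample = """\
2023-04-21T20:25:55Z D! [outputs.file::second] Buffer fullness: 0 / 100 metrics
2023-04-21T20:25:55Z D! [outputs.file::first] Buffer fullness: 0 / 10000 metrics
"""

print(buffer_limits(sample))
# {'outputs.file::second': 100, 'outputs.file::first': 10000}
```

Any output that still reports 10000 did not pick up the [agent] setting.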
telegraf --config-directory /etc/telegraf
I did not think this would work in v1.25.3. It was not until v1.26.0 that we allowed passing only a config directory.
telegraf | 2023-04-21T20:24:42Z I! Using config file: /etc/telegraf/telegraf.conf
telegraf | 2023-04-21T20:24:42Z I! Starting Telegraf 1.25.3
telegraf | 2023-04-21T20:25:45Z I! Loaded inputs: exec (2x)
telegraf | 2023-04-21T20:25:45Z I! Loaded outputs: file (2x)
You only loaded one file in the 1.25.3 output which has 3x exec inputs and 2x file outputs. That does not align with your example configs.
It sounds more likely that you went from a single file to a config directory, and now you are seeing that the config directory loads files in their listed order. In that case your [agent] settings are in a later file.
My suggestion is to put your [agent] settings in a file of their own and pass that first. Either call it agent.conf in your config dir or pass it as --config agent.toml to ensure it is read first.
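Why the file name matters can be seen with a quick check. This is a sketch that assumes the files in the config directory are read in plain lexical order of their names (the "listed order" mentioned above); the helper `load_order` is hypothetical and does not call Telegraf.

```python
def load_order(filenames):
    # Assumption: the config directory is read in lexical file-name order.
    return sorted(filenames)


# With the two files from the reproduction, telegraf.conf (which holds
# [agent]) sorts after inputs.conf:
print(load_order(["telegraf.conf", "inputs.conf"]))
# ['inputs.conf', 'telegraf.conf']

# A separate agent.conf sorts ahead of both:
print(load_order(["telegraf.conf", "inputs.conf", "agent.conf"]))
# ['agent.conf', 'inputs.conf', 'telegraf.conf']
```

Under that assumption, moving the [agent] section into agent.conf means it is read before any file that defines inputs or outputs.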
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.26.1, Alpine Linux v3.17, in Kubernetes cluster (TKGI
Docker
We host it inside a Kubernetes cluster v1.23.16 as a custom Helm chart. If really needed I can provide a Dockerfile (but I would have to create a sample).
Steps to reproduce
metric_buffer_limit and metric_batch_size in telegraf.conf
debug = true
Expected behavior
Apply the metric_buffer related settings I configured in /etc/telegraf/telegraf.conf
Actual behavior
At least for those two configuration values the Telegraf instance is falling back to the default values:
metric_buffer_limit
metric_batch_size
Additional info
As you can see in the provided telegraf.conf above, we have custom settings for metric_batch_size = 200 and metric_buffer_limit = 500, but in the logs we see the default values 1000 for batch size and 10000 for the buffer limit respectively.
The problem (seeing the "### metrics have been dropped" warnings in the logs) suddenly started after updating Telegraf pods from 1.25.3 to 1.26.1 w/o any changes to the configuration files.
I also opened a terminal inside the Telegraf Pod and double checked the following: