influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.6k stars 5.57k forks source link

Telegraf is running SetState() before Init(). #16037

Open ragheed-ism opened 1 week ago

ragheed-ism commented 1 week ago

Relevant telegraf.conf

[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000000
metric_buffer_limit = 1000000
collection_jitter = "0s"
flush_interval = "1s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = true
skip_processors_after_aggregators = true
debug = true
statefile = "/etc/telegraf/statefile1"

[[outputs.file]]
files = ["stdout"]

[[inputs.tail]]
files = ["/path/to/line/protocal/sample/text"]
data_format = "influx"

Logs from Telegraf

2024-10-17T08:05:58Z I! Loading config: telegraf2.conf
2024-10-17T08:05:58Z I! Starting Telegraf 1.32.1 brought to you by InfluxData the makers of InfluxDB
2024-10-17T08:05:58Z I! Available plugins: 235 inputs, 9 aggregators, 32 processors, 26 parsers, 62 outputs, 6 secret-stores
2024-10-17T08:05:58Z I! Loaded inputs: tail
2024-10-17T08:05:58Z I! Loaded aggregators:
2024-10-17T08:05:58Z I! Loaded processors:
2024-10-17T08:05:58Z I! Loaded secretstores:
2024-10-17T08:05:58Z I! Loaded outputs: file
2024-10-17T08:05:58Z I! Tags enabled:
2024-10-17T08:05:58Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2024-10-17T08:05:58Z D! [agent] Initializing plugin states
2024-10-17T08:05:58Z D! [agent] Initializing plugins
2024-10-17T08:05:58Z D! [agent] Connecting outputs
2024-10-17T08:05:58Z D! [agent] Attempting connection to [outputs.file]
2024-10-17T08:05:58Z D! [agent] Successfully connected to outputs.file
2024-10-17T08:05:58Z D! [agent] Starting service inputs
2024-10-17T08:05:58Z D! [inputs.tail] Tail added for "/home/ragheed-ismail/tmp/input_line_protocol_2.txt"
2024-10-17T08:05:59Z D! [outputs.file] Buffer fullness: 0 / 1000000 metrics
2024-10-17T08:06:00Z D! [outputs.file] Buffer fullness: 0 / 1000000 metrics
^C2024-10-17T08:06:01Z D! [agent] Stopping service inputs
2024-10-17T08:06:01Z D! [inputs.tail] Recording offset 689 for "/home/ragheed-ismail/tmp/input_line_protocol_2.txt"
2024-10-17T08:06:01Z D! [inputs.tail] Tail removed for "/home/ragheed-ismail/tmp/input_line_protocol_2.txt"
2024-10-17T08:06:01Z D! [agent] Input channel closed
2024-10-17T08:06:01Z I! [agent] Hang on, flushing any cached metrics before shutdown
2024-10-17T08:06:01Z D! [outputs.file] Buffer fullness: 0 / 1000000 metrics
2024-10-17T08:06:01Z I! [agent] Stopping running outputs
2024-10-17T08:06:01Z D! [agent] Persisting plugin states
2024-10-17T08:06:01Z D! [agent] Stopped Successfully

System info

Telergaf >=1.31.0, Ubuntu 20.04

Docker

No response

Steps to reproduce

  1. Specify the path to a file containg any line protocal data in it.
  2. Run telegraf for a few seconds, and then stop it. This will create a statefile with the offset used by the tail plugin.
  3. Rerun telegraf again, this will cause the tail plugin to start with the last offset saved in statefile.

Expected behavior

Init() should be called before SetState() when telegraf starts.

Actual behavior

Init() is being called after SetState() when telegraf starts.

Additional info

I am creating a custom aggregator that needs state persistence. When testing it, i found out that the SetState() is being called before Init(), which is the opposite of what the docs have stated. The easiest way to recreate the issue without adding my plugin is using the inputs.tail plugin. I ran the test for different version of telegraf and found out that the issue started with telegraf version 1.31.0 and is still present in the latest version (1.32.1 at the time of writing this issue).

In order to make sure that this is not a visual bug, i add the following debugs at the start of Init() and SetState() for the tail plugin

t.Log.Debugf("This is Init(): '%v'",time.Now()) t.Log.Debugf("This is SetState: '%v'",time.Now())

and got the following results proving the issue

2024-10-17T07:14:29Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s 2024-10-17T07:14:29Z D! [agent] Initializing plugin states 2024-10-17T07:14:29Z D! [inputs.tail] This is SetState: '2024-10-17 10:14:29.417768546 +0300 EEST m=+0.118141866' 2024-10-17T07:14:29Z D! [agent] Initializing plugins 2024-10-17T07:14:29Z D! [inputs.tail] This is Init: '2024-10-17 10:14:29.417828948 +0300 EEST m=+0.118202268' 2024-10-17T07:14:29Z D! [agent] Connecting outputs 2024-10-17T07:14:29Z D! [agent] Attempting connection to [outputs.file] 2024-10-17T07:14:29Z D! [agent] Successfully connected to outputs.file

ragheed-ism commented 6 days ago

When checking the agent code (telegraf/agent/agent.go) between versions 1.30 and 1.31, i found out the in the func (a *Agent) Run(ctx context.Context) error part of the code, the intialization of plugins and states have been flipped. Is this intentional? If yes, then shouldn't the docs be updated?