influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Telegraf service with S7Comm plugin stops 10s after starting if the PLC is not available #15609

Closed GitTurboy closed 3 weeks ago

GitTurboy commented 1 month ago

Relevant telegraf.conf

# Plugin for retrieving data from Siemens PLCs via the S7 protocol (RFC1006)
[[inputs.s7comm]]
  ## Parameters to contact the PLC (mandatory)
  ## The server is in the <host>[:port] format where the port defaults to 102
  ## if not explicitly specified.
  server = "10.100.35.1:102"
  rack = 0
  slot = 1 

  pdu_size = 10 #462

  ## Timeout for requests
  timeout = "5s"

Logs from Telegraf

2024-07-09T03:14:19Z I! Starting Telegraf 1.30.0 brought to you by InfluxData the makers of InfluxDB
2024-07-09T03:14:19Z I! Available plugins: 233 inputs, 9 aggregators, 31 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-07-09T03:14:19Z I! Loaded inputs: s7comm
2024-07-09T03:14:19Z I! Loaded aggregators: 
2024-07-09T03:14:19Z I! Loaded processors: 
2024-07-09T03:14:19Z I! Loaded secretstores: 
2024-07-09T03:14:19Z I! Loaded outputs: influxdb_v2
2024-07-09T03:14:19Z I! Tags enabled: 
2024-07-09T03:14:19Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"", Flush Interval:2s
2024-07-09T03:14:19Z D! [agent] Initializing plugins
2024-07-09T03:14:19Z D! [agent] Connecting outputs
2024-07-09T03:14:19Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-07-09T03:14:19Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-07-09T03:14:19Z D! [agent] Starting service inputs
2024-07-09T03:14:19Z D! [inputs.s7comm] Connecting to "10.100.35.1:102"...

System info

Telegraf 1.31.1

Docker

No response

Steps to reproduce

  1. Configure a PLC that cannot be reached from the server hosting Telegraf (OS: Windows 10)
  2. Start the Telegraf service
  3. View the service status and the log ...

Expected behavior

The service should keep running and periodically try to reconnect to the PLC.

Actual behavior

The service stopped.

Additional info

no

GitTurboy commented 1 month ago

The host is Windows 10.

powersj commented 1 month ago

Hi,

Please enable debug_connection = true in your s7comm config and provide the complete logs. I would like to see the full set of attempts made and the final shutdown, not just the first debug connection message.
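For reference, a minimal sketch of where that option would go, reusing the connection settings from the report above. The position of the line within the block is an assumption, and metric definitions are omitted here just as they are in the report; only the `debug_connection` line is new.

```toml
[[inputs.s7comm]]
  server = "10.100.35.1:102"
  rack = 0
  slot = 1
  timeout = "5s"

  ## Log detailed connection messages for tracing issues
  debug_connection = true
```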

In general, telegraf will fail to start up if it fails to connect to a device. This is the expected behavior as it makes it very clear that something is wrong to the user. It could be a bad password, connection, etc. We have added some connection error retry logic to some plugins and could possibly add this here, but I would like to see a complete set of logs first.

Thanks

GitTurboy commented 1 month ago

You are welcome! After setting debug_connection = true I got the logs below:

2024-07-10T00:42:09Z I! Starting Telegraf 1.31.1 brought to you by InfluxData the makers of InfluxDB
2024-07-10T00:42:09Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 5 secret-stores
2024-07-10T00:42:09Z I! Loaded inputs: s7comm
2024-07-10T00:42:09Z I! Loaded aggregators: 
2024-07-10T00:42:09Z I! Loaded processors: 
2024-07-10T00:42:09Z I! Loaded secretstores: 
2024-07-10T00:42:09Z I! Loaded outputs: influxdb_v2
2024-07-10T00:42:09Z I! Tags enabled: 
2024-07-10T00:42:09Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"", Flush Interval:2s
2024-07-10T00:42:09Z D! [agent] Initializing plugins
2024-07-10T00:42:09Z D! [agent] Connecting outputs
2024-07-10T00:42:09Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-07-10T00:42:09Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-07-10T00:42:09Z D! [agent] Starting service inputs
2024-07-10T00:42:09Z D! [inputs.s7comm] Connecting to "10.100.35.1:102"...

It looks like the program crashed at line 156 without logging any further information.

GitTurboy commented 1 month ago

connection error retry logic to some plugins

I think connection error retry logic for this plugin would be very useful given our plant network situation. Thanks.

srebhan commented 1 month ago

@GitTurboy just for my understanding, do you see an error message in the log or does it stop with D! [inputs.s7comm] Connecting to ...? In my tests I always see an error that the connection failed...

GitTurboy commented 1 month ago

If run as a service, the logs are the same as what I posted (and the service stops!). If run as a console app, the log contains:

2024-07-10T07:42:44Z W! Outputs are not used in testing mode!
2024-07-10T07:42:44Z I! Tags enabled: 
2024-07-10T07:42:44Z D! [agent] Initializing plugins
2024-07-10T07:42:44Z D! [agent] Starting service inputs
2024-07-10T07:42:44Z D! [inputs.s7comm] Connecting to "10.100.35.1:102"...
2024-07-10T07:42:49Z E! [agent] Starting input inputs.s7comm: connecting to "10.100.35.1:102" failed: dial tcp 10.100.35.1:102: i/o timeout
2024-07-10T07:42:49Z D! [agent] Stopping service inputs
2024-07-10T07:42:49Z D! [agent] Input channel closed
2024-07-10T07:42:49Z D! [agent] Stopped Successfully

srebhan commented 3 weeks ago

@GitTurboy please test the binary in PR #15655, available as soon as CI has finished the tests, and let me know if this fixes the issue! You should set startup_error_behavior = "retry" for the plugin so that it retries connecting in every gather cycle without failing.
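A sketch of how the reporter's configuration might look with that setting, assuming a build that contains the change from PR #15655. The connection values are copied from the original report, metric definitions are again omitted, and the comment wording is illustrative rather than taken from the plugin documentation.

```toml
[[inputs.s7comm]]
  server = "10.100.35.1:102"
  rack = 0
  slot = 1
  timeout = "5s"

  ## Keep Telegraf running when the PLC is unreachable at startup and
  ## try to connect again in every gather cycle (as described above)
  startup_error_behavior = "retry"
```

With this in place, an unreachable PLC at startup should no longer stop the service, which matches the behavior requested in the report.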

GitTurboy commented 3 weeks ago

startup_error_behavior = "retry"

I have tried it and all looks well. After observing for several days, I will close this issue. Thank you!

srebhan commented 3 weeks ago

@GitTurboy please don't close the issue; it will be closed automatically as soon as we merge the corresponding PR! Anyway, please let me know in here how your tests went, even though the issue might be closed already! In case of any problem, feel free to reopen the issue!