Closed h49nakxs closed 8 months ago
Hi,
In general, if we cannot connect to an output, Telegraf will not start. This is the expected behavior, as it prevents scenarios where a user has entered a wrong password or has otherwise misconfigured the output connection. We are happy to see PRs that add per-plugin exceptions, disabled by default, where the plugin keeps trying to reconnect, usually during each write attempt.
Not sure if this should be fixed at the agent level or at the output plugin level.
We would be happy to see a PR at the plugin level. Having the Write() function check whether we are connected and, if not, reconnect or call the connection function again to reopen the SQL connection would be acceptable. This feature would need to be behind a new configuration option and disabled by default.
@h49nakxs can you please test PR #15065, available as soon as CI has finished the tests, with startup_error_behavior = "retry" and let me know if this fixes the issue?
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.26.2, Windows 10 Professional 21H2
Docker
No response
Steps to reproduce
Expected behavior
When Telegraf is starting and the output host is not available yet, Telegraf should retry the network connection at least every X seconds, up to Y times.
Actual behavior
The Telegraf service starts, but no data is sent to the outputs.
Additional info
If the network is cut while the Telegraf service is already running, Telegraf correctly sends data to the output as soon as the network is back.
This issue is very problematic for clients that take some time to get a network connection (e.g. laptops connected only to a Wi-Fi network) because it makes Telegraf unusable on them.
Not sure if this should be fixed at the agent level or at the output plugin level.