rossmcdonald / telegraf

Ansible role for installing, configuring, and maintaining Telegraf

Issues using this role with Telegraf v1.11.0 #49

Status: Open (opened by wdschei 5 years ago)

wdschei commented 5 years ago

My team uses this Ansible role as part of our performance test suite, and our runs started failing as soon as Telegraf v1.11.0 was released.

We have started investigating the root cause. In the meantime, our short-term fix has been to set telegraf_install_url to the previous version.
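Pinning to "the previous version" first means deciding which release that is. The sketch below is a hypothetical helper (not part of this role) that picks the newest version strictly older than a known-bad one from a list of candidate versions. The version list in the demo is illustrative only; check the actual Telegraf release list before pinning telegraf_install_url.

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.10.4' into (1, 10, 4) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def previous_release(available: list[str], bad: str) -> str | None:
    """Return the highest version in `available` strictly below `bad`, or None."""
    bad_key = parse_version(bad)
    older = [v for v in available if parse_version(v) < bad_key]
    return max(older, key=parse_version) if older else None


if __name__ == "__main__":
    # Illustrative list only (hypothetical); not fetched from any release server.
    versions = ["1.9.5", "1.10.3", "1.10.4", "1.11.0"]
    print(previous_release(versions, "1.11.0"))  # prints 1.10.4
```

Comparing tuples of integers avoids the string-ordering trap where "1.9.5" would sort after "1.10.4".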

Is anyone else seeing something like this?

fatal: [perf-use-cases-chunkedtransferencoding-repose-test01]: FAILED! => {
    "changed": true,
    "cmd": ["service", "telegraf", "status"],
    "start": "2019-06-18 11:33:50.597645",
    "end": "2019-06-18 11:33:50.607863",
    "delta": "0:00:00.010218",
    "failed": true,
    "rc": 3,
    "stderr_lines": [],
    "stdout_lines": [
        "● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB",
        "   Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)",
        "   Active: inactive (dead) (Result: exit-code) since Tue 2019-06-18 11:33:45 UTC; 5s ago",
        "     Docs: https://github.com/influxdata/telegraf",
        "  Process: 22531 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)",
        " Main PID: 22531 (code=exited, status=1/FAILURE)",
        "",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Unit entered failed state.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Failed with result 'exit-code'.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Start request repeated too quickly.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB."
    ]
}
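The task fails because `service telegraf status` returns rc 3, which for a status query conventionally means the service is not running; the useful detail is on the "Active:" line. A minimal sketch, assuming the stdout_lines format shown above, that pulls the state, sub-state and result out of that line (the function name `active_state` is hypothetical):

```python
from __future__ import annotations

import re

# Matches e.g. "   Active: inactive (dead) (Result: exit-code) since ..."
# and "   Active: active (running) since ..." (the Result part is optional).
ACTIVE_RE = re.compile(
    r"^\s*Active:\s+(\S+)\s+\(([^)]*)\)(?:\s+\(Result:\s*([^)]*)\))?"
)


def active_state(stdout_lines: list[str]) -> tuple[str, str, str | None] | None:
    """Return (state, sub_state, result) from the first Active: line, or None."""
    for line in stdout_lines:
        m = ACTIVE_RE.match(line)
        if m:
            return m.group(1), m.group(2), m.group(3)
    return None


if __name__ == "__main__":
    lines = [
        "   Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)",
        "   Active: inactive (dead) (Result: exit-code) since Tue 2019-06-18 11:33:45 UTC; 5s ago",
    ]
    print(active_state(lines))  # prints ('inactive', 'dead', 'exit-code')
```

An `exit-code` result means Telegraf itself exited with an error, so the next place to look is the Telegraf journal rather than systemd.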
baschny commented 5 years ago

I ran into the same problem today; the hint was in the logs:

Jun 24 16:17:10 xxx telegraf[19525]: 2019-06-24T14:17:10Z I! Starting Telegraf 1.11.0
Jun 24 16:17:10 xxx telegraf[19525]: 2019-06-24T14:17:10Z E! [telegraf] Error running agent: Error parsing /etc/telegraf/telegraf.conf, line 110: field corresponding to `mount_points' in disk.DiskStats cannot be set through TOML
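When checking many hosts, it can help to extract the config path, line number and offending field from that error automatically. A minimal sketch, assuming the exact message format in the log line above (the regex and `parse_config_error` are hypothetical, not a Telegraf API):

```python
from __future__ import annotations

import re

# Pattern for the config-parse error shown in the journal output above.
PARSE_ERR_RE = re.compile(
    r"Error parsing (?P<path>\S+), line (?P<line>\d+): "
    r"field corresponding to `(?P<field>[^']+)' in (?P<struct>\S+)"
)


def parse_config_error(log_line: str) -> dict[str, str | int] | None:
    """Extract config path, line number, field and Go struct name, if present."""
    m = PARSE_ERR_RE.search(log_line)
    if m is None:
        return None
    return {
        "path": m.group("path"),
        "line": int(m.group("line")),
        "field": m.group("field"),
        "struct": m.group("struct"),
    }


if __name__ == "__main__":
    line = ("E! [telegraf] Error running agent: Error parsing /etc/telegraf/telegraf.conf, "
            "line 110: field corresponding to `mount_points' in disk.DiskStats "
            "cannot be set through TOML")
    print(parse_config_error(line))
```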

The error is a regression introduced in 1.11.0: https://github.com/influxdata/telegraf/pull/5982

It has already been fixed in the Telegraf code, but the fix has not been released yet.
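Until a release containing the fix is available, it can help to know whether a given telegraf.conf sets the option that triggers the error. The sketch below assumes, based only on the error text (`mount_points' in disk.DiskStats), that the trigger is an uncommented `mount_points` setting inside an `[[inputs.disk]]` section. It is a simplified line scan, not a full TOML parser (for example, it ignores headers with trailing comments), and `find_disk_mount_points` is a hypothetical name:

```python
def find_disk_mount_points(conf_text: str) -> list[int]:
    """Return 1-based line numbers of uncommented `mount_points` keys
    inside [[inputs.disk]] sections of a Telegraf config."""
    hits = []
    in_disk = False
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("["):
            # Any table header ends the previous section.
            in_disk = line == "[[inputs.disk]]"
            continue
        # "# mount_points = ..." splits to "# mount_points", so comments are skipped.
        if in_disk and line.split("=", 1)[0].strip() == "mount_points":
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    sample = (
        "[agent]\n"
        "  interval = \"10s\"\n"
        "[[inputs.disk]]\n"
        "  # mount_points = [\"/\"]\n"
        "  mount_points = [\"/\"]\n"
    )
    print(find_disk_mount_points(sample))  # prints [5]
```

Comparing the reported line numbers with the one in the Telegraf error (line 110 above) confirms whether this setting is the culprit before choosing a workaround.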