Closed yishaihl closed 4 years ago
Could it have something to do with the mix of ESXi 6.0 and ESXi 6.7 versions in my environment?
So when Telegraf first starts up the plugin is working but then over time it stops reporting? How long does it take before metrics stop being reported?
related #6158
I think we need a logfile to be able to address this. Can you share one, @yishaihl?
@danielnelson It varies: metrics can stop being reported after 4 hours, or it can take as long as 24 hours.
@prydin please see attached Grafana log: grafana.log
I would need the telegraf log, not the grafana one. If it stops collecting metrics, there should be an error on there somewhere.
@prydin Here you go (note that I only just enabled telegraf.log, but I think the disconnections are in there): telegraf.log
(I saw this error repeatedly: Error in plugin: Post https://10.0.0.234/sdk: context deadline exceeded)
@prydin Following https://github.com/influxdata/telegraf/issues/5133, I saw that the following line was commented out, so I changed it as follows:
&
Do you think it might solve those disconnections?
Thanks.
I don't see any disconnections in the log, but I think I have an idea what could be wrong. It looks like the metric poll for vSphere returns well over 10,000 metrics, but your output buffer is only set to 10,000. This can cause Telegraf to silently drop data. To verify this, please enable the "internal" plugin and check the metrics dropped metric (don't remember the exact name of it).
Alternatively, increase the output buffer to at least 50,000 and see if it solves the problem. In other words, your metric_buffer_limit should read like this:
metric_buffer_limit = 50000
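For reference, the two suggestions above could be combined in telegraf.conf roughly as follows. This is a minimal sketch, not a full configuration: it assumes the setting goes in the existing [agent] section and that the internal input is not yet enabled. As far as I know, the dropped counter shows up as the metrics_dropped field of the internal_write measurement, but treat that name as an assumption and check what the plugin actually emits.

```toml
# Sketch only: merge these settings into the existing telegraf.conf
# rather than replacing the file.
[agent]
  # Maximum number of unwritten metrics buffered per output.
  # Once the buffer is full, the oldest metrics are dropped.
  metric_buffer_limit = 50000

# Collects statistics about Telegraf itself, including per-output
# write statistics that help confirm whether metrics are being dropped.
[[inputs.internal]]
  # Go runtime memory stats are not needed for this check.
  collect_memstats = false
```

Restart the Telegraf service after the change and watch whether the dropped count keeps growing between vSphere collection intervals.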
I'll try it and let you know if it works. Thanks!
The advice above is good. I also recommend running the internal plugin and increasing your metric buffer, but I wanted to mention that as of Telegraf 1.11 there is a warning log message whenever an output drops metrics.
@yishaihl were you able to find a solution or identify the cause of this?
Relevant telegraf.conf:
System info:
OS version: Ubuntu 18.04
vCenter version: 6.7.0 - build 13639324
ESXi version: VMware ESXi, 6.0.0, 3620759 & VMware ESXi, 6.7.0, 13006603
Telegraf version: Telegraf 1.11.3
Grafana version: Grafana v6.2.5 (commit: 6082d19)
InfluxDB version: 1.7.7
Dashboard: VMware vSphere Overview: https://grafana.com/grafana/dashboards/8159
Steps to reproduce:
Expected behavior:
Actual behavior:
I'm losing Datastore dashboard statistics from time to time, and I need to restart the Telegraf service to fix it.
Additional info: