influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

mqtt_consumer stops gathering metrics #4647

Closed: Dees7 closed this issue 6 years ago

Dees7 commented 6 years ago

Relevant telegraf.conf:

 [global_tags]
 [agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 500
  metric_buffer_limit = 1000
  collection_jitter = "0s"
  flush_interval = "80s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = ""
  omit_hostname = false
 [[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "arduino"
  skip_database_creation = false
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  username = "telegraf"
  password = "inserthardtoreadpasswordhere"
 [[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
 [[inputs.disk]]
  mount_points = ["/","/var/wal","/boot"]
  ignore_fs = ["devtmpfs", "devfs"]
 [[inputs.diskio]]
 [[inputs.kernel]]
 [[inputs.mem]]
 [[inputs.processes]]
 [[inputs.swap]]
 [[inputs.system]]
 [[inputs.net]]
  interfaces = ["wlan0","eth0"]
 [[inputs.procstat]]
  pattern = "rsyslogd|wpa_supplicant|bluetoothd|dhcpcd|influxd|grafana|sshd|mosquitto|SCREEN|bash|screen|telegraf"
  user = "pi|telegraf|grafana|influxdb|mosquitto"
 [[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  qos = 0
  connection_timeout = "30s"
  persistent_session = true
  client_id = "telegraf"
  username = "wiot"
  password = "wiot"
  name_override = "mqttc"
  data_format = "json"

System info:

Telegraf unknown (git: master 3268937c)
mosquitto version 1.4.10 (build date Fri, 22 Dec 2017 08:19:25 +0000)
Description: Raspbian GNU/Linux 9.4 (stretch)

Steps to reproduce:

Telegraf unexpectedly stops gathering MQTT metrics.

While it is working normally, the debug log looks like this:

2018-09-07T05:16:20Z D! Output [influxdb] buffer fullness: 96 / 1000 metrics.
2018-09-07T05:16:20Z D! Output [influxdb] wrote batch of 96 metrics in 114.112502ms

When it stops gathering metrics, Telegraf pushes only the diskio/mem/cpu/etc. metrics (no MQTT):

2018-08-07T04:23:00Z D! Output [influxdb] buffer fullness: 52 / 1000 metrics.
2018-08-07T04:23:00Z D! Output [influxdb] wrote batch of 52 metrics in 78.623364ms

Issue #921 shows more detailed debug output than this. How can I enable that level of debugging?

Dees7 commented 6 years ago

It looks like Telegraf disconnected and doesn't try to reconnect; mosquitto doesn't publish messages to Telegraf.

1536337620: Received PINGREQ from telegraf
1536337620: Sending PINGRESP to telegraf
1536337621: Sending PUBLISH to telegraf (d0, q0, r0, m0, 'tele/piduino/stat', ... (99 bytes))
1536337661: Sending PUBLISH to telegraf (d0, q0, r0, m0, 'tele/sdm/STATE', ... (144 bytes))
1536337662: Sending PUBLISH to telegraf (d0, q0, r0, m0, 'tele/sdm/SENSOR', ... (353 bytes))
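Mosquitto prefixes each log line with a Unix epoch timestamp, while Telegraf logs in RFC 3339 UTC, so the two logs are hard to line up by eye. A minimal Go sketch for converting the broker's prefix (the helper name `mosquittoTime` is hypothetical, not part of Telegraf or mosquitto):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// mosquittoTime converts the leading "1536337620:" epoch prefix of a
// mosquitto log line into an RFC 3339 timestamp in UTC, matching the
// format Telegraf uses in its own log.
func mosquittoTime(line string) (string, error) {
	prefix, _, found := strings.Cut(line, ":")
	if !found {
		return "", fmt.Errorf("no timestamp prefix in %q", line)
	}
	secs, err := strconv.ParseInt(strings.TrimSpace(prefix), 10, 64)
	if err != nil {
		return "", err
	}
	return time.Unix(secs, 0).UTC().Format(time.RFC3339), nil
}

func main() {
	ts, err := mosquittoTime("1536337620: Received PINGREQ from telegraf")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // 2018-09-07T16:27:00Z
}
```

Converted this way, the PINGREQ at 1536337620 falls at 2018-09-07T16:27:00Z, which can then be compared directly against the timestamps in Telegraf's debug log.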

Judging by the logs, Telegraf begins gathering fewer metrics, with no error messages:

2018-09-07T21:25:00Z D! Output [influxdb] wrote batch of 45 metrics in 55.868805ms
2018-09-07T21:26:20Z D! Output [influxdb] buffer fullness: 77 / 1000 metrics.
2018-09-07T21:26:20Z D! Output [influxdb] wrote batch of 77 metrics in 57.227782ms
2018-09-07T21:27:40Z D! Output [influxdb] buffer fullness: 40 / 1000 metrics.
2018-09-07T21:27:40Z D! Output [influxdb] wrote batch of 40 metrics in 33.749056ms

danielnelson commented 6 years ago

I think this is a duplicate of #4594