influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.6k stars 5.57k forks

VerneMQ + HAProxy + mqtt_consumer = connection lost: EOF #8033

Closed SoulRaven closed 2 years ago

SoulRaven commented 4 years ago

Relevant telegraf.conf:

[[inputs.mqtt_consumer]]
  servers = ["tcp://PGMO-PRX-DEB-PRO-02:1883"]
  topics = ['tele/sonoff/+/STATE', 'tele/sonoff/+/SENSOR', 'tele/sonoff/+/LWT']
  topic_tag = "topic"
  qos = 1
  persistent_session = true
  max_undelivered_messages = 1000
  client_id = "telegraf-01"
  data_format = "json"
  connection_timeout = "30s"
  interval = "300s"
  name_override = "containerSensors"

System info:

Telegraf 1.15.2 (git: HEAD cd037b49) Proxmox LXC container Debian 10

Steps to reproduce:

1. Set up two instances of VerneMQ and form a cluster from them.
2. Set up HAProxy in front of the VerneMQ cluster.

Expected behavior:

To remain connected using persistent connections

Actual behavior:

After approx. 1 minute, mqtt_consumer throws: [inputs.mqtt_consumer] Error in plugin: connection lost: EOF

Additional info:

frontend VerneMQ_TCP_frontend
  description VerneMQ TCP non-secure private frontend
  bind 10.20.13.228:1883
  mode tcp
  option clitcpka
  option tcplog
  timeout client 1m
  timeout server 1m
  default_backend VerneMQ_TCP

backend VerneMQ_TCP
  description Back-end for VerneMQ MQTT server
  mode tcp
  balance roundrobin
  option srvtcpka
  server VerneMQ_TCP_1 10.20.13.132:1883 check send-proxy-v2 maxconn 1000 port 1883 fall 1
  server VerneMQ_TCP_2 10.20.13.133:1883 check send-proxy-v2 maxconn 1000 port 1883 fall 1

reimda commented 4 years ago

The Telegraf error makes it sound like the other end closed the connection. Could you make a packet capture of this happening so we know for sure which side is closing?

SoulRaven commented 4 years ago

Hi, yes, you are right: the connection is closed by HAProxy. I have fixed this, somehow, but I would be more than happy to know that mqtt_consumer has an option to reconnect on disconnect. Currently this feature is set to false by default, and I don't know why it is not an option. Also the keep_alive

https://github.com/influxdata/telegraf/blob/8cc08a6363a1883b63c64b1f8fb78ddd2f66f5af/plugins/inputs/mqtt_consumer/mqtt_consumer.go#L363

https://github.com/influxdata/telegraf/blob/8cc08a6363a1883b63c64b1f8fb78ddd2f66f5af/plugins/inputs/mqtt_consumer/mqtt_consumer.go#L364

powersj commented 2 years ago

@soulraven,

Sounds like you changed the auto reconnect to true in your situation to recover from the drop? It looks like we have config options for other connectivity settings, but not this one.

Would you be willing to put up a PR adding this option?

Thanks!

telegraf-tiger[bot] commented 2 years ago

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!