helium / console

A management console to onboard and manage devices running on the Helium blockchain network.
Apache License 2.0

MQTT integration randomly disconnects #1244

Open petrkr opened 1 year ago

petrkr commented 1 year ago

Hosted console on console.helium.com, org PetrkrNET. The device ID is not relevant, since this is an integration issue.

Please describe your issue:

The connection to the MQTT server drops at random. On the server side I only see "client disconnected". The only way to fix this is to change the connection URL and then change it back, which forces a reconnect.

I would suggest adding an auto-reconnect mechanism to the console.
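An auto-reconnect of the kind suggested here is usually a retry loop with exponential backoff. The following is a minimal Python sketch, assuming a hypothetical `connect()` callable that returns True on success. The function names and the 1 s base / 60 s cap are illustrative assumptions, not anything the Helium console exposes.

```python
import time
from typing import Callable


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8) -> list[float]:
    """Delays (seconds) between reconnect attempts: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def reconnect_with_backoff(connect: Callable[[], bool],
                           sleep: Callable[[float], None] = time.sleep,
                           base: float = 1.0, cap: float = 60.0,
                           attempts: int = 8) -> bool:
    """Call the hypothetical `connect()` until it succeeds or attempts run out."""
    for delay in backoff_delays(base, cap, attempts):
        if connect():
            return True
        sleep(delay)  # wait longer after each consecutive failure
    return False


# Usage: a fake connection that fails twice, then succeeds.
outcomes = iter([False, False, True])
slept: list[float] = []
ok = reconnect_with_backoff(lambda: next(outcomes), sleep=slept.append)
print(ok, slept)  # True [1.0, 2.0]
```

Injecting `sleep` keeps the loop testable without real waiting; the cap stops a long outage from producing unbounded delays.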

In the integration DEBUG view in the console, the only entry is "failed to publish": "category": "uplink", "data": { "integration": { "id": "8d02629a-e814-4c87-9af9-360860e95f48", "name": "name mqtt", "status": "error" },

The server log shows:

Sep 14 18:40:04 server mosquitto[1272]: 1663173604: Client 7d1b46d3-f5fc-4c89-88fd-2f1b56f21c1b disconnected.

lausser commented 1 year ago

Same here. I am using the flespi.io MQTT broker. Everything worked fine until I noticed I had missed some events from last night. In the Helium console I saw that LoRa uplinks were received, but they had "integration errors". In the flespi.io logs I see this pattern every few seconds:

mqtt session connection was accepted
session has subscribed
mqtt session connection was closed

In the DEBUG view on console.helium.com I see Debug Response { "id": "0023300d-4fc1-4a5c-862c-62f349e05464", "name": "flespi", "status": "success" }, so it appears to be basically working. But constantly connecting and disconnecting is not correct behavior, imho. And maybe this is why uplinks forwarded to MQTT integrations can get dropped.

Addendum: I just deleted the integration from the flow, and the sessions disappeared from the flespi.io console. After adding it again, the same thing happened as before (disconnect/connect/disconnect/connect/...), but where it previously happened every ~5 seconds, it now happens only every ~30 seconds.

lausser commented 1 year ago

I installed my own Mosquitto server. From the debug log I can see that Helium creates a client (subscribed to the tx topic) for every device linked to this integration. So far so good. But new connections are established every few seconds:

1665686797: New connection from 52.8.80.146:40542 on port 8883.
1665686797: New connection from 52.8.80.146:40544 on port 8883.
1665686797: New connection from 52.8.80.146:40546 on port 8883.
1665686797: New connection from 52.8.80.146:40548 on port 8883.
1665686797: Client 801689f8-2c6b-4e75-a75e-c8b4765d699b already connected, closing old connection.
1665686797: New client connected from 52.8.80.146:40542 as 801689f8-2c6b-4e75-a75e-c8b4765d699b (p2, c0, k30, u'helium').
1665686797: No will message specified.
1665686797: Sending CONNACK to 801689f8-2c6b-4e75-a75e-c8b4765d699b (1, 0)
1665686798: Client 9405addf-d5b3-49ab-904a-0428206823f7 already connected, closing old connection.
1665686798: New client connected from 52.8.80.146:40544 as 9405addf-d5b3-49ab-904a-0428206823f7 (p2, c0, k30, u'helium').
1665686798: No will message specified.
1665686798: Sending CONNACK to 9405addf-d5b3-49ab-904a-0428206823f7 (1, 0)
1665686798: Client 09f9801d-2536-4cd5-b62b-4c7f183c5045 already connected, closing old connection.
1665686798: New client connected from 52.8.80.146:40546 as 09f9801d-2536-4cd5-b62b-4c7f183c5045 (p2, c0, k30, u'helium').
1665686798: No will message specified.
1665686798: Sending CONNACK to 09f9801d-2536-4cd5-b62b-4c7f183c5045 (1, 0)
1665686798: Client 49188190-d925-4594-af80-7062436ca47a already connected, closing old connection.
1665686798: New client connected from 52.8.80.146:40548 as 49188190-d925-4594-af80-7062436ca47a (p2, c0, k30, u'helium').
1665686798: No will message specified.
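To see which client IDs are being taken over and how often, the Mosquitto log can be scanned for the "already connected, closing old connection" line. A small Python sketch, assuming the log format shown above (epoch timestamp prefix, one event per line); the sample is an excerpt of that log.

```python
import re
from collections import Counter

# Mosquitto line emitted when a new CONNECT reuses a client ID that is still connected.
TAKEOVER = re.compile(r"^(\d+): Client (\S+) already connected, closing old connection\.$")


def count_takeovers(lines):
    """Return a Counter mapping client ID -> number of times its session was replaced."""
    counts = Counter()
    for line in lines:
        m = TAKEOVER.match(line.strip())
        if m:
            counts[m.group(2)] += 1
    return counts


sample = """\
1665686797: New connection from 52.8.80.146:40542 on port 8883.
1665686797: Client 801689f8-2c6b-4e75-a75e-c8b4765d699b already connected, closing old connection.
1665686798: Client 9405addf-d5b3-49ab-904a-0428206823f7 already connected, closing old connection.
1665686798: No will message specified.
""".splitlines()

takeovers = count_takeovers(sample)
for client_id, n in takeovers.most_common():
    print(f"{client_id}: {n}")
```

Run over a longer log, a client ID with a steadily growing count points to two connections sharing that ID.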

You can see "Client ... already connected, closing old connection". Why? I see no reason why an established connection should be periodically re-established. The only scenario I can think of is: LoRaWAN signal received -> function executed -> integration called -> MQTT connection established -> payload published -> MQTT disconnected. But that is not what happens here; there are MQTT connects without any reason.
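The log is consistent with the MQTT 3.1.1 rule that if a CONNECT carries a ClientId that is already connected, the server must disconnect the existing client [MQTT-3.1.4-2]. A toy Python model (not a real broker; `ToyBroker`, `dev-1`, `edge-A` and `edge-B` are hypothetical) shows how two connections sharing one ID keep evicting each other, assuming each one reconnects after being dropped.

```python
class ToyBroker:
    """Toy model of an MQTT broker's client-ID table; not a real broker."""

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}  # client ID -> label of the live connection
        self.events: list[str] = []

    def connect(self, client_id: str, conn: str) -> None:
        old = self.sessions.get(client_id)
        if old is not None:
            # [MQTT-3.1.4-2]: the existing client with this ID is disconnected.
            self.events.append(f"{client_id}: closing {old}")
        self.sessions[client_id] = conn
        self.events.append(f"{client_id}: accepted {conn}")


broker = ToyBroker()
# Assumption: two integration connections share the ID "dev-1"
# and each reconnects after being dropped, so they alternate.
for _ in range(3):
    broker.connect("dev-1", "edge-A")
    broker.connect("dev-1", "edge-B")

closes = [e for e in broker.events if "closing" in e]
print(len(closes))  # 5
```

Every connect after the first evicts the other connection, which matches the repeating "closing old connection" lines rather than a network fault.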

I think I found it: I have a function "heartbeat" that creates an object with just one attribute, indicating that this is a PING payload. This function runs in parallel with the real decoder function, so when there is a ping event with no corresponding data-payload event, I know my decoder has crashed. So for each device class I have two edges connecting to the integration: one from the heartbeat function and one from the decoder function. The Helium console opens an MQTT subscription for both, and it uses the device ID as the MQTT client ID. So there are two connections with the same client ID fighting each other. Even splitting the integration into a data-mqtt and a heartbeat-mqtt integration does not help, because the MQTT server is still the same. So I now use a dedicated MQTT server for the heartbeats.
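The collision described here can be checked mechanically: map each edge that feeds an MQTT integration to the client ID it would use and look for duplicates. A Python sketch; the device and function names are hypothetical, and the per-edge client ID scheme is an assumption included only for comparison, not an option the Helium console is known to offer.

```python
from collections import defaultdict
from typing import Callable, Iterable

Edge = tuple[str, str]  # (device ID, function feeding the integration)


def client_id_collisions(edges: Iterable[Edge],
                         client_id_for: Callable[[str, str], str]) -> dict[str, list[Edge]]:
    """Return the client IDs that more than one edge would connect with."""
    groups: dict[str, list[Edge]] = defaultdict(list)
    for device_id, source in edges:
        groups[client_id_for(device_id, source)].append((device_id, source))
    return {cid: es for cid, es in groups.items() if len(es) > 1}


# Hypothetical flow: one device reaches the integration via two functions.
edges = [("dev-1", "decoder"), ("dev-1", "heartbeat")]

# Client ID = device ID, as described in this thread.
by_device = client_id_collisions(edges, lambda dev, src: dev)
# Hypothetical alternative: one client ID per (device, function) edge.
by_edge = client_id_collisions(edges, lambda dev, src: f"{dev}-{src}")

print(by_device)  # {'dev-1': [('dev-1', 'decoder'), ('dev-1', 'heartbeat')]}
print(by_edge)    # {}
```

With the device ID as client ID, any device wired to the same broker through two edges collides; this is why a separate broker for the heartbeat edge avoids the problem.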