techstudio-design / SimpleNB

A simple Arduino library for communication with NB-IoT/Cat-M1 modules
GNU Lesser General Public License v3.0

Support for persistent connections? #10

Closed jftaylorMn closed 8 months ago

jftaylorMn commented 8 months ago

[X ] I have read the Troubleshooting section of the ReadMe

What type of issues is this?

[ ] Request to support a new module

[ ] Bug or problem compiling the library

[ ] Bug or issue with library functionality (ie, sending data over TCP/IP)

[X ] Question or request for help

What are you working with?

Modem: And-one SIM7080G breakout board

Main processor board: ESP32

SimpleNB version: 0.11.4

Code: same as the MQTT sample code, except always connecting with the SIM IMEI as the client ID and the cleanConnection parameter set to false

e.g. mqtt.connect(clientId.c_str(), NULL, NULL, 0, 0, 0, 0, cleanConnection) then subscribing to a topic with qos 1:

mqtt.subscribe(topicSubscribe, qos)

Scenario, steps to reproduce

I am connecting to broker.hivemq.com, a public broker that doesn't require security credentials. Because my transmissions are infrequent, the signal reliability is questionable, and receiving messages from the broker doesn't need to be real-time, I have decided to connect only long enough to publish new messages, subscribe, and then let mqtt.loop() process any incoming messages. My experience is that if I can get a connection, it is going to be stable enough to send the small payload. Putting the client.connect() code inside the Arduino loop was not viable. FWIW, the client connect, MQTT connect, publish, subscribe, loop() and disconnect are now running in a task on core 1. That improved stability measurably!

Expected result

According to https://www.hivemq.com/blog/mqtt-essentials-part-7-persistent-session-queuing-messages/, this approach should create a persistent session, in which the broker recognizes the client as returning, knows its subscriptions, and delivers any missed updates to subscribed topics.
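For reference, this is roughly what "cleanSession = false" means on the wire. The sketch below builds an MQTT 3.1.1 CONNECT packet by hand in plain C++; `buildConnectPacket` is a hypothetical helper written for illustration, not part of SimpleNB or PubSubClient. With a 15-digit IMEI as the client ID the packet comes out at 29 bytes, which is consistent with the `AT+CASEND=0,29` line in the log further down.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Builds an MQTT 3.1.1 CONNECT packet with no username, password, or will.
// Sketch only: assumes the remaining length fits in one byte (< 128).
std::vector<std::uint8_t> buildConnectPacket(const std::string& clientId,
                                             bool cleanSession,
                                             std::uint16_t keepAliveSeconds) {
    std::vector<std::uint8_t> body = {
        0x00, 0x04, 'M', 'Q', 'T', 'T',                          // protocol name
        0x04,                                                    // protocol level 4 = MQTT 3.1.1
        static_cast<std::uint8_t>(cleanSession ? 0x02 : 0x00),   // connect flags, bit 1 = Clean Session
        static_cast<std::uint8_t>(keepAliveSeconds >> 8),        // keep-alive, high byte
        static_cast<std::uint8_t>(keepAliveSeconds & 0xFF),      // keep-alive, low byte
        static_cast<std::uint8_t>(clientId.size() >> 8),         // client ID length, high byte
        static_cast<std::uint8_t>(clientId.size() & 0xFF),       // client ID length, low byte
    };
    body.insert(body.end(), clientId.begin(), clientId.end());

    // Fixed header: packet type CONNECT (0x10) followed by the remaining length.
    std::vector<std::uint8_t> packet = {0x10, static_cast<std::uint8_t>(body.size())};
    packet.insert(packet.end(), body.begin(), body.end());
    return packet;
}
```

Calling `buildConnectPacket("860016043307508", false, 15)` leaves the connect-flags byte (index 9) at 0x00, which is what asks the broker to resume a stored session for that client ID.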

Actual result

A couple issues:

[main.cpp:759] mqttConnect(): [Main] [30905] Connecting to broker.hivemq.com as�? ...

AT+CACLOSE=0
+CME ERROR: operation not allowed
AT+CACID=0
OK
AT+CASSLCFG=0,SSL,0
OK
AT+CAOPEN=0,0,"TCP","broker.hivemq.com",1883
+CAOPEN: 0,0
OK
AT+CASEND=0,29
QTT 860016043307508
OK
AT+CARECV?
OK
AT+CASTATE?
+CASTATE: 0,1
OK
+CASTATE: 0,0
[31705] ### Closed socket: 0

This is not consistent: most of the time, a connection can be made with the same clientId. In that case, I typically see a logged message like this:

+CADATAIND: 0
[73359] ### Got Data on socket: 0

I have a request in to HiveMQ to find out if the occasional closed socket is due to my reconnecting too quickly.

Have you ever tried persistent connections using your SIM7080G breakout board? I suspect it's an edge case that could have been missed. According to the docs, when connecting as a recognized persistent client, there should be some sort of ack AT response from the broker?
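On the question of an ack: in MQTT 3.1.1 the broker's reply to CONNECT is a 4-byte CONNACK packet, and bit 0 of its flags byte is the Session Present flag, which tells a returning client whether the broker still holds its session. The sketch below decodes those raw bytes in plain C++. `ConnAck` and `parseConnAck` are hypothetical names for illustration, and whether PubSubClient exposes this flag to the application is not something this thread establishes.

```cpp
#include <cstddef>
#include <cstdint>

// Decoded fields of an MQTT 3.1.1 CONNACK packet.
struct ConnAck {
    bool sessionPresent;      // true if the broker resumed a stored session
    std::uint8_t returnCode;  // 0 = connection accepted
};

// Parses a CONNACK from raw bytes; returns false if the bytes are not a CONNACK.
bool parseConnAck(const std::uint8_t* data, std::size_t length, ConnAck& out) {
    // Fixed header is 0x20 (CONNACK) with a remaining length of 2.
    if (length < 4 || data[0] != 0x20 || data[1] != 0x02) {
        return false;
    }
    out.sessionPresent = (data[2] & 0x01) != 0;  // acknowledge flags, bit 0
    out.returnCode = data[3];
    return true;
}
```

A reply of `20 02 01 00` would mean "accepted, session resumed", while `20 02 00 00` would mean "accepted, but starting with a fresh session".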

My primary interest is stability in sending data, so this is not a show stopper. I can see a future interest in being able to receive payloads from the broker to the esp32 code.

Debug and AT command log

I apologize for the random formatting above. It was not intentional

e-tinkers commented 8 months ago

SimpleNB does not use any MQTT AT-command set provided by GSM modules; it strictly uses only the TCP/IP AT-command set to provide a TCP connection. Any higher-level services or protocols (such as HTTP) are built on top of the TCP connection by the application. In the case of MQTT, all MQTT services are provided by the pubsubclient.h library. If you are using an ESP32, what you could try is to use WiFi instead of the SIM7080G as the transport layer, to see if you still experience the same behaviour.

Personally, I always use a randomised client ID so that the server won't think the client is still connected and refuse the reconnection for some reason. Furthermore, MQTT sends a heartbeat signal (PINGREQ) so that the server knows the client is still alive. PubSubClient has a default keep-alive of 15 seconds, but if you put your SIM7080G into deep sleep and do not publish messages frequently, MQTT_KEEPALIVE needs to be adjusted accordingly (see https://pubsubclient.knolleary.net/api#setKeepAlive).
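To make the keep-alive point concrete: under MQTT 3.1.1 a broker drops a client that has sent nothing for one and a half times the keep-alive period, and a keep-alive of 0 turns the check off. The helpers below are a small sketch of that arithmetic in plain C++; both function names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Seconds of client silence after which an MQTT 3.1.1 broker closes the
// connection: 1.5 x keep-alive, rounded down to whole seconds here.
constexpr std::uint32_t brokerSilenceLimitSeconds(std::uint16_t keepAliveSeconds) {
    return (static_cast<std::uint32_t>(keepAliveSeconds) * 3u) / 2u;
}

// True if a connection should outlast a gap in which the client sends nothing
// (for example while the modem sleeps and loop() is not being called).
constexpr bool sessionSurvivesGap(std::uint16_t keepAliveSeconds,
                                  std::uint32_t gapSeconds) {
    return keepAliveSeconds == 0 ||
           gapSeconds < brokerSilenceLimitSeconds(keepAliveSeconds);
}
```

With the 15-second default, a one-hour silent gap (`sessionSurvivesGap(15, 3600)`) does not survive, so a connection held open across sleep periods would need either a much larger keep-alive or a reconnect on wake.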

We had devices in the field running MQTT with update frequency from once/hour to once every 12 hours for years without any problem.

jftaylorMn commented 8 months ago

Hi Henry

Thanks for your thorough reply. In general I have come to the same conclusions. This solution is going into a remote area with limited connectivity, 150 miles away from my home. If the code gets into an infinite loop, I have no way to reboot. On a good day, the signal quality there is maybe 17? If a car goes by or the wind blows, the connection can easily be dropped. It might take a few hours, but eventually I would see the led flashing a couple times a second - forever. And…. Yes I’m sure results would be different using WiFi but there is no WiFi in the woods, so that doesn’t get me to the desired endpoint.

In my development environment, signal quality is much better (25+). I could likely keep a single session open for days here (with random/unique client IDs). I have reproduced the fast-flashing scenario using a fixed client ID and clean session = false. The library gets into a loop if the newly created socket is closed within 300 ms after connection. I have yet to figure out where that is happening. Without some sort of timeout on the MQTT connect method (like the one found on other methods in the lib), this will block everything until the ESP32 and modem are rebooted.

I’ve chased the “keep the connection open” path for months without success. Yeah, from the logged AT commands I can see that most of that just sets up a TCP connection and then, I presume, uses the socket to actually transact. Looking into the SIMCom docs, it appears that there are length constraints when using AT commands for publishing, so that’s not a good plan for my application.

I really want to get the persistent connection working so that the device only gets messages on subscribed topics once. Using unique client IDs might avoid the socket-closed issue, but the broker will consistently send the same (most recent) message every time mqtt.loop() is called, when the desired behavior is QoS 1 as defined in the spec.

Failing any sort of timeout parameter for the MQTT connect, I am considering placing that code in a FreeRTOS task, then launching another task that would kill it using the task handle if it’s still alive beyond a timeout. That might work but doesn’t smell right.
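The general shape of that idea, a blocking connect wrapped in a timeout, can be sketched in portable C++. This version uses std::thread and std::promise as stand-ins for FreeRTOS tasks so it can run on a desktop; `connectWithTimeout` and `ConnectResult` are hypothetical names, and the callable passed in represents whatever blocking connect code is being guarded.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

enum class ConnectResult { Connected, Failed, TimedOut };

// Runs a blocking connect call on a worker thread and gives up waiting
// after `timeout`. The worker is abandoned on timeout, not killed.
ConnectResult connectWithTimeout(std::function<bool()> blockingConnect,
                                 std::chrono::milliseconds timeout) {
    // Shared ownership keeps the promise alive even if the caller has
    // already returned by the time the worker finishes.
    auto done = std::make_shared<std::promise<bool>>();
    std::future<bool> result = done->get_future();

    std::thread([done, blockingConnect]() {
        done->set_value(blockingConnect());
    }).detach();

    if (result.wait_for(timeout) != std::future_status::ready) {
        return ConnectResult::TimedOut;
    }
    return result.get() ? ConnectResult::Connected : ConnectResult::Failed;
}
```

One caveat with this design: on timeout the stuck call keeps running in the background, so the caller would still need to reset the modem or otherwise avoid touching the same client object until that call has ended. Deleting a FreeRTOS task by its handle has the opposite trade-off: the task goes away, but any state it was in the middle of changing is left as-is.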

Hope this helps clarify why I can’t rely on lengthy sessions, and need persistent connections to work.

John

e-tinkers commented 8 months ago

I'm not suggesting that you use WiFi in the field; I recommend using WiFi to test your infrastructure and see whether the problem can still be observed. If you still experience the same behaviour, that means the problem is caused by the MQTT client and/or the MQTT broker, not by the transport layer or TCP stack. I don't have any experience with HiveMQ, so I can't comment further on its performance.

techstudio-design commented 8 months ago

I'm going to close this issue until there is clear proof that the issue is caused by SimpleNB.

BTW, given the use case you described, I think you might want to test the SIM7070G instead of the SIM7080G, as the SIM7070G supports power Class 3, which is allowed under FCC regulations for use in the US market. It has a larger form factor but higher power output, with the same Qualcomm chips as the SIM7080G.

jftaylorMn commented 8 months ago

Thank you Henry, I think it is fine to close this issue. It isn't clear whether the problem of the socket being closed falls under your scope, the HiveMQ broker, or some other source. I'm not sure how I can isolate that. I see that basically your code is setting up the connection over GSM, and the pubsub client is doing the bulk of the MQTT work. Since it will be difficult to determine the actual source, I have created a watchdog task that will force the MQTT connect method to fail after a timeout period and then recover gracefully without losing data.

In addition to that workaround, I did some testing on-site over the weekend and see better signal strength (19-22), so maybe cell service is improved? Also, since the frequency of connections is much lower (1 hour vs 1 minute), that could also add to the stability. The code is using the same clientID and so far has not experienced any socket errors. It will take time to know if the problems are in the past. I can see that each MQTT connection (not using security) has an overhead of about 1K, so there would be benefits in keeping the connection open. Perhaps your suggestion of the SIM7070G would make that more feasible.

Thanks for the help, the SimpleNB library, and your advice.