u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as an add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0

MQTT Ping with UPSV Issue #71

Closed wrh-dev closed 1 year ago

wrh-dev commented 1 year ago

I’m seeing an occasional issue with the SARA-R422M8S when using secure MQTT. With UART power-saving mode enabled, a publish to the broker occasionally fails once the connection has lasted long enough for the MQTT keep-alive ping to be exercised. The problem manifests itself like so:

  1. A publish attempt fails with a timeout (+UMQTTER: 13,33)
  2. The library attempts to retry sending the message but the module is completely unresponsive over the AT interface
  3. After some time (maybe a minute or two) during which no AT command generates a response, the modem issues an undocumented URC that the library ignores: +UUMQTTC: 8,0
  4. After this point, the modem is again responsive over the AT interface, but all MQTT commands are rejected with CME ERROR: Operation not allowed, presumably because the publish operation never completed as far as the module is concerned
  5. Even attempts to disconnect from the broker are met with Operation not allowed, so after a few failures like this I simply power down the modem and retry the publish operation from scratch, which fixes the issue

During this power off procedure, I often see the URCs that I’m “waiting for” issued after entering airplane mode. For example:

AT+CFUN=4 
OK

+UUMQTTC: 9,0

+UUMQTTC: 8,0

+UUMQTTC: 0,100

(Note that the order of the publish and ping URCs is not always consistent.)

This might mean that entering airplane mode and then re-triggering a network and MQTT connection might be a sufficient and quicker fix than the full power cycle that I'm currently doing, but I have not had a chance to test this yet.
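The recovery options described above (retry, then airplane mode, then a full power cycle) could be arranged as an escalation policy. What follows is a minimal sketch, not ubxlib code: `RecoveryState`, `nextRecoveryStep` and `MAX_FAILURES_PER_STAGE` are hypothetical names, the threshold of 3 is an assumption standing in for "a few failures", and the airplane-mode step is the untested idea from the paragraph above.

```c
#include <stdbool.h>

/* Hypothetical threshold: how many consecutive failures to tolerate
 * before escalating to the next recovery stage. */
#define MAX_FAILURES_PER_STAGE 3

typedef enum {
    RECOVERY_NONE,            /* last command succeeded, nothing to do */
    RECOVERY_RETRY,           /* retry the MQTT command */
    RECOVERY_AIRPLANE_TOGGLE, /* AT+CFUN=4, then reconnect (untested idea) */
    RECOVERY_POWER_CYCLE      /* full power down and restart (known to work) */
} RecoveryStep;

typedef struct {
    int consecutiveFailures;  /* failures since the last success/escalation */
    bool airplaneTried;       /* true once the airplane-mode stage was used */
} RecoveryState;

/* Decide what to do after an MQTT command has completed.
 * commandOk is false when the command timed out or returned
 * "CME ERROR: Operation not allowed". */
RecoveryStep nextRecoveryStep(RecoveryState *state, bool commandOk)
{
    if (commandOk) {
        state->consecutiveFailures = 0;
        state->airplaneTried = false;
        return RECOVERY_NONE;
    }

    state->consecutiveFailures++;
    if (state->consecutiveFailures < MAX_FAILURES_PER_STAGE) {
        return RECOVERY_RETRY;
    }

    /* Too many failures in this stage: escalate and restart the count. */
    state->consecutiveFailures = 0;
    if (!state->airplaneTried) {
        state->airplaneTried = true;
        return RECOVERY_AIRPLANE_TOGGLE;
    }
    state->airplaneTried = false;
    return RECOVERY_POWER_CYCLE;
}
```

The caller would start from a zeroed `RecoveryState` and invoke `nextRecoveryStep()` after each publish attempt; the cheaper airplane-mode toggle is tried once before falling back to the power cycle.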

This issue seems to happen when (or at least much more often when) UART power savings mode is active.

Have you ever heard of an issue like this? Can you offer any potential fixes or workarounds?

RobMeades commented 1 year ago

Fascinating - not an error case I've seen, let me check with our application engineering people to see what they suggest.

wrh-dev commented 1 year ago

Thanks for the quick response! I look forward to any suggestions the application engineers can provide.

RobMeades commented 1 year ago

Feedback is that we haven't seen exactly this issue but we are tracking something which might be similar, related to +UPSV operation on SARA-R422 and data traffic. Are you able to use our PC-based M-Center tool to take a trace from the module's USB port? It would be very interesting to obtain a full AT dump and a matching module trace for analysis.

RobMeades commented 1 year ago

My apologies - I'd misunderstood what I was told. It would seem that in order to take a trace from the module's USB port you need a version of the M-Center tool which includes some 3rd party IP that we are not permitted to release without an NDA. If you already have an NDA with us, or would like to go down that route, please message me directly (rob.meades@u-blox.com).

Otherwise we'll see if we can reproduce the problem ourselves.

wrh-dev commented 1 year ago

Thanks! I just reached out to you directly.

RobMeades commented 1 year ago

On the +UUMQTTC: 8,0 URC, it seems that this is only documented for the MQTTSN case in our AT manual (we will fix that) but the behaviour is also there for MQTT, i.e. "when the MQTT PINGRESP from the broker is not received by the module the URC +UUMQTTC: 8,0 is notified to the user".

So it would seem that the MQTT client inside the module is not receiving the MQTT ping response, i.e. the response to the ping which it sends to the broker shortly before the MQTT inactivity timeout expires, to act as a keep-alive.
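A host application that wants to notice this condition, rather than ignore the URC, could parse the line itself. The following is a minimal sketch under stated assumptions: `MqttUrc`, `parseUumqttc` and `isPingFailure` are hypothetical helpers, not ubxlib APIs, and the input is assumed to be a single URC line with any leading CR/LF already stripped. Only the meaning of `8,0` (PINGRESP not received) is taken from the quote above; other values are parsed but not interpreted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Parsed form of a "+UUMQTTC: <op>,<result>" URC line. */
typedef struct {
    int op;     /* first field, e.g. 8 for the keep-alive ping */
    int result; /* second field, e.g. 0 */
} MqttUrc;

/* Return true and fill *urc if line is a +UUMQTTC URC carrying two
 * integer fields; return false for anything else (e.g. "OK"). */
bool parseUumqttc(const char *line, MqttUrc *urc)
{
    int op;
    int result;

    if ((line == NULL) || (urc == NULL)) {
        return false;
    }
    /* %d skips leading white space, so "8,0" and "8, 0" both match. */
    if (sscanf(line, "+UUMQTTC: %d,%d", &op, &result) != 2) {
        return false;
    }
    urc->op = op;
    urc->result = result;
    return true;
}

/* True for "+UUMQTTC: 8,0", which per the AT manual quote above means
 * the module did not receive the broker's PINGRESP. */
bool isPingFailure(const MqttUrc *urc)
{
    return (urc != NULL) && (urc->op == 8) && (urc->result == 0);
}
```

A caller might feed each unsolicited line to `parseUumqttc()` and, when `isPingFailure()` returns true, treat the MQTT session as lost instead of continuing to publish.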

For our information, can you confirm the FW version of the module (the response to the ATI9 command, which the ubxlib code will issue at every power-up)?

wrh-dev commented 1 year ago

Interesting, that seems to make sense.

Would you expect a ping failure to block other MQTT operations (until the broker releases the connection due to keep-alive timeout)? In general, would you expect the automatic pings to be handled in such a way that the module can still accept other MQTT commands while performing the pings "under the hood"?

Here's the ATI9 response: 00.12,A00.00.

RobMeades commented 1 year ago

Feedback is that the ping doesn’t block other MQTT operations inside the module MQTT client: if a new AT command arrives at the module while the MQTT client in the module is waiting for the server PINGRESP packet, the MQTT client in the module will queue the new request and then process it when the ping packet is received or when the ping timeout expires.
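The queueing behaviour described above can be pictured with a toy model. This is only an illustrative sketch of the stated behaviour, not the module's firmware: `MqttClientModel`, `modelPingSent`, `modelSubmit`, `modelPingFinished` and the queue capacity of 4 are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4 /* hypothetical; the real limit is not stated */

/* Toy model of the MQTT client inside the module. */
typedef struct {
    bool awaitingPingResp;       /* true between PINGREQ and PINGRESP/timeout */
    int pending[QUEUE_CAPACITY]; /* IDs of requests queued during the wait */
    size_t pendingCount;
    int processedCount;          /* total requests processed so far */
} MqttClientModel;

/* The client has sent a keep-alive ping and now waits for PINGRESP. */
void modelPingSent(MqttClientModel *m)
{
    m->awaitingPingResp = true;
}

/* A new AT request arrives. It is processed at once when no ping is
 * outstanding, otherwise queued. Returns false only if the queue is full. */
bool modelSubmit(MqttClientModel *m, int requestId)
{
    if (!m->awaitingPingResp) {
        m->processedCount++;
        return true;
    }
    if (m->pendingCount >= QUEUE_CAPACITY) {
        return false;
    }
    m->pending[m->pendingCount++] = requestId;
    return true;
}

/* PINGRESP received or ping timeout expired: either event ends the wait
 * and the queued requests are then processed. Returns how many were. */
size_t modelPingFinished(MqttClientModel *m)
{
    size_t drained = m->pendingCount;

    m->awaitingPingResp = false;
    m->processedCount += (int) drained;
    m->pendingCount = 0;
    return drained;
}
```

In this model a request submitted during the ping wait is delayed but never rejected, which is the expected behaviour; the symptoms reported at the top of this issue (no AT response at all, then Operation not allowed) do not match it.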

Hopefully, once you are able to take a trace of the issue, we can get to the bottom of this; thanks for persisting with it.

RobMeades commented 1 year ago

To conclude this thread, we have made a change that is intended to help with MQTT behaviour in fringe coverage conditions in commit 8248f9983717a6997ba009c151ff7acf48550a56. I will close this issue now: @wrh-dev, feel free to re-open it or open another if you think there is more we can do.