u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0
310 stars 94 forks source link

Error return codes on cellular sockets are ignored in socket read calls #76

Closed ghost closed 1 year ago

ghost commented 1 year ago

Hi again,

We've run into another issue with our ubxlib-based project using a SARA R422 that we are trying to solve right now. During testing in development, we sometimes manage to get the modem to close sockets without triggering a URC (one example for this is to manually inject an AT+COPS=2 command disconnecting the modem).

The result of this is that ubxlib never notices that the socket has been closed. The AT-Log shows that every AT+USORD call gets a CME ERROR: 3 response, but ubxlib will always return U_SOCK_EWOULDBLOCK for those sockets.

This seems to be because the pending bytes query in uCellSockRead does not check if errors have occured while parsing the USORD response. Do you see this as a bug in ubxlib that should be fixed or is this a non-issue for you because we are interfering with the AT-command stream?

RobMeades commented 1 year ago

Hi Johannes: certainly it seems that uCellSockRead() should be checking the return code of uAtClientUnlock(): it probably doesn't because I was afraid my head would explode with the testability/permutations, or maybe I wrote uCellSockReceiveFrom() first; there it would be less critical as we're not in a loop.

Do you have a proposed solution that you would be happy with, or would you prefer me to propose something that you could try out?

ghost commented 1 year ago

I think our issue is actually caused before the read-loop is even entered. In our case, we have pendingBytes==0 and thus try to fetch the available amount of data first: https://github.com/u-blox/ubxlib/blob/6a124060b219c5ac8e8deb91a45968542788eeba/cell/src/u_cell_sock.c#L1660-L1692

I'd be happy to try any proposals you have 👍 We are not really using ubxlib like its intended with our AT+COPS=2 hack, so it's understandable for me that this may cause issues. Still, maybe you could consider if there is a way to fix this. Would be nice to now that unavailable sockets are detected even if the URC is not received for some reason.

RobMeades commented 1 year ago

Hokay, I will post something here and update this issue when it's there. MIght be a little while as our test system is in pieces on the bench at the moment and I need to put it back together again :-).

RobMeades commented 1 year ago

Hi @johannesmay: my apologies, I completely forgot about this. I've posted a preview of a proposed fix here; don't worry if you've long-since moved on from this issue: assuming it passes testing I will include it anyway.

ghost commented 1 year ago

Hi, thanks for the response 👍 While this doesn't occur frequently for us, this is definitely still an issue for us, so I'm glad that you implemented a fix. If it's helpful for you, I could validate the fix on our platform next week.

RobMeades commented 1 year ago

Great, if you have the time, that would be useful: no point in us making a fix that fixes nothing.

rideyourstyle commented 1 year ago

I have a similar issue with the ubxlib (used with a SARA-R5 and an adaption-layer for zephyr 3.2).

On the bench, I saw the issue only once. But if I test it on an area with critical cellular coverage, the 7 sockets can run out in an hour. I will try the proposed fix from @RobMeades and test it the next days!

RobMeades commented 1 year ago

@rideyourstyle: that would be great, many thanks! If you find the fix lacking, please feel free to propose/identify a better one.

ghost commented 1 year ago

@RobMeades Finally had some time to check out your fix. I can verify that we now get the expected error return code after injecting the manual AT+COPS=2 and can handle it the application accordingly.

@rideyourstyle Not sure if your issue really is related to this - usually when we get disconnections from the network side, a URC would be triggered and ubxlib would handle that correctly. Also, in our case we weren't running through sockets, instead we would stay connected as it seemed like the existing socket was still connected.

RobMeades commented 1 year ago

@johannesmay: excellent, thanks for feeding back, the fix will be in the release next week (and is in the preview branch posted here).

@rideyourstyle: feedback as to any [other] issues is very welcome, do let us know more about your particular problems.

RobMeades commented 1 year ago

The fix of commit bf091202674ee2e27ab6588e0b7224aa77e6fae2 is now available on the master branch; I will close this issue: @rideyourstyle, please feel free to open another for your problem or re-open this one if you think that is appropriate.