u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0
287 stars 82 forks source link

SARA-R510S keeps CTS line sporadically high if DTR is used for power saving #192

Open eeFLis opened 5 months ago

eeFLis commented 5 months ago

Hi

maybe no ubxlib related problem, but perhaps you have already observed this behavior

If the DTR pin is used for power saving, the SARA R510 s module sporadically holds the CTS line high and no longer releases it. the module is then no longer responsive.

we had the problem several times with the module firmware version 03.15, A00.01. We hoped that this problem would be solved with the current module firmware version 03.30, A00.01, but the problem still occurs, even if it is much less frequent (after hours ).

Have you observed such behavior in your test environment?

RobMeades commented 5 months ago

Hi @eeFLis: can't say this is something we've noticed, @philwareublox is going to check internally.

RobMeades commented 5 months ago

@eeFLis: some questions. You say that the module allows the CTS line to float high when DTR is used for power saving: is the module known to be otherwise functional when this problem has occurred, e.g. is VINT high, do you maybe get URC's turning up, that kind of thing? Have you happened to be able to capture any HW traces (e.g. of the UART lines with a Saleae probe or similar) that show the relationship between the DTR line and CTS or other lines?

eeFLis commented 5 months ago

Hi

I have attached the salea capture. However, this was already recorded some time ago with the module fw version 03.15, A00.01, but should show the problem SARA-R510S-01B-00 keep CTS high if DTR is used for power save.zip

RobMeades commented 5 months ago

Can you confirm that you are using the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES, so AT+UPSV looks like this:

AT+UPSV=1,1300
OK
eeFLis commented 5 months ago

Yes, we use the default value. At the time of the salea capture, we certainly used a older version of the ubxlib. If the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES was never adjusted in the ubxlib it was AT+UPSV=1,1300

Were you able to gather all the information from the salea capture or do you need additional information?

RobMeades commented 5 months ago

I've been staring at it with the relevant expert internally.

If you look at the picture below, at the green marker is where the module emits a +CSCON: 0, so it is no longer RRC-connected and hence could go to sleep at any time (FYI, I'm told that when DTR-controlled sleep is in play the timeout in the AT+UPSV=1,1300 doesn't apply, the module can go to sleep as soon as DTR permits it):

image

12 seconds later comes the problematic part, focusing on that:

image

You can see, in the middle of the picture, that the module has allowed CTS to float high, without there being any TXD UART activity, which must be because the module has decided to go to 32 kHz sleep. A short while later CTS goes low again, which is likely because some timed activity inside the module has caused it to wake up again, but only briefly. CTS goes high again, and it is at about this time that the MCU pulls DTR low in order to send the next AT command.

Pure speculation, of course, but a guess would be that something to do with the fact that the module is autonomously coming out of [EDIT: I meant going into] 32 kHz sleep at the same moment as DTR goes high [EDIT: I meant low] causes it to miss the DTR edge and remain in 32 kHz sleep.

Will continue to investigate...

eeFLis commented 5 months ago

you mean > at the same moment as DTRgoes low right? That would make sense

RobMeades commented 5 months ago

Sorry, yes at the same moment as DTR goes low.

RobMeades commented 5 months ago

Out of interest, are you able to read the state of the CTS pin from SW? I had a feeling that STM32 wouldn't allow you to read the state of a pin that had been assigned to the UART? Vaguely wondering if there might be a workaround of some form that you could do if you could read it.

RobMeades commented 5 months ago

Were you able to gather all the information from the Saleae capture or do you need additional information?

If you have additional Saleae captures of the problem then we could check if they look the same but no matter if you don't.

eeFLis commented 5 months ago

Unfortunately I have no additional capture. I think the pin status can be read even if it is assigned to the UART. Do you have an idea for a workaround if the pin could be read? then I can test it shortly.

RobMeades commented 5 months ago

Yes, not exactly sure what the workaround would be really. Haven't been able to think of anything yet that would not introduce a new problem. Will carry on thinking...

RobMeades commented 5 months ago

One more question: do you know if the problem goes away if you set UPSV=0? I mean I assume it does, but I thought I'd ask.

eeFLis commented 5 months ago

It already goes away when we set the U_CFG_APP_PIN_CELL_DTR -1. However, this makes the application much less energy efficient.

RobMeades commented 5 months ago

Thanks for confirming, understood, the aim is of course to retain the power-saving goodness, just wanted to be sure of the problem.

RobMeades commented 5 months ago

Just for my information, if you set U_CFG_APP_PIN_CELL_DTR -1, so you aren't using the DTR pin to control UART power saving, you can still have UART power saving of course, provided you have implemented uPortUartCtsSuspend(), which should be there for STM32F4. It may be slightly more clumsy, since the ubxlib code will be waking the module up by prodding it with AT if the module has not been talked-to for more than 6 seconds, but it should still save pretty much the same amount of power, unless your device is going in and out of UART power saving very frequently (e.g. every few 10's of seconds, as opposed to every few minutes).

Is this an approach you have tried?

eeFLis commented 5 months ago

Yes we are using this approach.

That's what I tried to say with "we set U_CFG_APP_PIN_CELL_DTR -1 instead of switching off power saving control completely with UPSV=0". Was not quite clear sorry.

The additional consumption is mainly noticeable after RRC connection ist released and the timer T3324 is running. Any communication over UART during this time wakes up the module for 6 seconds, which is completely unnecessary.

I understand that the extra consumption doesn't look like much but if you're focusing on low power it's a crucial. The low power features of the SARA R510s were one reason why we chose it.

philwareublox commented 5 months ago

For my own clarity, are you saying without using the DTR line to drop the consumption of the UART you're then seeing the basic 6 second timeout for the UART. Whereas if you were able to use the DTR line successfully every time without this CTS issue, you can immediately control the UART on/off state as you want to.

eeFLis commented 5 months ago

yes.

As an example: during the active time (T3324) when data can still be received, the ubxlib is polling for new incomning data. this means that the 6 second timeouts for the UART are never reached and the module do not entering power saving.

If the DTR line is used for power saving control the module can switch to power save mode immediately after the response.

RobMeades commented 5 months ago

@philwareublox and I have been discussing this, maybe there is a way forward. Background first:

A fix, then, might be to offer a function something like uSockDataAvailableCallback(): if you set such a callback, we will call it when the URC lands. You can hook that in and not bother calling uSockRead() or uSockReceiveFrom() until your callback has been called.

Could that work?

eeFLis commented 5 months ago

I think ubxlib already offers a solution for this. We use uCellSockGetBytesPending() to get the pending bytes reported by URC. Only when some data has been received, we call uSockRead() or uSockReceiveFrom()

eeFLis commented 5 months ago

However, this only solves the problem for the example I mentioned. Any communication with the module during the active time (T3324) prevents the module from switching to power save mode for an unnecessarily long time.

eeFLis commented 5 months ago

or do you see a problem with the use of uCellSockGetBytesPending() for the case described in the example?

RobMeades commented 5 months ago

It should work, I guess the only problem might be if there is a lag between when the URC arrived to updated the number and when the application called uCellSockGetBytesPending(); that might cause the uSockRead() to fall into sleepy-time. But even in the best case, where you read the data immediately, you will still get the 6 second lag. And reducing the value of U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES will result in more laggy behaviour from the ubxlib APIs in general, since ubxlib will have to send an AT to wake the module up on more occasions.

We will continue thinking.

eeFLis commented 5 months ago

OK thanks. In the meantime, are there any further findings as to where the error comes from when DTR is active?

RobMeades commented 5 months ago

The expert internally is trying to reproduce the problem, just now looking to find the same FW version as you are using, to be quite sure about it. It is a strange case: the CTS line was allowed to float high, indicating that the module has entered 32 kHz sleep, fully 3 milliseconds before the DTR line is asserted: that's quite a lot of time in module terms, hard to see what the relationship might be.

RobMeades commented 5 months ago

FYI, the expert internally has your FW version now and is trying to script some form of sliding-delay-window of "AT/OK followed by a delay and then DTR raised" to see if he can make the problem occur; nothing from the few stabs he's had at this yet.

Wondering idly: I guess you have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback()? I understand that the +CSCON:x URC, which that callback is triggered by, marks the start of T3324 - is there something (admittedly probably hacky and complicated) you could do with that and some "usually correct" guess at T3324 to hold off or bounce modem-access as appropriate?

eeFLis commented 5 months ago

thanks for the update yes we have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback() which marks the start of T3324. If I understand you correctly, you are thinking about a solution if the DTR problem cannot be solved and we have to fall back on the UART timeout right? That could be a possible option, but then we would still have the 6 second lag.

RobMeades commented 5 months ago

That could be a possible option, but then we would still have the 6 second lag.

True: if it is the case [as I think it is] that SARA-R5 cannot run the UART from a 32 kHz clock then every AT communication pulls the modem out of 32 kHz sleep and the only way to get back in is by letting DTR rise or by waiting for N (by default 6) seconds.

I was think that, if you could know that the module was not going to enter 32 kHz sleep anyway (which might be after +CSCON: 1) then, for those periods, you could just do anything on the AT interface, while for the times when the module could enter 32 kHz sleep (which might be after +CSCON: 0) you could, somehow, hold-off/batch-up/manage calls into ubxlib that might generate AT commands in order to maximize the power saved.

All a bit complex, and vague, I know.

eeFLis commented 5 months ago

I have done a few more tests. I think the first assumption that a DTR edge will be missed cannot be confirmed. The module is no longer responsive in this state even after the DTR Line is pulled high and low again. But I still have to verify it with a scope.

RobMeades commented 5 months ago

Very interesting: would it be possible, once you get the module in this situation, to (a) measure VINT and maybe (b) [and I don't know how you'd do this] call uPortUartCtsSuspend(), so that the UART will send anyway, and see if the module is responsive to AT commands at all (despite the state of CTS).

I'm not sure what kinds of module-based debug are available when the module is, apparently, in 32kHz sleep, but I'll ask.

I think that somehow putting a pattern to the conditions that cause the problem would be good - maybe attach something to measure power to see how sleepy it really is?

RobMeades commented 5 months ago

The relevant expert internally has been trying quite hard but hasn't yet been able to reproduce a "module unresponsive" situation. He's been focusing on the end of your Saleae trace, around here:

image

...where the DTR and CTS edges seem close:

image

...and the kinds of thing he has been reproducing can be seen here in this Otii power meter trace:

image

..where it is looking pretty similar, but he has been unable to make the module unresponsive.

He suggests that, if it is possible, having some form of current-draw trace across the failure might tell him the state of the module at that time.

eeFLis commented 5 months ago

It is not easy to reproduce the error as it sometimes only occurs after hours and seemingly random. I will attach the Otti and try to catch the error.

Come back when I have the requested data.

eeFLis commented 5 months ago

Hi Here are the recorded data. Module firmware 03.30, A00.01 was used for the recording. SARA-R510S-01B-01 keep CTS high if DTR is used for power save 06-02-2024.zip

Please note that the power profile includes the power consumption of all hardware components.

The recording shows the following sequence:

  1. module is in PSM
  2. wakes up and transmits some data through a UDP socket
  3. return to PSM
  4. wakes up and transmits some data through a UDP socket (the error occurs here)
RobMeades commented 5 months ago

Oh, that's excellent, thanks very much for taking the time to do this, I will pass it on to the right person and get back to you.

RobMeades commented 5 months ago

Browsing the Saleae log (I'll leave it up to the expert to view the OTII log as Windows Defender seems to have decided the OTII 3 application is a trojan on my laptop and keeps closing it), just to confirm the area of interest, it is here:

image

...zooming in a bit:

image

...and zooming in again, the last thing that happens being an AT+UCGED query which seems to go fine, then:

image

...zooming in again, DTR is pulled low by the MCU to do the next thing about 100 ms later (happens to be about 1 ms after CTS has gone high) and it has no effect; CTS remains high:

image

...and stays there for 100 seconds, until you pull the plug. During that time there are no URCs, VINT is high, but nothing at all of any use seems to be happening modem-wise:

image

Is that a correct interpretation? Anything I've missed?

RobMeades commented 5 months ago

For the benefit of our internal expert, I've exported from the Saleae log the AT exchange in the portion of interest, i.e. after coming out of PSM the second time, and transcribed it as best I can below; all pretty standard ubxlib stuff, with 3GPP power saving active and UPSV mode 3 enabled.

One interesting thing is that the module seems to be still connected when the problem occurs (we have not seen a +CSCON: 0 after the +CSCON: 1 that was emitted by the module 6.94 seconds before the problem point) so it might not be very sleepy.

AT
OK
ATE0
OK
AT+CMEE=2
OK
AT+UDCONF=1,0
OK
ATI9
03.30,A00.01
OK
AT&C1
OK
AT&D0
OK
AT&K3
OK
AT+UPSV=3
OK
AT+UPSMR=1
OK
AT+CPSMS?
+CPSMS: 1,,,"01000011","00001000"
OK
AT+UMNOPROF?
+UMNOPROF: 100
OK
AT+UGPRF?
+UGPRF: 2,0,""
OK
AT+UPSD=0,0,0
OK
AT+UPSD=0,100,1
OK
AT+UPSDA=0,3
OK

+UUPSDA: 0,"10.64.62.108"

AT+USOCR=17,14127
+USOCR: 0
OK
AT+USOCO=0,"<IP address>",<port>
OK
AT+USOST=0,"<IP address>",<port>,51
@ <51 bytes of binary data follows>
OK

+USOST: 0,51

AT+UGCNTRD
+UGCNTRD: 1,79,0,79,0
OK
AT+UCPSMS?
+UCPSMS: 1,,,"00111111","00001000",1
OK

+CSCON: 1

+UUSORD: 0,17

AT+USORF=0,1024
+USORF: 0,"<IP address>",<port>,17,"<17 bytes of binary data in here>"
OK
AT+UGCNTRD
+UGCNTRD: 1,79,45,79,45
OK
AT+CSQ
+CSQ: 11,3
OK
AT+UCGED?
+UCGED: 2
6,4,228,01
6300,20,50,50,849,1098a06,499,e8953ab5,8000,c0,19,15,2.18,3,255,255,19,-50,-50,300,255,255,80
OK
RobMeades commented 5 months ago

OK, our internal expert has been staring at the Saleae and Otii logs and is wondering if the issue is that, around the point of failure, ubxlib releases DTR (after the AT+UCGED query) and then the next ubxlib API call is made and so ubxlib pulls DTR low again, which in this case happens to be around 20 ms later; this time is relatively short. Putting the HW into sleep and taking it out again apparently takes somewhere around 25ish ms so it might be that short releases of DTR are the problem.

I will add code to ubxlib which remembers the time at which DTR was last released and, when a subsequent ubxlib API call is made, it will ensure that a minimum period [to be safe we will make that 50 ms] has passed before DTR is pulled low again and the ubxlib API call is executed. If I push a preview branch of that code here, would you be able to try it?

eeFLis commented 5 months ago

Yes for sure

RobMeades commented 5 months ago

Actually, better than that, the code is already there (I'd obviously anticipated something of this nature and then forgotten): U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS is set to U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS, which is 10 ms.

Could you try overriding the conditional compilation flag U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS at build time to, say, 50 ms, see if that makes any difference? Since the problem is statistical in nature I guess it might be difficult to be sure of course.

eeFLis commented 5 months ago

I will do that. But even if the time is too short to wake up again, the module is in a state where consumption is much higher than it should be. power profil DTR

RobMeades commented 5 months ago

Yes: the thinking is that there is a bug somewhere in the module FW which means that, if DTR is raised and then lowered again too quickly, it goes off the rails, and in such a case it would never sleep.

EDIT: this is backed-up by the fact that the module does not awaken again after about 60 seconds, you having left it for at least 100 seconds; normally, processes within the module would cause it to awaken by itself, if this is not happening then it is relatively seriously upset.

Checking the code again, can I ask that, rather than defining U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms, you define U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms? Reason is that, ultimately, it is the code within the AT client which controls the pin behaviour, rather than the code in the cellular area.

EDIT: IGNORE WHAT I SAID ABOVE and set U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms!!!

RobMeades commented 5 months ago

Apologies for the confusion, please stick with setting U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms, I should learn how to read my own code :-).

eeFLis commented 5 months ago

Unfortunately, the error still occurs

Here is the recording capture U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS 50ms.zip

eeFLis commented 5 months ago

This time the error occurs without the level of DTR altering. However, something seems to wake the module from sleep which causes the error.

power profil CTS high

RobMeades commented 5 months ago

Oh! That is interesting: seems like an exceptionally short wake-up (low CTS for just 5 ms) performed autonomously by the module, DTR being high for the entire time. I will alert our internal expert to the new information.

RobMeades commented 5 months ago

Looking at the Saleae trace, the DTR toggles don't seem to have increased to 50 ms, still seem to around 20 ms, I wonder if there's something up with the hysteresis code? Will check it...

image

eeFLis commented 5 months ago

should not have been set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS to 50ms instead of U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS?

RobMeades commented 5 months ago

Gah, yes, you're right: there are two, one in the cellular code which is applied when things like CMUX need to disable DTR, and the other one in the AT Client for when it is the one that has to toggle DTR (the usual case). It is probably not a good idea that these are uncoordinated as well.

I will look at how to do this better but, as you say, set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS, which is currently 20 ms, to be 50 ms for the purposes of this experiment. My apologies.

eeFLis commented 5 months ago

should i reset U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to the previous value?