Open eeFLis opened 5 months ago
Hi @eeFLis: can't say this is something we've noticed, @philwareublox is going to check internally.
@eeFLis: some questions. You say that the module allows the CTS
line to float high when DTR
is used for power saving: is the module known to be otherwise functional when this problem has occurred, e.g. is VINT
high, do you maybe get URC's turning up, that kind of thing? Have you happened to be able to capture any HW traces (e.g. of the UART lines with a Saleae probe or similar) that show the relationship between the DTR
line and CTS
or other lines?
Hi
I have attached the salea capture. However, this was already recorded some time ago with the module fw version 03.15, A00.01, but should show the problem SARA-R510S-01B-00 keep CTS high if DTR is used for power save.zip
Can you confirm that you are using the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES, so AT+UPSV
looks like this:
AT+UPSV=1,1300
OK
Yes, we use the default value.
At the time of the salea capture, we certainly used a older version of the ubxlib.
If the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES was never adjusted in the ubxlib it was AT+UPSV=1,1300
Were you able to gather all the information from the salea capture or do you need additional information?
I've been staring at it with the relevant expert internally.
If you look at the picture below, at the green marker is where the module emits a +CSCON: 0
, so it is no longer RRC-connected and hence could go to sleep at any time (FYI, I'm told that when DTR
-controlled sleep is in play the timeout in the AT+UPSV=1,1300
doesn't apply, the module can go to sleep as soon as DTR
permits it):
12 seconds later comes the problematic part, focusing on that:
You can see, in the middle of the picture, that the module has allowed CTS
to float high, without there being any TXD
UART activity, which must be because the module has decided to go to 32 kHz sleep. A short while later CTS
goes low again, which is likely because some timed activity inside the module has caused it to wake up again, but only briefly. CTS
goes high again, and it is at about this time that the MCU pulls DTR
low in order to send the next AT command.
Pure speculation, of course, but a guess would be that something to do with the fact that the module is autonomously coming out of [EDIT: I meant going into] 32 kHz sleep at the same moment as DTR
goes high [EDIT: I meant low] causes it to miss the DTR
edge and remain in 32 kHz sleep.
Will continue to investigate...
you mean > at the same moment as DTR
goes low right?
That would make sense
Sorry, yes at the same moment as DTR
goes low.
Out of interest, are you able to read the state of the CTS
pin from SW? I had a feeling that STM32 wouldn't allow you to read the state of a pin that had been assigned to the UART? Vaguely wondering if there might be a workaround of some form that you could do if you could read it.
Were you able to gather all the information from the Saleae capture or do you need additional information?
If you have additional Saleae captures of the problem then we could check if they look the same but no matter if you don't.
Unfortunately I have no additional capture. I think the pin status can be read even if it is assigned to the UART. Do you have an idea for a workaround if the pin could be read? then I can test it shortly.
Yes, not exactly sure what the workaround would be really. Haven't been able to think of anything yet that would not introduce a new problem. Will carry on thinking...
One more question: do you know if the problem goes away if you set UPSV=0
? I mean I assume it does, but I thought I'd ask.
It already goes away when we set the U_CFG_APP_PIN_CELL_DTR -1
.
However, this makes the application much less energy efficient.
Thanks for confirming, understood, the aim is of course to retain the power-saving goodness, just wanted to be sure of the problem.
Just for my information, if you set U_CFG_APP_PIN_CELL_DTR -1
, so you aren't using the DTR
pin to control UART power saving, you can still have UART power saving of course, provided you have implemented uPortUartCtsSuspend(), which should be there for STM32F4. It may be slightly more clumsy, since the ubxlib
code will be waking the module up by prodding it with AT
if the module has not been talked-to for more than 6 seconds, but it should still save pretty much the same amount of power, unless your device is going in and out of UART power saving very frequently (e.g. every few 10's of seconds, as opposed to every few minutes).
Is this an approach you have tried?
Yes we are using this approach.
That's what I tried to say with "we set U_CFG_APP_PIN_CELL_DTR -1
instead of switching off power saving control completely with UPSV=0
". Was not quite clear sorry.
The additional consumption is mainly noticeable after RRC connection ist released and the timer T3324 is running. Any communication over UART during this time wakes up the module for 6 seconds, which is completely unnecessary.
I understand that the extra consumption doesn't look like much but if you're focusing on low power it's a crucial. The low power features of the SARA R510s were one reason why we chose it.
For my own clarity, are you saying without using the DTR line to drop the consumption of the UART you're then seeing the basic 6 second timeout for the UART. Whereas if you were able to use the DTR line successfully every time without this CTS issue, you can immediately control the UART on/off state as you want to.
yes.
As an example: during the active time (T3324) when data can still be received, the ubxlib is polling for new incomning data. this means that the 6 second timeouts for the UART are never reached and the module do not entering power saving.
If the DTR line is used for power saving control the module can switch to power save mode immediately after the response.
@philwareublox and I have been discussing this, maybe there is a way forward. Background first:
ubxlib
is designed to present a sockets interface,ubxlib
doesn't have one,uSockRead()
or uSockReceiveFrom()
is called, ubxlib
always just asks the module for data, it doesn't simply trust a data count from the URC as the URC is not emitted again until there has been a read, and whether a read has been done is an application matter, out of ubxlib
's control, there would be faaaar too much opportunity for misunderstandings,uSockRead()
or uSockReceiveFrom()
, and that shags power saving in your scenario.A fix, then, might be to offer a function something like uSockDataAvailableCallback()
: if you set such a callback, we will call it when the URC lands. You can hook that in and not bother calling uSockRead()
or uSockReceiveFrom()
until your callback has been called.
Could that work?
I think ubxlib already offers a solution for this. We use uCellSockGetBytesPending()
to get the pending bytes reported by URC.
Only when some data has been received, we call uSockRead()
or uSockReceiveFrom()
However, this only solves the problem for the example I mentioned. Any communication with the module during the active time (T3324) prevents the module from switching to power save mode for an unnecessarily long time.
or do you see a problem with the use of uCellSockGetBytesPending()
for the case described in the example?
It should work, I guess the only problem might be if there is a lag between when the URC arrived to updated the number and when the application called uCellSockGetBytesPending()
; that might cause the uSockRead()
to fall into sleepy-time. But even in the best case, where you read the data immediately, you will still get the 6 second lag. And reducing the value of U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES
will result in more laggy behaviour from the ubxlib
APIs in general, since ubxlib
will have to send an AT
to wake the module up on more occasions.
We will continue thinking.
OK thanks. In the meantime, are there any further findings as to where the error comes from when DTR is active?
The expert internally is trying to reproduce the problem, just now looking to find the same FW version as you are using, to be quite sure about it. It is a strange case: the CTS
line was allowed to float high, indicating that the module has entered 32 kHz sleep, fully 3 milliseconds before the DTR
line is asserted: that's quite a lot of time in module terms, hard to see what the relationship might be.
FYI, the expert internally has your FW version now and is trying to script some form of sliding-delay-window of "AT
/OK
followed by a delay and then DTR
raised" to see if he can make the problem occur; nothing from the few stabs he's had at this yet.
Wondering idly: I guess you have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback()? I understand that the +CSCON:x
URC, which that callback is triggered by, marks the start of T3324 - is there something (admittedly probably hacky and complicated) you could do with that and some "usually correct" guess at T3324 to hold off or bounce modem-access as appropriate?
thanks for the update
yes we have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback()
which marks the start of T3324.
If I understand you correctly, you are thinking about a solution if the DTR problem cannot be solved and we have to fall back on the UART timeout right? That could be a possible option, but then we would still have the 6 second lag.
That could be a possible option, but then we would still have the 6 second lag.
True: if it is the case [as I think it is] that SARA-R5 cannot run the UART from a 32 kHz clock then every AT communication pulls the modem out of 32 kHz sleep and the only way to get back in is by letting DTR
rise or by waiting for N (by default 6) seconds.
I was think that, if you could know that the module was not going to enter 32 kHz sleep anyway (which might be after +CSCON: 1
) then, for those periods, you could just do anything on the AT interface, while for the times when the module could enter 32 kHz sleep (which might be after +CSCON: 0
) you could, somehow, hold-off/batch-up/manage calls into ubxlib
that might generate AT commands in order to maximize the power saved.
All a bit complex, and vague, I know.
I have done a few more tests.
I think the first assumption that a DTR
edge will be missed cannot be confirmed.
The module is no longer responsive in this state even after the DTR
Line is pulled high
and low
again.
But I still have to verify it with a scope.
Very interesting: would it be possible, once you get the module in this situation, to (a) measure VINT and maybe (b) [and I don't know how you'd do this] call uPortUartCtsSuspend(), so that the UART will send anyway, and see if the module is responsive to AT
commands at all (despite the state of CTS).
I'm not sure what kinds of module-based debug are available when the module is, apparently, in 32kHz sleep, but I'll ask.
I think that somehow putting a pattern to the conditions that cause the problem would be good - maybe attach something to measure power to see how sleepy it really is?
The relevant expert internally has been trying quite hard but hasn't yet been able to reproduce a "module unresponsive" situation. He's been focusing on the end of your Saleae trace, around here:
...where the DTR and CTS edges seem close:
...and the kinds of thing he has been reproducing can be seen here in this Otii power meter trace:
..where it is looking pretty similar, but he has been unable to make the module unresponsive.
He suggests that, if it is possible, having some form of current-draw trace across the failure might tell him the state of the module at that time.
It is not easy to reproduce the error as it sometimes only occurs after hours and seemingly random. I will attach the Otti and try to catch the error.
Come back when I have the requested data.
Hi Here are the recorded data. Module firmware 03.30, A00.01 was used for the recording. SARA-R510S-01B-01 keep CTS high if DTR is used for power save 06-02-2024.zip
Please note that the power profile includes the power consumption of all hardware components.
The recording shows the following sequence:
Oh, that's excellent, thanks very much for taking the time to do this, I will pass it on to the right person and get back to you.
Browsing the Saleae log (I'll leave it up to the expert to view the OTII log as Windows Defender seems to have decided the OTII 3 application is a trojan on my laptop and keeps closing it), just to confirm the area of interest, it is here:
...zooming in a bit:
...and zooming in again, the last thing that happens being an AT+UCGED
query which seems to go fine, then:
...zooming in again, DTR
is pulled low by the MCU to do the next thing about 100 ms later (happens to be about 1 ms after CTS
has gone high) and it has no effect; CTS
remains high:
...and stays there for 100 seconds, until you pull the plug. During that time there are no URCs, VINT
is high, but nothing at all of any use seems to be happening modem-wise:
Is that a correct interpretation? Anything I've missed?
For the benefit of our internal expert, I've exported from the Saleae log the AT exchange in the portion of interest, i.e. after coming out of PSM the second time, and transcribed it as best I can below; all pretty standard ubxlib
stuff, with 3GPP power saving active and UPSV mode 3 enabled.
One interesting thing is that the module seems to be still connected when the problem occurs (we have not seen a +CSCON: 0
after the +CSCON: 1
that was emitted by the module 6.94 seconds before the problem point) so it might not be very sleepy.
AT
OK
ATE0
OK
AT+CMEE=2
OK
AT+UDCONF=1,0
OK
ATI9
03.30,A00.01
OK
AT&C1
OK
AT&D0
OK
AT&K3
OK
AT+UPSV=3
OK
AT+UPSMR=1
OK
AT+CPSMS?
+CPSMS: 1,,,"01000011","00001000"
OK
AT+UMNOPROF?
+UMNOPROF: 100
OK
AT+UGPRF?
+UGPRF: 2,0,""
OK
AT+UPSD=0,0,0
OK
AT+UPSD=0,100,1
OK
AT+UPSDA=0,3
OK
+UUPSDA: 0,"10.64.62.108"
AT+USOCR=17,14127
+USOCR: 0
OK
AT+USOCO=0,"<IP address>",<port>
OK
AT+USOST=0,"<IP address>",<port>,51
@ <51 bytes of binary data follows>
OK
+USOST: 0,51
AT+UGCNTRD
+UGCNTRD: 1,79,0,79,0
OK
AT+UCPSMS?
+UCPSMS: 1,,,"00111111","00001000",1
OK
+CSCON: 1
+UUSORD: 0,17
AT+USORF=0,1024
+USORF: 0,"<IP address>",<port>,17,"<17 bytes of binary data in here>"
OK
AT+UGCNTRD
+UGCNTRD: 1,79,45,79,45
OK
AT+CSQ
+CSQ: 11,3
OK
AT+UCGED?
+UCGED: 2
6,4,228,01
6300,20,50,50,849,1098a06,499,e8953ab5,8000,c0,19,15,2.18,3,255,255,19,-50,-50,300,255,255,80
OK
OK, our internal expert has been staring at the Saleae and Otii logs and is wondering if the issue is that, around the point of failure, ubxlib
releases DTR
(after the AT+UCGED
query) and then the next ubxlib
API call is made and so ubxlib
pulls DTR
low again, which in this case happens to be around 20 ms later; this time is relatively short. Putting the HW into sleep and taking it out again apparently takes somewhere around 25ish ms so it might be that short releases of DTR
are the problem.
I will add code to ubxlib
which remembers the time at which DTR
was last released and, when a subsequent ubxlib
API call is made, it will ensure that a minimum period [to be safe we will make that 50 ms] has passed before DTR
is pulled low again and the ubxlib
API call is executed. If I push a preview branch of that code here, would you be able to try it?
Yes for sure
Actually, better than that, the code is already there (I'd obviously anticipated something of this nature and then forgotten): U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS is set to U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS, which is 10 ms.
Could you try overriding the conditional compilation flag U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS
at build time to, say, 50 ms, see if that makes any difference? Since the problem is statistical in nature I guess it might be difficult to be sure of course.
I will do that. But even if the time is too short to wake up again, the module is in a state where consumption is much higher than it should be.
Yes: the thinking is that there is a bug somewhere in the module FW which means that, if DTR
is raised and then lowered again too quickly, it goes off the rails, and in such a case it would never sleep.
EDIT: this is backed-up by the fact that the module does not awaken again after about 60 seconds, you having left it for at least 100 seconds; normally, processes within the module would cause it to awaken by itself, if this is not happening then it is relatively seriously upset.
Checking the code again, can I ask that, rather than defining U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS
to be 50 ms, you define U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS
to be 50 ms? Reason is that, ultimately, it is the code within the AT client which controls the pin behaviour, rather than the code in the cellular area.
EDIT: IGNORE WHAT I SAID ABOVE and set U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS
to be 50 ms!!!
Apologies for the confusion, please stick with setting U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS
to be 50 ms, I should learn how to read my own code :-).
Unfortunately, the error still occurs
Here is the recording capture U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS 50ms.zip
This time the error occurs without the level of DTR altering. However, something seems to wake the module from sleep which causes the error.
Oh! That is interesting: seems like an exceptionally short wake-up (low CTS
for just 5 ms) performed autonomously by the module, DTR
being high for the entire time. I will alert our internal expert to the new information.
Looking at the Saleae trace, the DTR
toggles don't seem to have increased to 50 ms, still seem to around 20 ms, I wonder if there's something up with the hysteresis code? Will check it...
should not have been set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS
to 50ms instead of U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS
?
Gah, yes, you're right: there are two, one in the cellular code which is applied when things like CMUX need to disable DTR
, and the other one in the AT Client for when it is the one that has to toggle DTR
(the usual case). It is probably not a good idea that these are uncoordinated as well.
I will look at how to do this better but, as you say, set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS
, which is currently 20 ms, to be 50 ms for the purposes of this experiment. My apologies.
should i reset U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to the previous value?
Hi
maybe no ubxlib related problem, but perhaps you have already observed this behavior
If the DTR pin is used for power saving, the SARA R510 s module sporadically holds the CTS line high and no longer releases it. the module is then no longer responsive.
we had the problem several times with the module firmware version 03.15, A00.01. We hoped that this problem would be solved with the current module firmware version 03.30, A00.01, but the problem still occurs, even if it is much less frequent (after hours ).
Have you observed such behavior in your test environment?