Closed slavendam closed 1 day ago
@slavendam I had seen this issue in my mailbox and I was intrigued, but had not found any time to look at it yet. Now you've pointed out that https://github.com/Lora-net/LoRaMac-node/issues/1349 might be relevant, and it might very well be. I have commented at that issue, with some debugging suggestions as well. Maybe best to report the results here, to not pollute that issue to much?
Thanks for reaching back!
I've already printed around line
TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
Full code is this:
// Setup timers
printf("ProcessRadioTxDone 0 - START\n"); // CHECKPOINT 1
CRITICAL_SECTION_BEGIN( );
uint32_t offset = TimerGetCurrentTime( ) - TxDoneParams.CurTime;
TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
TimerStart( &MacCtx.RxWindowTimer1 );
printf("TimerStart RxWindowTimer1 - DONE\n"); // CHECKPOINT 2
TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset );
TimerStart( &MacCtx.RxWindowTimer2 );
printf("TimerStart RxWindowTimer2 - DONE\n"); // CHECKPOINT 3
CRITICAL_SECTION_END( );
printf("ProcessRadioTxDone 1: TimerGetCurrentTime=%d, CurTime=%d, offset=%d \n",TimerGetCurrentTime(), TxDoneParams.CurTime, offset);
printf("ProcessRadioTxDone 2: RxWindow1Delay=%d, delay-off=%d\n", MacCtx.RxWindow1Delay, MacCtx.RxWindow1Delay - offset);
printf("ProcessRadioTxDone 3: rxWindT1=%d\n", MacCtx.RxWindowTimer1);
printf("ProcessRadioTxDone 4: RxWindow2Delay=%d, delay-off=%d\n", MacCtx.RxWindow2Delay, MacCtx.RxWindow2Delay - offset);
printf("ProcessRadioTxDone 5: rxWindT2=%d\n", MacCtx.RxWindowTimer2);
Left are debug prints when everything WORK, right is when it DOESN'T WORK.
I'm also attaching debug log file if you want to take a look (it is big, but you can start/look from end). E.g. Anything between "ProcessRadioTxDone 0 - START" and "TimerStart RxWindowTimer1 - DONE" is connected with setting timer for RX1. I even did some printings from "UTIL_TIMER_Start()".
I didn't find anything obivously wrong in this part, but I suspect that something behind this can be issue. Maybe some RTC value or something inside of IRQ handler where timing values are summed up.
Reason why I think this is timing issue is because halt happens every time after 4:40h (no matter if you send with 5min period or 10min period). Like some overflow happens (leaving it for another 4:40h doesn't release it).
Do you know if code use some other time source except GetTimerTicks()?
I don't have "modem.maintain()" call directly in my code, but as I can see it is called from endPacket() function as maintainUntilIdle();
which has do { maintain(); } while(busy());
And I can confirm that this is called because LoRaMacProcess();
function is called only inside of maintain() and you can see multiple "LoRaMacProcess" logs.
So your prints show offset=28
in both cases, so the potential offset underflow issue I mentioned in the other issue (which could result from not calling maintain()
quickly enough) is not the culprit here (it probably exists, but is not your issue).
I do not have any suggestions right away, but I'm a bit too tired now to really dive in. I'll see if I can find a bit of time to look at your debug log tomorrow (not guarantees, though...).
Thanks for taking the time to think about the problem. If you will have any ideas what to try, let me know, because I'm out of ideas now :)
I think we have lucky winner :)
In this RTC_StartAlarm line alarm is set up based on subSecond (ms) value. https://github.com/stm32duino/STM32LoRaWAN/blob/192c1234a8661f72f1635e5675db0d11fb110920/src/BSP/timer_if.c#L240
According to this: https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.h#L195 subSeconds are uint32_t type which gives max value of 4 294 967 295.
As you can see from my logs, issue happens when processor tries to setup alarm with value of 4 310 527 ticks, and that value is multiplied by 1000. That should give value of 4 310 527 000 which is bigger than max value. That creates overflow in calulation so end result is wrong and alarm is set to some time value which RTC will not reach when it should.
This is commit which made this issue: https://github.com/stm32duino/STM32LoRaWAN/commit/54882c8ae24596983984e8f7e238b36ce03a8f91
UPDATE I've confirmed that issue is in subSeconds value. Test code:
uint32_t subloop = 0;
for (uint32_t i = 3500000; i < 4800000; i = i + 100000){
subloop = (i * 1000 / MS_TO_TICK + 1);
printf("TIMER_IF_StartTimer for loop timeout to subseconds: timeout=%d, subseconds=%d\n\r", i, subloop);
}
Test results:
TIMER_IF_StartTimer MS_TO_TICK=256
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3500000, subseconds=13671876
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3600000, subseconds=14062501
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3700000, subseconds=14453126
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3800000, subseconds=14843751
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3900000, subseconds=15234376
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4000000, subseconds=15625001
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4100000, subseconds=16015626
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4200000, subseconds=16406251
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4300000, subseconds=19660
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4400000, subseconds=410285
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4500000, subseconds=800910
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4600000, subseconds=1191535
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4700000, subseconds=1582160
During solving of this problem, I found new one. As mentioned earlier, problem with blocking every 4:40h was overflow in this line:
Solution for this problem is:
uint64_t subSecond_tmp = ((uint64_t)((uint64_t)timeout * (uint64_t)(1000))) / MS_TO_TICK + 1;
uint32_t subSeconds_start = (uint32_t)subSecond_tmp;
RTC_StartAlarm(RTC_ALARM_B, 0, 0, 0, 0, subSeconds_start, RTC_HOURFORMAT12_PM, 31UL);
We are using variable with bigger range because during calculations, our values can overflow in uint32_t. This solution is tested and ti works.
Second problem I found is "bomb" which will trigger in ~50 days and will block code execution.
Logic of processor is based on number of ticks which value can be between 0 and UINT32_MAX (4294967295). RTC_StartAlarm function (STM32RTC) require subseconds as parameter:
https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L801
That is reason why we are doing calculation *number_of_ticks(1000/MS_TO_TICK)* and this second part is equal to ~3.9. As mentioned earlier, number of ticks can get value up to UINT32_MAX. In that case subSeconds=UINT32_MAX 3.9 which can't fit in uin32_t variable!
Example: number_of_ticks=4 290 000 000 subSeconds = number_of_ticks*1000/256 = 16 757 812 500 (bigger than UINT32_MAX)
In RTC_StartAlarm function ti will send only 32 bits of value, not all 64 bits. So currently biggest value of ticks we can send is UINT32_MAX/3.9=1.101.273.665 (because subSeconds variable can't receive correctly values larger that UINT32_MAX)
In RTC_StartAlarm it is calculating subSeconds back to ticks:
https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L906
As it got wrong subSecond value, ticks will be calculated wrong and alarmB will be set wrongly.
Solution is to add uint64_t parameter to RTC_StartAlarm function so we can forward correct value of subSeconds (and ticks, because they are connected).
I added new line in rtc.h next to this line: https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.h#L195
as it follows:
void RTC_StartAlarm2(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask, uint64_t ticksToSubSeconds);
Note: Added uint64_t ticksToSubSeconds parameter
In rtc.c I changed this line:
https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L801
With this:
void RTC_StartAlarm(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask)
{
RTC_StartAlarm2(name, day, hours, minutes, seconds, subSeconds, period, mask, NULL);
}
void RTC_StartAlarm2(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask, uint64_t ticksToSubSeconds)
{
This logic enables that all library users who implemented this RTC_StartAlarm function don't need to change anything (no need to add new parameters). Maybe it is possible to set "...uint64_t ticksToSubSeconds=NULL)", without declaration of new function, but I didn't test that part.
And this line: https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L906
is changed with this:
if(ticksToSubSeconds != NULL){
uint64_t tmp = ((uint64_t)ticksToSubSeconds*(predivSync+1)) / (uint64_t)1000;
RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)UINT32_MAX - (uint32_t)tmp;
printf("STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-12, alarm=%lu\n", (uint32_t)RTC_AlarmStructure.AlarmTime.SubSeconds);
}
else{
RTC_AlarmStructure.AlarmTime.SubSeconds = UINT32_MAX - (subSeconds * (predivSync + 1)) / 1000;
printf("STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-3, subSeconds= %lu, ticks=%lu\n", subSeconds, RTC_AlarmStructure.AlarmTime.SubSeconds);
}
It is checking if 64 bits value is sent. If it is, it will use it, and if not, than regular 32 bit subSecond variable will be used.
And finally when I changed STM32RTC library, I made change in STM32LoRaWAN. This line: https://github.com/stm32duino/STM32LoRaWAN/blob/192c1234a8661f72f1635e5675db0d11fb110920/src/BSP/timer_if.c#L240
Is changed with this:
uint64_t subSecond_tmp = ((uint64_t)((uint64_t)timeout * (uint64_t)(1000))) / MS_TO_TICK + 1;
uint32_t subSeconds_start = (uint32_t)subSecond_tmp;
RTC_StartAlarm2(RTC_ALARM_B, 0, 0, 0, 0, subSeconds_start, RTC_HOURFORMAT12_PM, 31UL, subSecond_tmp);
Now we are calling RTC_StartAlarm2 (and not RTC_StartAlarm) because it can receive uint64_t value. subSeconds_start value is not important because it will not be used.
I've tested this solution with hardcoded value of 4 290 000 000 ticks. Here are printf from RTC_StartAlarm function:
STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-13, tmp=4289999872
STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-12, alarm=4967423
"tmp" is result of calculating received subseconds (ticksToSubSeconds) back to ticks, and "alarm" is new alarm. Difference between 4 290 000 000 and 4289999872 is because of precision I think. And UINT32_MAX-4289999872 really gives 4967423 as result.
EDIT: this difference in result 429... vs 428.. was due to wrong order of calculation (during debuging I changed to first divide by 1000, than multiply by 256). It should first multiply, than divide, as it was earlier.
@fpistm probably your assistance will be necessary in solving this here and in STM32RTC library. :)
@fpistm probably your assistance will be necessary in solving this here and in STM32RTC library. :)
Hi @slavendam, yes I will check that, thanks for the detailed explanation.
Hi @slavendam, Based on your proposal I raised
The main difference is that a SubSecond parameter is expressed on 64bit, instead of passing the uint32_t subSeconds plus the uint32_t ticksToSubSeconds.
Could you please check the conversion and if this fix the issue in the same way as yours
if (subSeconds > (uint64_t)UINT32_MAX) {
uint64_t tmp = ((uint64_t)subSeconds * (predivSync + 1)) / (uint64_t)1000;
RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)UINT32_MAX - (uint32_t)tmp;
} else {
RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)((uint32_t)UINT32_MAX - (uint32_t)(subSeconds * (predivSync + 1)) / 1000);
}
@slavendam Any feedback on this, please ?
Sorry for delay. I'll give you feedback in day or two
Thank you for all your work on this issue, I am experiencing the same issue and was relieved to find such a detailed investigation had already been undertaken. ❤
Unfortunately the PRs you mentioned above @FRASTM (https://github.com/stm32duino/STM32LoRaWAN/pull/39 and https://github.com/stm32duino/STM32RTC/pull/109) don't appear to resolve this issue for me. After applying these patches locally I continue to find myself stuck in STM32LoRaWAN::maintainUntilIdle
at approximately the 4 hour 40 minute mark.
I am a little stuck as to the next steps to try and resolve this, but happy to test other proposed fixes if there are any suggestions.
Thank you for all your work on this issue, I am experiencing the same issue and was relieved to find such a detailed investigation had already been undertaken. ❤
Unfortunately the PRs you mentioned above @FRASTM (#39 and stm32duino/STM32RTC#109) don't appear to resolve this issue for me. After applying these patches locally I continue to find myself stuck in
STM32LoRaWAN::maintainUntilIdle
at approximately the 4 hour 40 minute mark.I am a little stuck as to the next steps to try and resolve this, but happy to test other proposed fixes if there are any suggestions.
I have the same problem as you mentioned: getting stuck at maintainUntilIdle
after a wakeup from deepsleep which lasted 12 hours. This is really bad because I need the chip to go to sleep after sending data. I tried to workaround it by creating a function with a timeout that forces the chip to sleep if it gets stuck at maintainUntilIdle
. However, this didn't solve the problem because, on the next wakeup, the data is not sent. I am really frustrated by this issue, as it is the last problem I need to resolve to finish my personal project.
Any progress @fpistm @FRASTM ?
I also ran into this problem with my work project where I created a MBus to LoRa WAN Gateway. I have tried to apply multiple patches mentioned here but it seems nothing really works and testing all of this is really hard because I always have to wait almost five hours. I hope there will be some progress on this soon and I am curious if there is a different way to testing solutions than waiting.
Currently no progress. As we are not able to reproduce with Nucleo WL55. Seems related to other hardware setup.
Okay thank you for the update. I am using a Seeed Studio LoRa E5 mini Board wit the STM32WLE5JC Module on Board witch seems to be very similar to the STM32WL55JC1 Chip but maybe I am missing something here
so I ran into this issue too... using a custom board with a Seeed LoRa-E5-LE module. I cannot think of a hardware issue, but I will run my code on a Nucleo WL55. For me this looks like a show stopper as I am running out of time for my project. Is there any known workaround? For example, reset the RTC every 4 hours, or reset the LoRa Stack?
Hi @fpistm and @FRASTM ,
I ran into the 4:40 bug as well, have seen your PRs and that you are still waiting for feedback on that.
Could you please check the conversion and if this fix the issue in the same way as yours
I tracked down the issue and found the if-else-branch causing the problem. My PR to resolve it is here: https://github.com/stm32duino/STM32RTC/pull/113
Currently no progress. As we are not able to reproduce with Nucleo WL55. Seems related to other hardware setup. @fpistm I could reproduce this bug with my Nucleo WL55 board as well. In case you need my original code for this to test, please let me know.
@mrschuster thank you for this fix, it works for me!
Thanks @mrschuster for the fix and all for the test. New release of the STM32RTC done and fix this issue: https://github.com/stm32duino/STM32RTC/releases/tag/1.6.0
Describe the bug endPacket() is never finished after 4:40h of properly working. I've tested 3 times and it blocked at same time (4:40h). 2 times with 10min sending period between (both times it blocked at counter 29) and once tested with 5min sending period (same time period passed, but now it blocked on counter 57). I did debugging by adding multiple print logs in header files.
Sending is done successfuly, message arrives on gateway - and than it stops, like it is in infinite loop. RX1 window should be opened, but it seems that this part is never done. After LoRaMacProcess() function is done once (for sending), function OnRxWindow1TimerEvent() should be called - but it is never called.
I'm using RAK3172 module, EU868 region. I'm using STM32 Low Power library to send device in DeepSleep, and I'm using RTC Alarm timed wakeup. Even if it is stuck between TX and RX windows, this alarms work properly (I can see alarm triggers later). I tried to disable DeepSleep and alarms, and use delay instead, but result is the same.
Transmitted payload has 6 bytes.
To Reproduce Set device to send messages every 10min and wait for 29th message. Or set with 5min time period and wait for 57th message. That last message will be transmitted, endPacket() function will not be finished.
Expected behavior This is part of logs how it should looks like when it works correctly (note that 2nd section is missing later):
This is log when it stopped:
Has anyone had the same problem, or do you have idea about possible solution?
Thanks!