mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License

RX Timing is wrong for SF12 #442

Closed by terrillmoore 4 years ago

terrillmoore commented 5 years ago

LoRaWAN MACs calculate a start time for starting the receiver.

It looks like the LMIC's start time is wrong at SF12 -- it starts too late. I found this during EU RX testing. Here are the times in ms.

| Code | SF12 | SF11 | SF10 | SF9 | SF8 | SF7 |
|---|---|---|---|---|---|---|
| LMIC | 41 | 8 | -8 | -16 | -20 | -22 |
| Semtech reference code | 33 | 17 | 5 | -2 | -8 | -9 |
| mbed | 88 | 39 | 14 | 2 | -4 | -7 |

(Semtech reference: RegionCommon.c RegionCommonComputeRxWindowParameters()).

It's OK if you start early, but not OK if you start late, as you can miss the start of the packet.
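
To put the millisecond figures in perspective, it helps to express them in LoRa symbols. The sketch below uses the standard relation T_sym = 2^SF / BW. The function names are hypothetical (they are not LMIC or Semtech functions), and the 125 kHz bandwidth in the usage note is an assumption about the EU test setup.

```c
#include <stdint.h>

/* Hypothetical helper, not part of LMIC: LoRa symbol duration in
 * microseconds, from T_sym = 2^SF / BW. */
uint32_t lora_symbol_us(unsigned sf, uint32_t bw_hz)
{
    return (uint32_t)((((uint64_t)1 << sf) * 1000000u) / bw_hz);
}

/* Hypothetical helper: convert a time difference in milliseconds into
 * thousandths of a symbol at the given spreading factor and bandwidth
 * (truncating toward zero). */
int32_t ms_to_millisymbols(int32_t diff_ms, unsigned sf, uint32_t bw_hz)
{
    int64_t diff_us = (int64_t)diff_ms * 1000;
    return (int32_t)((diff_us * 1000) / (int64_t)lora_symbol_us(sf, bw_hz));
}
```

With 125 kHz assumed, `lora_symbol_us(12, 125000)` is 32768 and `lora_symbol_us(7, 125000)` is 1024, so the 8 ms gap between the LMIC and Semtech values at SF12 is about a quarter of a symbol (`ms_to_millisymbols(8, 12, 125000)` gives 244), while the same 8 ms at SF7 would be several symbols.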

In fact, the current LMIC only shows problems at SF12.

Part of the problem is that both use a specific number of symbols as "min syms". But LMIC uses 5 whereas Semtech uses 6. If LMIC uses 6, the table becomes:

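| Code | SF12 | SF11 | SF10 | SF9 | SF8 | SF7 |
|---|---|---|---|---|---|---|
| LMIC | 41 | 8 | -8 | -16 | -20 | -22 |
| Semtech reference code | 33 | 17 | 5 | -2 | -8 | -9 |
| mbed | 88 | 39 | 14 | 2 | -4 | -7 |
| LMIC with min RxSym = 6 | 29 | -3 | -19 | -27 | -32 | -34 |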

We'll test tightening this up, but also will test just changing RxSym to 6.

terrillmoore commented 4 years ago

It appears that there's more to this than meets the eye. See https://github.com/mcci-catena/arduino-lmic/issues/483#issuecomment-568788471, #311 and #477.

terrillmoore commented 4 years ago

More testing revealed that grounding (and ground problems) can mess up the SX1276 front end (no big surprise). See new note at https://github.com/mcci-catena/arduino-lmic/issues/483#issuecomment-569375963.

However, even with crummy grounds, it was clear that starting the receiver before the downlink transmission begins was better than starting while the transmission is already under way. As far as I can tell, there is no power advantage to starting late unless there is a downlink. If there's a downlink, then starting late means you run the receiver for less time just watching the preamble. If there's no downlink, it doesn't matter at all. And most of the time, there's no downlink.

So I plan to push a change to the 3.0.99 branch for testing that will move the RX window to start before the TX window. We'll set RXSYMs high enough to be sure we have some overlap with the preamble. Even at SF7, 1023 symbols is over 1 second, so we have plenty of margin (as long as we apply the fix for #467). I think we should ignore user-specified clock inaccuracy of more than 1%, and we should track the number of times we miss deadlines. (We should schedule assuming a slow clock, and set the number of symbols assuming a fast clock.) But our target should be just before the TX time.
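
One way to read "schedule assuming a slow clock, and set the number of symbols assuming a fast clock" is sketched below. This is an interpretation, not the code that was pushed: the type `rx_plan_t`, the function `plan_rx_window()` and all parameter names are hypothetical. The only figure taken from the comment is the 1023-symbol ceiling.

```c
#include <stdint.h>

#define RX_SYMS_MAX 1023u  /* symbol-count ceiling mentioned in the thread */

/* Hypothetical result type: when to open the receiver and for how long. */
typedef struct {
    uint32_t start_us;  /* local delay after end of uplink at which to open RX */
    uint32_t nsyms;     /* receiver timeout in symbols */
} rx_plan_t;

/* Hypothetical planner.
 *  delay_us : nominal delay from end of uplink to start of downlink
 *  err_ppm  : assumed worst-case local clock error, in parts per million
 *  tsym_us  : symbol duration in microseconds
 *  min_syms : symbols the receiver needs to detect a preamble */
rx_plan_t plan_rx_window(uint32_t delay_us, uint32_t err_ppm,
                         uint32_t tsym_us, uint32_t min_syms)
{
    rx_plan_t plan;

    /* Worst-case drift accumulated over the delay, rounded up. */
    uint32_t drift_us =
        (uint32_t)(((uint64_t)delay_us * err_ppm + 999999u) / 1000000u);

    /* Slow-clock assumption: open early by the drift so we are never late. */
    plan.start_us = delay_us - drift_us;

    /* Fast-clock assumption: we may then be up to 2 * drift early, so
     * lengthen the window by that many symbols (rounded up). */
    uint32_t extra = (2u * drift_us + tsym_us - 1u) / tsym_us;
    uint32_t n = min_syms + extra;

    plan.nsyms = (n > RX_SYMS_MAX) ? RX_SYMS_MAX : n;
    return plan;
}
```

For example, with a 1 s delay, a 1% (10000 ppm) error bound and a 1024 us symbol (SF7 at an assumed 125 kHz), the receiver would open 10 ms early and the extra 20 symbols fit well under the 1023-symbol ceiling.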

cyberman54 commented 4 years ago

Sounds promising. Hope we get rid of the EU868 TTN join problem this way.

terrillmoore commented 4 years ago

@cyberman54 I'll have something for others to test later today, I think.

cyberman54 commented 4 years ago

@terrillmoore I will be standing by.

terrillmoore commented 4 years ago

Here's the plan.

  1. I will add a check for "late" window opening (i.e., a call with the target receive time in the past). This will record a few statistics in LMIC.
  2. We'll change strategy to bias towards opening the window promptly.
  3. We'll deprecate the clock error because... it was probably compensating for a non-problem. (I'll leave a conditional compile to enable it, but... it really should not be used.)
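
The late-window check in step 1 could look roughly like the sketch below. It is an illustration under assumptions, not the code that went into LMIC: the struct, the `rxlate_max_ticks` field and the function name are hypothetical, and only the counter name `rxlate_count` is borrowed from the field reported later in this thread.

```c
#include <stdint.h>
#include <stdbool.h>

/* Free-running tick counter type; assumed to wrap at 32 bits. */
typedef uint32_t ticks_t;

/* Hypothetical statistics record for late RX window openings. */
typedef struct {
    uint32_t rxlate_count;      /* how many times the window opened late */
    ticks_t  rxlate_max_ticks;  /* worst observed lateness, in ticks */
} rx_stats_t;

/* Record a late opening if the target time is already in the past.
 * The signed difference keeps the comparison correct across counter
 * wraparound, as long as the two times are less than 2^31 ticks apart.
 * Returns true when the call was late. */
bool note_if_late(rx_stats_t *stats, ticks_t now, ticks_t target)
{
    int32_t late = (int32_t)(now - target);

    if (late <= 0)
        return false;  /* on time or early: nothing to record */

    stats->rxlate_count++;
    if ((ticks_t)late > stats->rxlate_max_ticks)
        stats->rxlate_max_ticks = (ticks_t)late;
    return true;
}
```

An application could then poll the counter after each uplink; a value that stays at 0 suggests the scheduler is meeting its deadlines.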

terrillmoore commented 4 years ago

Status: I discovered that the delayMicroseconds() function on at least the MCCI BSP often returns too soon. After I moved the window forward, this was causing us to open the window too early, breaking SF7. I also discovered that the STM32L0 internal clock can typically be calibrated only to +/- 0.4%. This means that a clock error of 4000ppm is needed on those platforms, so I relaxed the constraint. This also revealed a calculation error for the number of syms needed; I was off by roughly a factor of 2 in the time requirement, which means a lot at the higher rates.
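
As a rough check on those numbers, the arithmetic below assumes the LoRaWAN default delays of 1 s for RX1 and 5 s for the first join-accept window, plus a 125 kHz bandwidth; none of these values are stated in the comment above.

```latex
\begin{align*}
\pm 0.4\% &= \pm 0.004 = \pm 4000\ \text{ppm} \\
\Delta t_{\text{RX1}} &= 1\ \text{s} \times 0.004 = 4\ \text{ms} \\
\Delta t_{\text{join}} &= 5\ \text{s} \times 0.004 = 20\ \text{ms} \\
T_{\text{sym}}(\text{SF7}) &= \frac{2^{7}}{125\,000\ \text{Hz}} = 1.024\ \text{ms}
\end{align*}
```

Under those assumptions, a 20 ms uncertainty is roughly 19.5 symbols at SF7, which is why an error in the symbol count matters most at the higher data rates.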

I have patches for all this, but I want to run the compliance test and review patches before committing. However, I think all these things (plus the 'late window' strategy) explain the problems we've been seeing. On the STM32L0, specifically, we might be better off using LPTIM and the (crystal) LSE oscillator.

Compliance is passing at faster rates. We'll see how things go at the slower rates in an hour or two.

terrillmoore commented 4 years ago

Looks decent for EU868. I will review and push changes tomorrow.

terrillmoore commented 4 years ago

Changes are pushed to the issue453 branch; I will merge to master once CI tests pass. Looks good for EU868. Passes RWC pre-compliance test.

cyberman54 commented 4 years ago

@terrillmoore I can't find a branch named issue453? Edit: oh, I just saw that it was already merged into master. Thank you! I will test and report.

terrillmoore commented 4 years ago

Thanks in advance for testing.

cyberman54 commented 4 years ago

Here's my feedback, late due to holidays.

(+) With current head (commit #3ca90f3) I can connect to the EU868 TTNv2 network with all my test boards (all ESP32 based) and my paxcounter application.
(-) I still sometimes encounter "JOIN_WAIT" situations, I guess slightly less often than with the previous MCCI LMIC version.

Unfortunately I don't own suitable measurement equipment to track this down. So this could be a local gateway problem or a general EU868 TTN problem, as well as a timing problem related to my ESP32 multitasking application.

I traced the value of LMIC.radio.rxlate_count in my application: it is always 0.