mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License
642 stars 210 forks source link

Device is failing to send on high data rates even though its close to the gateway #590

Closed dolfandringa closed 4 years ago

dolfandringa commented 4 years ago

Describe your question or issue I have a custom STM32F103 board that I use with an RFM95 chip and arduino on the US915 plan. The gateway is 2-3m away from the node and is an orange pi zero with a rakk2245 concentrator running chirpstack gatewa-bridge and the lora packet forwarder that rakk supplies with the 2245.

I have turned on ADR on the node and use OTAA. When the node sends data, it fails to send though on the higher data rates. The gateway sends it LinkADRReq messages to use data rate 4 (500khz/sf8), but then when the node sends the next message, it fails to send on dr4 and steps down the datarate gradually until it succeeds in sending. After a successful send, the gateway often tells it to switch back to DR4 again. And the poll messages that get sent by the node as a response to the LinkADRReq messages, do arrive fine and are sent on DR4. So I wonder why it is failing to send the new messages on DR4? Could there be a software (timing or something) issue that causes this? On my RFExplorer, I only see high peaks for some of the send attempts, but not on all. And even the ones that do cause high peaks, at least don't registered as being sent successfully straight away (more send attempts follow). In the sample output below there are peaks visible in RFExplorer at 16:33:11 and 16:33:24 but nothing really before or after that.

Below I have copied the output of my node that logs changes in frequency, opmode, channel, datarate, etc and TXSTART/TXCOMPLETE events that get triggered. The POLL messages are responses to MAC commands from the gateway.

The two screenshots from the peaks in RFexplorer are attached, and I also added the saved RFExplorer output if someone happens to have RFTouch and is interested.

Environment

This information is all important; it's hard to help without a complete set of answers.

Screenshot from 2020-07-18 16-39-09 Screenshot from 2020-07-18 16-39-23 RFExplorer_SweepData_2020_07_18_16_34_26.zip

[2020-07-18 16:33:01] Button pressed
[2020-07-18 16:33:01] Sending message
[2020-07-18 16:33:01] LoraWAN wakeup
[2020-07-18 16:33:01] SEND
[2020-07-18 16:33:01] 1724812: EV_TXSTART regular
[2020-07-18 16:33:01] LMIC.freq: 904600000
[2020-07-18 16:33:01] LMIC.opmode: 888
[2020-07-18 16:33:01] LMIC.freq: 923900000
[2020-07-18 16:33:03] LMIC.freq: 923300000
[2020-07-18 16:33:03] LMIC.opmode: 108
[2020-07-18 16:33:04] 1838498: EV_TXSTART regular
[2020-07-18 16:33:04] LMIC.freq: 904600000
[2020-07-18 16:33:04] LMIC.opmode: 888
[2020-07-18 16:33:04] LMIC.freq: 923900000
[2020-07-18 16:33:06] LMIC.freq: 923300000
[2020-07-18 16:33:06] LMIC.txChnl: 9
[2020-07-18 16:33:06] LMIC.datarate: 3
[2020-07-18 16:33:06] LMIC.opmode: 108
[2020-07-18 16:33:07] 1906717: EV_TXSTART regular
[2020-07-18 16:33:07] LMIC.freq: 904100000
[2020-07-18 16:33:07] LMIC.txpow: 21
[2020-07-18 16:33:07] LMIC.opmode: 888
[2020-07-18 16:33:07] LMIC.freq: 923900000
[2020-07-18 16:33:09] LMIC.freq: 923300000
[2020-07-18 16:33:09] LMIC.txChnl: 5
[2020-07-18 16:33:09] LMIC.opmode: 108
[2020-07-18 16:33:11] 2036592: EV_TXSTART regular
[2020-07-18 16:33:11] LMIC.freq: 903300000
[2020-07-18 16:33:11] LMIC.opmode: 888
[2020-07-18 16:33:11] LMIC.freq: 926300000
[2020-07-18 16:33:13] LMIC.freq: 923300000
[2020-07-18 16:33:13] LMIC.txChnl: 3
[2020-07-18 16:33:13] LMIC.datarate: 2
[2020-07-18 16:33:13] LMIC.opmode: 108
[2020-07-18 16:33:13] 2122030: EV_TXSTART regular
[2020-07-18 16:33:13] LMIC.freq: 902900000
[2020-07-18 16:33:13] LMIC.opmode: 888
[2020-07-18 16:33:14] LMIC.freq: 925100000
[2020-07-18 16:33:15] 2157090: EV_TXCOMPLETE
[2020-07-18 16:33:15] dataLen: 0
[2020-07-18 16:33:15] LMIC.txChnl: 65
[2020-07-18 16:33:15] LMIC.datarate: 4
[2020-07-18 16:33:15] LMIC.opmode: 110
[2020-07-18 16:33:15] 2172491: EV_TXSTART POLL
[2020-07-18 16:33:15] LMIC.freq: 904600000
[2020-07-18 16:33:15] LMIC.txpow: 26
[2020-07-18 16:33:15] LMIC.opmode: 880
[2020-07-18 16:33:15] LMIC.freq: 923900000
[2020-07-18 16:33:17] LMIC.freq: 923300000
[2020-07-18 16:33:17] 2237513: EV_TXCOMPLETE
[2020-07-18 16:33:17] dataLen: 0
[2020-07-18 16:33:17] LMIC.opmode: 900
[2020-07-18 16:33:17] EV_TXCOMPLETE.
[2020-07-18 16:33:17] LoraWAN wakeup
[2020-07-18 16:33:17] SEND
[2020-07-18 16:33:17] LMIC.opmode: 108
[2020-07-18 16:33:18] 2252928: EV_TXSTART regular
[2020-07-18 16:33:18] LMIC.freq: 904600000
[2020-07-18 16:33:18] LMIC.opmode: 888
[2020-07-18 16:33:18] LMIC.freq: 923900000
[2020-07-18 16:33:20] LMIC.freq: 923300000
[2020-07-18 16:33:20] LMIC.opmode: 108
[2020-07-18 16:33:21] 2353772: EV_TXSTART regular
[2020-07-18 16:33:21] LMIC.freq: 904600000
[2020-07-18 16:33:21] LMIC.opmode: 888
[2020-07-18 16:33:21] LMIC.freq: 923900000
[2020-07-18 16:33:23] LMIC.freq: 923300000
[2020-07-18 16:33:23] LMIC.txChnl: 12
[2020-07-18 16:33:23] LMIC.datarate: 3
[2020-07-18 16:33:23] LMIC.opmode: 108
[2020-07-18 16:33:24] 2454522: EV_TXSTART regular
[2020-07-18 16:33:24] LMIC.freq: 904700000
[2020-07-18 16:33:24] LMIC.txpow: 21
[2020-07-18 16:33:24] LMIC.opmode: 888
[2020-07-18 16:33:24] LMIC.freq: 925700000
[2020-07-18 16:33:25] 2488488: EV_TXCOMPLETE
[2020-07-18 16:33:25] dataLen: 51
[2020-07-18 16:33:25] 2488491: message received
[2020-07-18 16:33:25] LMIC.txChnl: 65
[2020-07-18 16:33:25] LMIC.datarate: 4
[2020-07-18 16:33:25] LMIC.opmode: 110
[2020-07-18 16:33:26] 2503835: EV_TXSTART POLL
[2020-07-18 16:33:26] LMIC.freq: 904600000
[2020-07-18 16:33:26] LMIC.txpow: 26
[2020-07-18 16:33:26] LMIC.opmode: 880
[2020-07-18 16:33:26] LMIC.freq: 923900000
[2020-07-18 16:33:28] LMIC.freq: 923300000
[2020-07-18 16:33:28] 2568857: EV_TXCOMPLETE
[2020-07-18 16:33:28] dataLen: 0
[2020-07-18 16:33:28] LMIC.opmode: 900
[2020-07-18 16:33:28] Received: id: d158a9d9, timestamp: 2020-07-18T16:33:14.702834
terrillmoore commented 4 years ago

[I really need to take the debug prints out of the LMIC -- they are a snare for those new to the code. The debug prints are only for debugging the LMIC, not for debugging your system. You have a system problem. LMIC prints are not useful for finding problems, in fact they make other problems crop up..]

Almost certainly because of your debug prints, you are not receiving downlinks. Debug prints while the LMIC is operating change the timing and the LMIC is not designed to detect that or recover from it.

Please use the compliance sketch in the examples, possibly modified if you like to up how often the dummy app transmits, and try that. It uses a logging technique, and prints when time is not critical to the LMIC.

Thanks, --Terry

dolfandringa commented 4 years ago

Sorry, I wan't clear there. Those debug prints are my own in my arduino loop function. Not the LMIC debug prints. So it is still calling the os_runloop_once() frequently. The only thing I can think of is that the os_runloop_once doesn't get called often enough if the arduino loop() takes too long? I can kick them out and see what happens then.

dolfandringa commented 4 years ago

Ok, my question, and hence your answer was partially correct but besides you helped me get my head on straight by forcing me to understand the lorawan frequency plans and chirpstack config better. I did probably have some timing issues which caused some of the tries to send data not to succeed due to delays in os_runloop_once. But more importantly, it also turns out I had bad configuration on the chirpstack side. The channels weren't configured properly in the network server, so it was telling the device to use channels that it actually couldn't. After cleaning up my code to eliminate debug print statements and fixing the chirpstack config, my device now triggers a whole bunch of mac messages at the first message it sends, after which it consistently sends at DR4 and succeeds in sending in around 1 second consistently without additional LinkADRReq messages being triggered.

For others reading this: Make sure you understand the regional parameters for your region (https://lora-alliance.org/sites/default/files/2018-07/lorawan_regional_parameters_v1.0.3reva_0.pdf) properly. I had to configure channels 0-7 (multiSF bandwidth 125kHz freq 902.3-903.7MHz) and channel 64 (sf 8, bw 500kHz, freq 903.0MHz) in chirpstack network server and make sure my lora packet forwarder corresponded with it. For arduino LMIC that means you use subband 0 for LMIC_selectSubBand. If you use another subband, the channel numbers will move up (there are 64 multisf channels and 8 sf8/500kHz channels, so 8 subbands). For chirpstack that also means you need to set the min-max datarate from 0-4 (0-3 is channels 0-7 and dr4 is channel 64).

terrillmoore commented 4 years ago

Glad you have it working! BTW, one of the characteristic things about ChirpStack (compared to other networks) when I used it was that it always set the uplink repeat count to 3 immediately after join, and then backs off slowly. This results in a lot more uplink traffic, and the LMIC stays busy longer when you do a send. Not a problem if you're prepared for it but the sample code really doesn't teach you how to deal with this.