mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License

IN 865 SF12 downlink doesn't work with TTN #483

Open terrillmoore opened 4 years ago

terrillmoore commented 4 years ago

Per @svelmurugan92, this isn't working.

asprakash commented 4 years ago

Hi. Any updates on this downlink issue? I am using loraserver.io (ChirpStack). Is the downlink failing only on TTN, or on both ChirpStack and TTN?

TIA.

terrillmoore commented 4 years ago

I tested with an RWC5020A and can't duplicate the problem with the compliance sketch, which suggests there's some other problem. Can you please share the exact sketch you use and the configuration (OTAA, ABP, etc.)? One thing to bear in mind: OTAA requires SF12 downlink to work, and the problem reported is with data downlink, not joins. That makes me think there's something else going on.

asprakash commented 4 years ago

Hi @terrillmoore. Shall I assume you are able to do downlink successfully with OTAA and the TTN server? I have not tried downlink yet. Also, I am using ABP and ChirpStack. I will give it a try and share my feedback.

terrillmoore commented 4 years ago

Hi @asprakash -- we are able to do OTAA & TTN server in India, at our Chennai office. You may contact @svelmurugan92 or @dhineshkumarmcci who are actively involved in this work.

We find that regular (RX1 and RX2) downlinks after a class A uplink are not working, but there are many variables.

Some facts:

  1. With a network simulator (the RedwoodComm RWC5020), things work pretty well, but not perfectly. We have the known reliability problem with SF12, about 10% packet loss with the RWC5020A -- but this is not the same as "not working".

  2. With TTN, we can join without problem, but we have lots of evidence that regular downlink is not working.

The difference between JoinAccept and regular class A downlinks is the time available to notify the gateway: JoinAccept has 5 seconds minus some guardband, while regular class A downlinks have only one second minus some guardband. If the notification is late getting to the gateway, the gateway discards it. We suspect that there's a slow network at our India office: JoinAccept arrives in time (because it has more time to do so), but regular class A downlink arrives late.
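The timing argument above can be made concrete with a small sketch. It assumes the LoRaWAN default delays (5 s before RX1 for a join accept, 1 s before RX1 for a class A data downlink); the guardband and the backhaul latency are hypothetical values chosen only for illustration, not measurements from this issue.

```python
# Sketch of the downlink scheduling budget described above.
# The two delays are the LoRaWAN defaults; GUARDBAND_S and the latency
# used below are hypothetical illustration values, not measurements.

JOIN_ACCEPT_DELAY1_S = 5.0  # RX1 opens 5 s after a join request ends
RECEIVE_DELAY1_S = 1.0      # RX1 opens 1 s after a class A data uplink ends
GUARDBAND_S = 0.2           # assumed margin the gateway needs before TX


def arrives_in_time(rx_delay_s, round_trip_s, guardband_s=GUARDBAND_S):
    """Return True if the downlink request reaches the gateway in time.

    round_trip_s is the time from the end of the uplink until the
    network's downlink request is back at the gateway (backhaul plus
    server processing).
    """
    return round_trip_s <= rx_delay_s - guardband_s


slow_link_s = 1.5  # hypothetical slow backhaul round trip
print("join accept ok:", arrives_in_time(JOIN_ACCEPT_DELAY1_S, slow_link_s))
print("class A data ok:", arrives_in_time(RECEIVE_DELAY1_S, slow_link_s))
```

With these assumed numbers the join accept fits inside its budget while the class A data downlink does not, which is the pattern described above.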

I don't know what might be happening with ChirpStack; we are not set up to use that. We do use several other networks beyond TTN and the RedwoodComm LoRaWAN tester/network simulator. In the US, I know of people using LMIC with ChirpStack.

asprakash commented 4 years ago

Hi @terrillmoore. Thanks for your inputs. I use ChirpStack because TTN's Fair Use Policy (FUP) is not suitable for me. I also understand that MCCI is focusing on TTN rather than ChirpStack.

May I know if I need to change anything in the ttn-abp code for IN866 band downlink? I would also like to use Class C instead of Class A. Should I change anything on the MCCI LMIC Arduino library side?

I tried a downlink from ChirpStack with Class A on the Arduino. I did not receive any data on the node side. I am using the sample ttn-abp code.

Below is my global_conf.json file. I am using a RAK Wireless gateway.

```json
{
  "SX1301_conf": {
    "lorawan_public": true,
    "clksrc": 1,
    "clksrc_desc": "radio_1 provides clock to concentrator for most devices except MultiTech. For MultiTech set to 0.",
    "antenna_gain": 0,
    "antenna_gain_desc": "antenna gain, in dBi",
    "radio_0": { "enable": true, "type": "SX1257", "freq": 865200000, "rssi_offset": -166.0, "tx_enable": true, "tx_freq_min": 865000000, "tx_freq_max": 867000000, "tx_notch_freq": 129000 },
    "radio_1": { "enable": true, "type": "SX1257", "freq": 866385000, "rssi_offset": -166.0, "tx_enable": false },
    "chan_multiSF_0": { "desc": "Lora MAC, 125kHz, all SF, 865.0625 MHz", "enable": true, "radio": 0, "if": -137500 },
    "chan_multiSF_1": { "desc": "Lora MAC, 125kHz, all SF, 865.4025 MHz", "enable": true, "radio": 0, "if": 202500 },
    "chan_multiSF_2": { "desc": "Lora MAC, 125kHz, all SF, 865.9850 MHz", "enable": true, "radio": 1, "if": -400000 },
    "chan_multiSF_3": { "desc": "disabled", "enable": false },
    "chan_multiSF_4": { "desc": "disabled", "enable": false },
    "chan_multiSF_5": { "desc": "disabled", "enable": false },
    "chan_multiSF_6": { "desc": "disabled", "enable": false },
    "chan_multiSF_7": { "desc": "disabled", "enable": false },
    "chan_Lora_std": { "desc": "disabled", "enable": false },
    "chan_FSK": { "desc": "disabled", "enable": false },
    "tx_lut_0": { "desc": "TX gain table, index 0", "pa_gain": 0, "mix_gain": 8, "rf_power": -6, "dig_gain": 0 },
    "tx_lut_1": { "desc": "TX gain table, index 1", "pa_gain": 0, "mix_gain": 10, "rf_power": -3, "dig_gain": 0 },
    "tx_lut_2": { "desc": "TX gain table, index 2", "pa_gain": 0, "mix_gain": 12, "rf_power": 0, "dig_gain": 0 },
    "tx_lut_3": { "desc": "TX gain table, index 3", "pa_gain": 1, "mix_gain": 8, "rf_power": 3, "dig_gain": 0 },
    "tx_lut_4": { "desc": "TX gain table, index 4", "pa_gain": 1, "mix_gain": 10, "rf_power": 6, "dig_gain": 0 },
    "tx_lut_5": { "desc": "TX gain table, index 5", "pa_gain": 1, "mix_gain": 12, "rf_power": 10, "dig_gain": 0 },
    "tx_lut_6": { "desc": "TX gain table, index 6", "pa_gain": 1, "mix_gain": 13, "rf_power": 11, "dig_gain": 0 },
    "tx_lut_7": { "desc": "TX gain table, index 7", "pa_gain": 2, "mix_gain": 9, "rf_power": 12, "dig_gain": 0 },
    "tx_lut_8": { "desc": "TX gain table, index 8", "pa_gain": 1, "mix_gain": 15, "rf_power": 13, "dig_gain": 0 },
    "tx_lut_9": { "desc": "TX gain table, index 9", "pa_gain": 2, "mix_gain": 10, "rf_power": 14, "dig_gain": 0 },
    "tx_lut_10": { "desc": "TX gain table, index 10", "pa_gain": 2, "mix_gain": 11, "rf_power": 16, "dig_gain": 0 },
    "tx_lut_11": { "desc": "TX gain table, index 11", "pa_gain": 3, "mix_gain": 9, "rf_power": 20, "dig_gain": 0 },
    "tx_lut_12": { "desc": "TX gain table, index 12", "pa_gain": 3, "mix_gain": 10, "rf_power": 23, "dig_gain": 0 },
    "tx_lut_13": { "desc": "TX gain table, index 13", "pa_gain": 3, "mix_gain": 11, "rf_power": 25, "dig_gain": 0 },
    "tx_lut_14": { "desc": "TX gain table, index 14", "pa_gain": 3, "mix_gain": 12, "rf_power": 26, "dig_gain": 0 },
    "tx_lut_15": { "desc": "TX gain table, index 15", "pa_gain": 3, "mix_gain": 14, "rf_power": 27, "dig_gain": 0 }
  },
  "gateway_conf": {
    "gateway_ID": "B827EBFFFE92CA25",
    /* change with default server address/ports, or overwrite in local_conf.json */
    "server_address": "192.168.0.1",
    "serv_port_up": 1700,
    "serv_port_down": 1700,
    /* adjust the following parameters for your network */
    "keepalive_interval": 10,
    "stat_interval": 30,
    "push_timeout_ms": 100,
    /* forward only valid packets */
    "forward_crc_valid": true,
    "forward_crc_error": false,
    "forward_crc_disabled": false,
    /* gps enable */
    "gps": true,
    "gps_tty_path": "/dev/ttyAMA0",
    "fake_gps": false,
    "ref_latitude": 10,
    "ref_longitude": 20,
    "ref_altitude": -1,
    "autoquit_threshold": 6
  }
}
```

TIA

asprakash commented 4 years ago

I am also a little confused about the downlink DR.

asprakash commented 4 years ago

Any kind of input is appreciated. @svelmurugan92 @dhineshkumarmcci

TIA

dhineshkumarmcci commented 4 years ago

Hi @asprakash, sorry for the delay. The MCCI LMIC README uses EU868 as its example, and the SF9 mentioned there applies to the EU region. We suggest you use SF10 (DR2) when using TTN. We tried ttn-abp.ino at our end and were able to receive the downlink with SF10. Here is the LoRa log from the gateway:

 "lora": {
    "spreading_factor": 10,
    "bandwidth": 125,
    "air_time": 329728000
  },

Please let us know how it goes at your end.

asprakash commented 4 years ago

@dhineshkumarmcci Thanks for your inputs. I tried changing the line in ttn-abp.ino to `LMIC.dn2Dr = DR_SF10` and then to `LMIC.dn2Dr = DR_SF7`. I still do not receive any data on the node side. The downlink does reach the gateway; the gateway log is as follows:

JSON down: {"txpk":{"imme":false,"rfch":0,"powe":27,"ant":0,"brd":0,"tmst":3713891891,"freq":865.4025,"modu":"LORA","datr":"SF7BW125","codr":"4/5","ipol":true,"size":26,"data":"oEYfBCaFnGUDAAcAAQ9Djg9yjVvwCttIXqI="}}
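As a cross-check, the `txpk` record above can be decoded offline. The following is a minimal sketch that only uses the record as posted; the message-type names follow the LoRaWAN 1.0.x MHDR layout, and nothing here is part of LMIC or the gateway software.

```python
import base64
import json

# Gateway "JSON down" record copied from the log above.
record = json.loads(
    '{"txpk":{"imme":false,"rfch":0,"powe":27,"ant":0,"brd":0,'
    '"tmst":3713891891,"freq":865.4025,"modu":"LORA","datr":"SF7BW125",'
    '"codr":"4/5","ipol":true,"size":26,'
    '"data":"oEYfBCaFnGUDAAcAAQ9Djg9yjVvwCttIXqI="}}'
)
txpk = record["txpk"]
phy = base64.b64decode(txpk["data"])  # raw PHYPayload bytes

# LoRaWAN 1.0.x message types, indexed by the top three bits of MHDR.
MTYPES = [
    "Join Request", "Join Accept", "Unconfirmed Data Up",
    "Unconfirmed Data Down", "Confirmed Data Up", "Confirmed Data Down",
    "RFU", "Proprietary",
]

mtype = MTYPES[phy[0] >> 5]
dev_addr = int.from_bytes(phy[1:5], "little")  # DevAddr is sent little-endian

print(f"{txpk['freq']} MHz {txpk['datr']}, {len(phy)} bytes")
print(f"MType: {mtype}, DevAddr: {dev_addr:08X}")
```

This shows which frequency, data rate, and device address the network server actually scheduled, which can then be compared with what the node expects in its receive windows.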

LoRaWAN server-side downlink frame details are shared here: https://ibb.co/XFTqvk6 https://ibb.co/x1FcFt3

The LoRaWAN server uses SF7 for downlink. Should I use the same SF7 on the node side? I have already tried SF7 on the node side, and no data is received. The node sends a sample packet every 2 minutes and then waits for data from the gateway.

The node log is as follows :

11816751: EV_TXCOMPLETE (includes waiting for RX windows)
15567116: Unknown event
Packet queued
15708378: EV_TXCOMPLETE (includes waiting for RX windows)
19458746: Unknown event
Packet queued
19600073: EV_TXCOMPLETE (includes waiting for RX windows)
23350438: Unknown event
Packet queued
23491841: EV_TXCOMPLETE (includes waiting for RX windows)
27242206: Unknown event

TIA

terrillmoore commented 4 years ago

Right now MCCI can only support LoRaWAN-compliant operation. Please refer to section 2.10 of the LoRaWAN Regional Parameters 1.0.3, and make sure your network server is operating in a compliant way. The RX1 downlink data rate must be set according to the uplink data rate chosen by the device (together with RX1DROffset). The device can only receive at one speed at a time, and it expects the network to follow the rules in this table:

image
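For readers without the image, the rule can be sketched in code. This assumes the IN865 RX1 formula from Regional Parameters 1.0.3 section 2.10 is `MIN(5, MAX(0, UpstreamDR - EffectiveOffset))` for LoRa uplinks DR0 to DR5; the function name is hypothetical, and the spec table remains the authority.

```python
# Sketch of the IN865 RX1 data-rate rule, assuming the formula given in
# LoRaWAN Regional Parameters 1.0.3 section 2.10. Verify against the spec.

# RX1DROffset 0..7 maps to an effective offset. Values 6 and 7 are
# negative, i.e. the downlink may be faster than the uplink.
EFFECTIVE_OFFSET = [0, 1, 2, 3, 4, 5, -1, -2]

# IN865 LoRa data rates DR0..DR5 (all 125 kHz) and their spreading factors.
DR_TO_SF = {0: 12, 1: 11, 2: 10, 3: 9, 4: 8, 5: 7}


def rx1_data_rate(uplink_dr, rx1_dr_offset=0):
    """Return the RX1 downlink DR for a LoRa uplink DR (hypothetical helper)."""
    if uplink_dr not in DR_TO_SF:
        raise ValueError("only LoRa uplink data rates DR0..DR5 are handled")
    if not 0 <= rx1_dr_offset <= 7:
        raise ValueError("RX1DROffset must be in 0..7")
    return min(5, max(0, uplink_dr - EFFECTIVE_OFFSET[rx1_dr_offset]))


for offset in (0, 2):
    dr = rx1_data_rate(5, offset)  # uplink at DR5 (SF7)
    print(f"uplink DR5, offset {offset}: RX1 at DR{dr} (SF{DR_TO_SF[dr]})")
```

The device opens RX1 at exactly one of these data rates, so a network server that transmits at any other rate in RX1 will not be heard.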

Best regards, --Terry

asprakash commented 4 years ago

@terrillmoore Thanks for sharing more info.

terrillmoore commented 4 years ago

Very sorry, but I have no way of testing or confirming. Your understanding of Class A is not correct: devices must receive on RX1 and RX2. We don't support Class C, so that's not relevant. We have tested the library for LoRaWAN compliance with the RedwoodComm RWC5020 tester. When used with the LMIC compliance sketch in the examples, it passes all the tests (apart from a less-than-desirable error rate for RX on SF12, and lack of stable FSK support).

Bug reports are not a good place for tutorials. I suggest you contact my colleagues at MCCI India directly via portal.mcci.com; they may be able to help.

terrillmoore commented 4 years ago

Window Tests (requires fix #502, which is in v3.0.99.10)

Back to the investigation. My test with a precisely-timed 1 second SF12 packet reveals that there is strange behavior in the SX1276 -- there are windows which don't work, whereas both before and after do work.

The following log, which is quite repeatable, shows that the packet loss rate goes from zero to 30% and then back to zero as the RX start time varies from 1.005 s to 1.025 s.

(One interesting thing -- when I scanned by half symbol units (16384 us), I didn't see any losses.)

It's possible that this is a bizarre artifact of my test setup. Obviously, more detailed investigations are needed.

Test setup

Test setup:

I saw https://github.com/ARMmbed/mbed-os/pull/8822/files which seems to attempt to address "problems at low data rates". Note that their comments claim that they now use a window offset of 88 ms for SF12, which is much later than ours. Since our actual windows are earlier than 88 ms, and there's not a lot of reported experimental data, my guess is that there are multiple "nodes" in the delay. I'll try running some experiments up at their delay to see if we get a similar result. But obviously starting too early sometimes causes problems.

Note: I already scanned lower delays than 1.004976, and all was well.

Result Summary

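| Delay (s) | Adjusted (s) | Packets good/total | Packet Error Rate |
|-----------|--------------|--------------------|-------------------|
| 1.004976 | 1.035744 | 20/20 | 0% |
| 1.006224 | 1.036992 | 19/20 | 5% |
| 1.007472 | 1.038240 | 19/20 | 5% |
| 1.008720 | 1.039488 | 19/20 | 5% |
| 1.009968 | 1.040736 | 18/20 | 10% |
| 1.011216 | 1.041984 | 15/20 | 25% |
| 1.012464 | 1.043232 | 15/20 | 25% |
| 1.013712 | 1.044480 | 15/20 | 25% |
| 1.014960 | 1.045728 | 14/20 | 30% |
| 1.016208 | 1.046976 | 15/20 | 25% |
| 1.017456 | 1.048224 | 18/20 | 10% |
| 1.018704 | 1.049472 | 20/20 | 0% |
| 1.019952 | 1.050720 | 19/20 | 5% |
| 1.021200 | 1.051968 | 19/20 | 5% |
| 1.022448 | 1.053216 | 20/20 | 0% |
| 1.023696 | 1.054464 | 20/20 | 0% |
| 1.024944 | 1.055712 | 20/20 | 0% |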

Next Steps

Raw Log

Freq=902300000 Hz, LoRa SF12, BW125, TxPwr=0 dB, CR 4/5, CRC=1, LBT=0 us/-80 dB, clockError=0.0 (0x0), RxSyms=6
Window 1004976 us: adjusted 1035744 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
Start RX Window test: vary window from 1004976 to 1024944 us in 1248 us steps, 20 tries each step
Rx triggered by digital input 12.
Set up second Catena and start tx loop. Use 'count' or 'q' to quit
OK
++++++++++++++++++++
window 1004976: received 20/20
Window 1006224 us: adjusted 1036992 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++-+++++++++++++
window 1006224: received 19/20
Window 1007472 us: adjusted 1038240 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
+++++++++++-++++++++
window 1007472: received 19/20
Window 1008720 us: adjusted 1039488 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++-+++++++++
window 1008720: received 19/20
Window 1009968 us: adjusted 1040736 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++--++++++++++++++++
window 1009968: received 18/20
Window 1011216 us: adjusted 1041984 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
-++++-+++++++++---++
window 1011216: received 15/20
Window 1012464 us: adjusted 1043232 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
--+-+++++++-+++++-++
window 1012464: received 15/20
Window 1013712 us: adjusted 1044480 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++-+++-+-++++++-+-
window 1013712: received 15/20
Window 1014960 us: adjusted 1045728 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++-++-+-+++-++--
window 1014960: received 14/20
Window 1016208 us: adjusted 1046976 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
+++++++++-++-++-+-+-
window 1016208: received 15/20
Window 1017456 us: adjusted 1048224 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++-++++++-
window 1017456: received 18/20
Window 1018704 us: adjusted 1049472 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++++++++++
window 1018704: received 20/20
Window 1019952 us: adjusted 1050720 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++-+++++++
window 1019952: received 19/20
Window 1021200 us: adjusted 1051968 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
+++++++++++++++-++++
window 1021200: received 19/20
Window 1022448 us: adjusted 1053216 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++++++++++
window 1022448: received 20/20
Window 1023696 us: adjusted 1054464 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++++++++++
window 1023696: received 20/20
Window 1024944 us: adjusted 1055712 us, hsym 1024 (16384 us) rxsyms 6 (196608 us)
++++++++++++++++++++
window 1024944: received 20/20
total: received 0/340
Idle
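
The per-window result lines in the raw log above can be tallied with a short script. This is a sketch that works only on the lines as posted; the regular expression is an assumption about the log format based on those lines.

```python
import re

# Per-window result lines copied from the raw log above.
LOG = """\
window 1004976: received 20/20
window 1006224: received 19/20
window 1007472: received 19/20
window 1008720: received 19/20
window 1009968: received 18/20
window 1011216: received 15/20
window 1012464: received 15/20
window 1013712: received 15/20
window 1014960: received 14/20
window 1016208: received 15/20
window 1017456: received 18/20
window 1018704: received 20/20
window 1019952: received 19/20
window 1021200: received 19/20
window 1022448: received 20/20
window 1023696: received 20/20
window 1024944: received 20/20
"""

PATTERN = re.compile(r"^window (\d+): received (\d+)/(\d+)$", re.MULTILINE)

# Each row is (window delay in microseconds, packets received, packets sent).
rows = [(int(us), int(good), int(sent)) for us, good, sent in PATTERN.findall(LOG)]

for delay_us, good, sent in rows:
    per = 100.0 * (sent - good) / sent  # packet error rate in percent
    print(f"{delay_us / 1e6:.6f} s  {good:2d}/{sent}  PER {per:4.1f}%")

total_good = sum(good for _, good, _ in rows)
total_sent = sum(sent for _, _, sent in rows)
print(f"overall: {total_good}/{total_sent}")
```

Summed this way, the per-window lines give 305 packets received out of 340, so the `total: received 0/340` line at the end of the log looks like a totaling problem in the test output rather than a real result.
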
terrillmoore commented 4 years ago

Here's a graph of the result of a sweep with 100 messages per bin. Horizontal axis is symbols (multiply by 32.768 to get milliseconds) of delay relative to start-of-transmission time. Vertical axis is packet error rate. Increment was roughly 0.04 symbols.
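The 32.768 ms factor is the LoRa symbol duration for SF12 at 125 kHz, the settings shown in the raw log. A minimal sketch of the arithmetic, using the standard relation symbol time = 2^SF / BW (the function name is hypothetical):

```python
def lora_symbol_time_ms(sf, bw_hz=125_000):
    """LoRa symbol duration in milliseconds: 2**SF chips at BW chips per second."""
    return (2 ** sf) / bw_hz * 1000.0


t_sym = lora_symbol_time_ms(12)  # SF12 at 125 kHz
print(f"symbol:      {t_sym:.3f} ms")
print(f"half symbol: {t_sym / 2:.3f} ms")   # the "hsym" step in the raw log
print(f"6 symbols:   {6 * t_sym:.3f} ms")   # the RxSyms=6 window in the raw log
```

These values agree with the 16384 us half-symbol and 196608 us six-symbol figures printed in the raw log.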

image

Here's the test setup.

image

It took about 2.5 hours to run this test. I'll repeat this tonight, scanning from -1 symbol to +8 symbols, just to get a baseline. It's interesting that the packet loss curves in 2 and 4 are qualitatively similar to each other, and that 3 and 5 are both good. There are even/odd effects in many digital signal processing things. It's interesting that the curves in symbol 1 and symbol 2 are qualitatively similar (though starting in symbol 2 is less lossy than starting in symbol 1). I would not believe any one point, but the fact that all the data for symbol 2, taken together, looks like a less lossy image of symbol 1, combined with the nature of the errors in this kind of distribution makes me think that symbol 2 really is less lossy than symbol 1. We'll see from tonight's run.

RX testing indicates that opening the window before the transmit starts is very reliable. (This also matches class C operation, which I'm told works well.) This suggests that a possible fix for this problem is to open the window early, and bias the high clock error cases.

Note: updated after noticing my off-by-two error in the spreadsheet for calculating symbol time.

terrillmoore commented 4 years ago

The root cause of the above problem is grounding. The GND shown in the figure was provided by the common USB ground, not by a twisted pair. When I use a tethered connection, things work well:

image

Ditto a free-air connection with twisted pair:

image

But this setup has problems:

image

However, it can't be denied that units in the field have problems with SF12. The only way to really test in a similar way to field conditions is (1) run everything off batteries; (2) use an opto-coupler or similar isolation barrier for the sync pulse from the transmitter to receiver.