zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

LoRaWAN US915 NRF52840DK+sx1262 unable to TTN join #70889

Closed hallard closed 2 months ago

hallard commented 5 months ago

Describe the bug

When using the US915 LoRaWAN class_a sample, we are unable to join on TTN.

First, note that there is no sub-band (FSB) selection available in Kconfig (see FSB usage explanation). Since US915 has 64 channels and our gateway listens on only 8 of them, we miss a lot of join requests that fall outside our sub-band. This is just a reminder and is not related to our issue.
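To make the sub-band idea concrete, here is a minimal sketch of how a channel mask restricted to one FSB could be computed. The names `us915_fsb_mask` and `US915_MASK_WORDS` are hypothetical, and the layout (six 16-bit words, 125 kHz channels 0..63 in words 0..3, 500 kHz channels 64..71 in the low byte of word 4) is an assumption based on how LoRaMac-node represents the US915 mask. How such a mask would be handed to the stack depends on the Zephyr version and is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: US915 mask is 6 x 16-bit words, as in LoRaMac-node. */
#define US915_MASK_WORDS 6

/*
 * Hypothetical helper: build a channel mask that enables only sub-band
 * `fsb` (1..8), i.e. eight 125 kHz channels plus the matching 500 kHz one.
 */
static void us915_fsb_mask(unsigned int fsb, uint16_t mask[US915_MASK_WORDS])
{
    assert(fsb >= 1 && fsb <= 8);

    /* Index of the first 125 kHz channel in this sub-band (0, 8, ..., 56). */
    unsigned int first = (fsb - 1) * 8;

    memset(mask, 0, US915_MASK_WORDS * sizeof(mask[0]));

    /* 125 kHz channels 0..63: 16 channels per word, words 0..3. */
    mask[first / 16] = (uint16_t)(0x00FFu << (first % 16));

    /* 500 kHz channels 64..71: one bit per sub-band in word 4. */
    mask[4] = (uint16_t)(1u << (fsb - 1));
}
```

For sub-band 2 this enables channels 8..15 and the 500 kHz channel 65, and leaves every other channel disabled.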

Using an nRF52840 DK + semtech_sx1262mb2das shield. We also tried with a RAK4631 and see the same issue.

To Reproduce

We changed main.c to retry the join in a loop so we could track this issue more easily:

    ret = -1;
    while (ret < 0) {
        // tried this one also just in case without any better success
        // join_cfg.otaa.dev_nonce = (uint16_t)sys_rand32_get();

        LOG_INF("Joining network over OTAA with devnonce %d", join_cfg.otaa.dev_nonce);
        ret = lorawan_join(&join_cfg);
        if (ret < 0) {
            LOG_ERR("lorawan_join_network failed: %d", ret);
            k_sleep(K_MSEC(1000));
        }
    }

    west build -p -b nrf52840dk_nrf52840 ./zephyr/samples/subsys/lorawan/class_a/ -- -DSHIELD=semtech_sx1262mb2da

Expected behavior

The device joins the LoRaWAN network.

Impact

LoRaWAN is not working on US915 with TTN in our configuration (not sure about other LNSs).

Logs and console output

*** Booting nRF Connect SDK v3.5.99-ncs1 ***
[00:00:00.339,813] <dbg> lorawan: lorawan_start: LoRaMAC Initialized
[00:00:00.339,965] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:00.357,421] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:06.906,372] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:06.906,402] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:06.908,508] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:07.908,599] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:07.923,858] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:14.471,710] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:14.471,740] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:14.473,846] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:15.473,937] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:15.489,196] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:22.038,055] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:22.038,085] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:22.040,191] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:23.040,283] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:23.055,572] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:29.604,522] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:29.604,553] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:29.606,658] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:30.606,750] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:30.622,009] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:37.152,587] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:37.152,618] <err> lorawan: MlmeConfirm failed : Rx 1 timeout
[00:00:37.152,648] <dbg> lorawan: mcps_indication_handler: Received McpsIndication 0
[00:00:37.152,679] <err> lorawan: McpsIndication failed : Error
[00:00:37.154,754] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:38.154,876] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:38.170,135] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:44.676,788] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:44.676,818] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:44.678,924] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:45.679,016] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:45.694,274] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:52.243,133] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:52.243,164] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:52.245,269] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:00:53.245,361] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:00:53.260,620] <dbg> lorawan: lorawan_join: Network join request sent!
[00:00:59.809,448] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:00:59.809,478] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:00:59.811,584] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:01:00.811,676] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:01:00.826,904] <dbg> lorawan: lorawan_join: Network join request sent!
[00:01:07.033,355] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:01:07.033,386] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:01:07.035,491] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:01:08.035,583] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:01:08.050,842] <dbg> lorawan: lorawan_join: Network join request sent!
[00:01:14.599,670] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:01:14.599,700] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:01:14.601,806] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:01:15.601,898] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:01:15.617,156] <dbg> lorawan: lorawan_join: Network join request sent!
[00:01:22.165,985] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:01:22.166,015] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:01:22.168,121] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:01:23.168,212] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:01:23.183,471] <dbg> lorawan: lorawan_join: Network join request sent!
[00:01:29.732,330] <dbg> lorawan: mlme_confirm_handler: Received MlmeConfirm (for MlmeRequest 1)
[00:01:29.732,360] <err> lorawan: MlmeConfirm failed : Rx 2 timeout
[00:01:29.734,466] <err> lorawan_class_a: lorawan_join_network failed: -116
[00:01:30.734,558] <inf> lorawan_class_a: Joining network over OTAA with devnonce 0
[00:01:30.749,816] <dbg> lorawan: lorawan_join: Network join request sent!

Gateway Logs

As you can see, the gateway only sees 2 of the 14 join requests. This is because of the FSB limitation mentioned above.

image

TTN log

image

Additional context

We have some RAK3172 (STM32WL5) devices that we have been using with mbed for a long time. We used one of them to check our infrastructure, and we also tried EU868.

As you can see the issue is only happening on US915 with Zephyr.

We also tried increasing the RX error from 20 to 100 with CONFIG_LORAWAN_SYSTEM_MAX_RX_ERROR=100, just in case, but it did not solve the issue.

Any help would be greatly appreciated @mluis1 @martinjaeger @carlescufi. FYI, we already did some initial investigation with @kartben.

Thanks

github-actions[bot] commented 5 months ago

Hi @hallard! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

Ashu12Aj commented 5 months ago

Can you help me write a Grove LCD display program? I am facing a configuration issue. Can you give me an overlay file for this?

kartben commented 5 months ago

Can you help me write a Grove LCD display program? I am facing a configuration issue. Can you give me an overlay file for this?

@Ashu12Aj please ask your question on Discord (https://chat.zephyrproject.org/) -- this is an entirely different issue you've commented on, so I doubt anyone will be able to help you here :)

mluis1 commented 5 months ago

@hallard The LoRaWAN US915 region has been designed/specified to use 64x125kHz+8x500kHz channels, which means that so-called 64-channel gateways are expected to be used. The reasons for this requirement are related to the FCC rules. The FCC allows using fewer channels, but in that case the RF output levels must be decreased. For further information, please refer to the following note in the Regional Parameters RP2-1.0.4 specification: image

The issue is that most network providers in US-based regions have decided to deploy 8x125kHz+1x500kHz channel gateways for various reasons (mainly cost).

By doing so, one needs to accept the compromises that come with that choice.

From a certification point of view, as well as for network operation, all end-devices must support the 64+8 channels. The reason is that although today's network uses 8-channel gateways, this may change in the future: 64-channel gateways may be added to the network, the 8-channel gateways may be reconfigured to use a different bank of 8 channels, or your end-device may at some point be connected to a network using the 2nd block of 8 channels and then move to a network using another bank of channels.

The drawback of using 8-channel gateways is that an end-device may need several attempts before being heard by a gateway and completing the join procedure. However, once the join procedure is complete, the network server instructs the end-device to activate only the required channels using the CFList field of the Join Accept frame.

To speed up the join and reduce the number of join attempts, the RP2-1.0.4 specification recommends the algorithm below. image

As an example, let's assume that the network uses the 8th bank of 8 channels. This means that at least 8 attempts will be necessary before joining the network.
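The 8-attempt figure can be reproduced with a small sketch. It assumes the device walks the eight 125 kHz banks in order, one join attempt per bank, choosing a random channel inside the current bank. That is only one plausible reading of the recommended algorithm, whose figure is not reproduced here. It also ignores the 500 kHz channels and any RF loss. The names `join_channel` and `attempts_until_heard` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define US915_BANKS       8
#define CHANNELS_PER_BANK 8

/*
 * Assumed selection rule: attempt number `attempt` (0-based) uses a random
 * 125 kHz channel inside bank (attempt % 8), so the banks are visited in turn.
 */
static unsigned int join_channel(unsigned int attempt)
{
    unsigned int bank = attempt % US915_BANKS;

    return bank * CHANNELS_PER_BANK + (unsigned int)(rand() % CHANNELS_PER_BANK);
}

/*
 * Number of join attempts until a request lands in the bank the gateway
 * listens on (gw_bank in 1..8), assuming every such request is received.
 */
static unsigned int attempts_until_heard(unsigned int gw_bank)
{
    assert(gw_bank >= 1 && gw_bank <= US915_BANKS);

    unsigned int attempt = 0;

    while (join_channel(attempt) / CHANNELS_PER_BANK != gw_bank - 1) {
        attempt++;
    }
    return attempt + 1;
}
```

Under these assumptions a gateway on the 8th bank is first reached on the 8th attempt, and a gateway on the 2nd bank on the 2nd attempt. Any lost frame pushes the next opportunity back by a full cycle through the banks.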

In conclusion, I would say that the behavior you observe looks normal. Another point is that RF signals can be interfered with by other signals. If you are unlucky and the first join request that should have been received is lost, then 9 more attempts will take place before the device joins, and this process may repeat until the join procedure succeeds.

As a side comment, the RSSI values shown in your screenshots are way too high. Your end-device is shouting at the receiver. I would recommend moving your end-device away from the gateway (at least 10 meters and a wall). For good conditions, try to get RSSI values between -70dBm and -90dBm. By doing so, I am confident that you will observe far fewer issues joining the network.

hallard commented 5 months ago

@mluis1, thank you very much for this crystal-clear explanation of the join process on 64+8 channels. As I said, a longer join process is a trade-off we can live with, because once joined everything should be fine.

I'm surprised that the issue may come from the end device being too close to the gateway (which is indeed the case), because this is far from the first time we have used US915 devices in the same situation (with many different devices, including ones from Elsys and other manufacturers), and it is the first time we are simply unable to join. We have already joined with the same device (using the mbed LoRaWAN stack) without any issue, sometimes even with RSSI near -40dB (because, as you say, we're close to the GW). Of course I will test farther away from the GW, but I'm pretty sure it will not change anything, because this is something we deal with every day.

Here are the logs and captures from a test I ran a few minutes ago in the same situation: same location, same sensor, same gateway, but with the mbed US915 stack instead of the Zephyr one. As you can see, it works immediately, even with an RSSI of -56, whereas the same process with the Zephyr lorawan class_a sample just never joins.

[LoRaWAN] initialize: Success
[LoRaWAN] set callbacks: Success
[LoRaWAN] device set to class A: Success
[LoRaWAN] LoRa set_confirmed_msg_retries(3) : Success
MBED  join wait  : 3 seconds (random wait)
LoRa connect() joining...
Reset if no join in 24 h
LoRa Event joined
[LoRaWAN] enable adaptive datarate: Success
LoRa scheduled payload[1] on port:1 => 00 
LoRa cb_set_battery_level(3048mV) for 2x1.5V AA battery => F8 (97%)
LoRa Event TX done

image

image

So if someone could run the same test, connecting an nrf52840dk + Semtech SX1262 shield to TTN, it would be awesome to confirm whether it works or not.

Anyway, there is one difference: mbed tries to join at SF8 and Zephyr at SF10 (which may be too high for the RSSI, as you mention; I'll test far from the GW in a couple of hours).
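For context on the SF8 versus SF10 observation, here is a small lookup table of the US915 uplink data rates DR0 to DR4 as I understand them from the LoRaWAN Regional Parameters. SF10 at 125 kHz corresponds to DR0, while SF8 appears twice: DR2 at 125 kHz and DR4 at 500 kHz. Nothing above says which of the two the mbed stack used, so that remains open. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One US915 uplink data rate: DR index, spreading factor, bandwidth. */
struct us915_dr {
    uint8_t dr;
    uint8_t sf;
    uint16_t bw_khz;
};

/* DR0..DR4 only; higher uplink data rates are deliberately left out. */
static const struct us915_dr us915_uplink_dr[] = {
    {0, 10, 125},
    {1, 9, 125},
    {2, 8, 125},
    {3, 7, 125},
    {4, 8, 500},
};

/* Return the table entry for data rate `dr`, or NULL if it is not listed. */
static const struct us915_dr *us915_lookup(unsigned int dr)
{
    if (dr >= sizeof(us915_uplink_dr) / sizeof(us915_uplink_dr[0])) {
        return NULL;
    }
    /* The table is indexed by DR number; catch accidental reordering. */
    assert(us915_uplink_dr[dr].dr == dr);
    return &us915_uplink_dr[dr];
}
```

Looking up DR0 gives SF10 at 125 kHz, and looking up DR4 gives SF8 at 500 kHz, the rate used on the 500 kHz channels.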

hallard commented 5 months ago

I've done more tests following @mluis1's recommendation. I'm able to get further, because TTN now accepts the join request and sends the join-accept downlink, but the device still does not join.

Here is the TTN log: image

And the serial console:

image

Please note that, for this test, LORAWAN_SYSTEM_MAX_RX_ERROR is set to 100 instead of 20.

After setting it back to 20, it looks like I occasionally got further (about one attempt out of 100), but still no join, and with the following error:

image

hallard commented 5 months ago

Hi

@mluis1, just to give more information for the investigation: today I decided to try the nrf52840dk + Semtech shield with mbed (so same hardware, same infrastructure) and guess what, it worked the first time without any issue.

image

github-actions[bot] commented 3 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.