manuelbl / ttn-esp32

The Things Network device library for ESP32 (ESP-IDF) and SX127x based devices
MIT License

Joining infinite loop #37

Open BryanMM opened 3 years ago

BryanMM commented 3 years ago

Greetings, I've been trying to use your library for quite some time, but I keep getting stuck in an infinite joining loop. Sometimes it connects once and starts sending messages, but they usually either get rejected by TTN (v2) or never reach the platform at all. I've checked all the keys and reinstalled the component countless times, but it doesn't help. The board I'm currently working with is Heltec's Wireless Stick Lite. Any help or advice on this issue would be appreciated. Thanks. [image]

manuelbl commented 3 years ago

This output doesn't look familiar. I've never seen such a loop. The most suspicious part is the transmission right after the join. It's still within the window of the first join so it could easily confuse the network and reset the successful join.

Are you using some sample code for this test? If not, can you post the code?

And what region are you in?

BryanMM commented 3 years ago

Hi, thanks for the answer. I'm currently using the North American region, and yes, I'm testing with the hello world sample code. I tried a different gateway (TTIG) with v2 and got better results; the only difference is that the first 3 messages after the join are sent with no payload. I saw in a pull request in this repository that there's an issue with 8-channel gateways, and before that I was using Heltec's HT-M01. Maybe that's the issue?

manuelbl commented 3 years ago

The 8 channel limitation could indeed be an issue. The good news is that the underlying LMIC library has just released a new version, which supposedly improves the channel handling for regions like US. I will soon integrate the new version. Unfortunately, I can't test it as I'm in Europe and don't have a lab to simulate the US region.

BryanMM commented 3 years ago

I implemented the changes from the pull request that I saw here; that's how I got the TTIG (which is also an 8-channel gateway) working, so as you said, I think the issue must be related to that. The HT-M01 is not working yet, though. If needed, I can help with testing once the implementation is done.

DylanGWork commented 3 years ago

Hi, I am having the same issue raised here and using the Hello_World example. I am using AU915.

The code hangs at this line: xQueueReceive(lmicEventQueue, &event, portMAX_DELAY); in the TheThingsNetwork.cpp file. I have managed to get two packets to send but I don't believe it was a result of any changes I made, going by this discussion.

Is it possible to lock in a DR (SF7BW250) and lock in a channel (in case I use a single-channel gateway)?

Cheers

manuelbl commented 3 years ago

@DylanGWork It's currently not possible to set DR. That's planned though. There are no plans however for single channel operation.

manuelbl commented 2 years ago

An upcoming version will support a few changes relevant for you (@BryanMM and @DylanGWork):

The changes are in the master branch. I would appreciate if you give it a try.

BryanMM commented 2 years ago

@manuelbl Got it! I'll be testing it soon.

DylanGWork commented 2 years ago

@manuelbl Tested and working great. Is the C version new too? I recall only having a C++ version.

Great work!

manuelbl commented 2 years ago

@DylanGWork Thanks for testing. Yes, the C version is new too.

BryanMM commented 2 years ago

@manuelbl I've been testing it too and got great results. I no longer need workarounds for the initial messages (previously the first and second uplinks usually bounced until a third one was sent, with some probability of failure from there onwards). I also tested the band selection and spreading factor functions, and they worked great too. Good job, man.

manuelbl commented 2 years ago

@BryanMM Cool. Thanks for testing.

maizezoidberg commented 2 years ago

@manuelbl I have the same problem with an infinite loop when joining. I am using the ttn_join_provisioned() method to connect. If the gateway is enabled, the method successfully returns true. But if the gateway is disabled or out of reach of the device, I never get false and the method stays blocked. Can you help me understand under what conditions ttn_join_provisioned() should return false?

I (1258) ttn_prov: DevEUI, AppEUI/JoinEUI and AppKey saved in NVS storage
I (8472) ttn: event EV_JOINING
I (8534) ttn: event EV_TXSTART
I (13569) ttn: event EV_RXSTART
I (14565) ttn: event EV_RXSTART
I (14839) ttn: event EV_JOIN_TXCOMPLETE
I (78616) ttn: event EV_TXSTART
I (83650) ttn: event EV_RXSTART
I (84646) ttn: event EV_RXSTART
I (84920) ttn: event EV_JOIN_TXCOMPLETE
I (149551) ttn: event EV_TXSTART
I (154585) ttn: event EV_RXSTART
I (155581) ttn: event EV_RXSTART
I (155855) ttn: event EV_JOIN_TXCOMPLETE
I (226413) ttn: event EV_TXSTART
I (231498) ttn: event EV_RXSTART
I (232494) ttn: event EV_RXSTART
I (232768) ttn: event EV_JOIN_TXCOMPLETE
I (356974) ttn: event EV_TXSTART
I (362060) ttn: event EV_RXSTART
I (363056) ttn: event EV_RXSTART
I (363330) ttn: event EV_JOIN_TXCOMPLETE
I (483475) ttn: event EV_TXSTART
I (488560) ttn: event EV_RXSTART
I (489556) ttn: event EV_RXSTART
I (489830) ttn: event EV_JOIN_TXCOMPLETE

manuelbl commented 2 years ago

@maizezoidberg That's a good question indeed. The ttn_join() and similar functions mainly return false if no provisioning keys have been provided or if they are invalid. If the device cannot join immediately, it will keep trying. In particular, the spreading factor will be increased in order to improve the chances of reaching a gateway. As the spreading factor is increased, the time between retries is also increased. I'm not sure if it ever gives up and returns false. Probably not.
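The growing retry interval can be read off the EV_TXSTART timestamps in the log above. The following is only a small sketch for checking the arithmetic on a PC; the function name join_retry_gaps_s is made up for this example and is not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* EV_TXSTART timestamps in milliseconds, copied from the log above. */
const uint32_t txstart_ms[] = {8534, 78616, 149551, 226413, 356974, 483475};

#define TXSTART_COUNT (sizeof(txstart_ms) / sizeof(txstart_ms[0]))

/* Hypothetical helper: writes the gap between consecutive join attempts,
 * rounded down to whole seconds, into gaps_s (room for count - 1 entries).
 * Returns the number of gaps written. */
size_t join_retry_gaps_s(const uint32_t *stamps_ms, size_t count, uint32_t *gaps_s)
{
    if (count < 2) {
        return 0;
    }
    for (size_t i = 1; i < count; i++) {
        gaps_s[i - 1] = (stamps_ms[i] - stamps_ms[i - 1]) / 1000u;
    }
    return count - 1;
}
```

For this log, the gaps come out at 70, 70, 76, 130 and 126 seconds: the first retries are spaced roughly 70 seconds apart, the later ones roughly 130 seconds apart.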

How could we improve the library? Should we add a timeout parameter to the ttn_join() functions? If so, a realistic timeout is 10 minutes or more. Or should the function be changed to be asynchronous? It would make it easier to handle the error case but more difficult to handle the regular case.

maizezoidberg commented 2 years ago

@manuelbl, thanks for your quick response. In fact, LMIC follows the best practices at https://www.thethingsnetwork.org/docs/devices/bestpractices/. The device should use JOIN very rarely. Considering that the ESP32 does not have very low power consumption during operation, we could set a "use_continuous_join" flag in the ttn_join() method. If this flag is NOT set, we watch for EV_JOIN_TXCOMPLETE (which means that no join response was received from the gateway) and return an error in the event_callback(...) method. After that, however, we must also stop the join process in LMIC itself. Otherwise, we will exit the ttn_join() method while LMIC still keeps trying to connect. This is one possible solution, and I'm ready to test it:

ttn_event_t ttn_event = TTN_EVENT_NONE;

if (waiting_reason == TTN_WAITING_FOR_JOIN)
{
    if (event == EV_JOINED)
    {
        ttn_event = TTN_EVNT_JOIN_COMPLETED;
    }
    else if (event == EV_REJOIN_FAILED || event == EV_RESET || event == EV_JOIN_TXCOMPLETE)
    {
        ttn_event = TTN_EVENT_JOIN_FAILED;
    }
}

manuelbl commented 2 years ago

In fact, the LMIC follows the https://www.thethingsnetwork.org/docs/devices/bestpractices/ specification for best practices. The device should use JOIN very rarely.

That sounds like a misunderstanding. Best practices recommend to avoid rejoins by retaining the assigned DevAddr. But this case is about the initial join and in particular about the case where the join doesn't succeed. Failed joins don't count. This case is not covered in the best practices.

And best practices basically boil down to either not powering off your device or retaining the session settings, including DevAddr. The former is out of LMIC's control, and the latter is not implemented in LMIC. I had to go to some length to make it work anyway.

Your proposal of changing ttn_join() is basically to add an option to abort the join if the first try fails. There are many reasons why a join can fail: too high data rate, RF TX collision, radio disturbance etc. It's not reliable to detect if there is a gateway nearby. Thus I think aborting after just a single try will not be useful to many people.

The options I'm considering are:

I will think about it.

DylanGWork commented 2 years ago

Hi guys, great conversation.

I have implemented an abort of the join process after 5 failed join attempts (I even switch an LED to red to indicate this). It's a messy implementation, though.

Would be great to see this as a feature.
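For anyone wanting to try the same idea, here is a minimal sketch of the counting part only. It assumes the application is notified of join events somehow; the enum below is a stand-in invented for this example (real code would react to LMIC's EV_JOINED and EV_JOIN_TXCOMPLETE), and all other names are hypothetical too. Stopping LMIC itself after giving up, which was pointed out above as necessary, is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in event type for this sketch only (not LMIC's ev_t). */
typedef enum {
    SKETCH_EV_JOINED,          /* join accept received */
    SKETCH_EV_JOIN_TXCOMPLETE, /* join request sent, no accept received */
    SKETCH_EV_OTHER            /* anything else (TX start, RX start, ...) */
} sketch_event_t;

typedef enum {
    JOIN_PENDING,
    JOIN_SUCCEEDED,
    JOIN_GAVE_UP
} join_status_t;

typedef struct {
    uint8_t failed_attempts;
    uint8_t max_attempts;
} join_tracker_t;

void join_tracker_init(join_tracker_t *t, uint8_t max_attempts)
{
    t->failed_attempts = 0;
    t->max_attempts = max_attempts;
}

/* Feed one event into the tracker and get the resulting status back. */
join_status_t join_tracker_on_event(join_tracker_t *t, sketch_event_t event)
{
    if (event == SKETCH_EV_JOINED) {
        t->failed_attempts = 0;
        return JOIN_SUCCEEDED;
    }
    if (event == SKETCH_EV_JOIN_TXCOMPLETE && t->failed_attempts < t->max_attempts) {
        t->failed_attempts++;
    }
    /* Once the limit is reached, keep reporting JOIN_GAVE_UP. */
    return (t->failed_attempts >= t->max_attempts) ? JOIN_GAVE_UP : JOIN_PENDING;
}
```

With max_attempts set to 5, the fifth unanswered join request yields JOIN_GAVE_UP, which is the point where an LED could be switched to red.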

This may be a silly question that I can just look up, but while I'm here: Can we have the default join DR be the lowest DR, or an easy way to set it as that?

cdrx commented 2 years ago

How could we improve the library? Should we add a timeout parameter to the ttn_join() functions? If so, a realistic timeout is 10 minutes or more. Or should the function be changed to be asynchronous? It would make it easier to handle the error case but more difficult to handle the regular case.

An async version of ttn_join() would be really useful. Something like this:

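// Proposed API: ttn_join_async() and ttn_join_abort() do not exist yet
ttn_join_async();
uint8_t timer = 0;

while (ttn_is_joined() == false) {
    timer++;

    if (timer > 120) {
        ttn_join_abort();
        break;  // stop polling once the join has been aborted
    }

    vTaskDelay(pdMS_TO_TICKS(1000));  // wait 1 second
}

if (ttn_is_joined()) {
    ESP_LOGI(TAG, "joined!");
}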

Would be ideal.
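To make the idea testable off-device, the polling loop could also be written against two callbacks, roughly like the sketch below. Everything here is hypothetical: wait_for_join and the fake_radio_t test double are names invented for this example, and the status query would have to be whatever the library ends up offering.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*is_joined_fn)(void *ctx);
typedef void (*delay_fn)(uint32_t ms, void *ctx);

/* Polls is_joined about once per second until it returns true or
 * timeout_s seconds have passed. Returns true if the join completed. */
bool wait_for_join(is_joined_fn is_joined, delay_fn delay_ms, void *ctx, uint32_t timeout_s)
{
    for (uint32_t elapsed_s = 0; elapsed_s < timeout_s; elapsed_s++) {
        if (is_joined(ctx)) {
            return true;
        }
        delay_ms(1000, ctx);
    }
    return is_joined(ctx); /* one last check at the deadline */
}

/* Test double: pretends the join completes on the joins_after-th poll. */
typedef struct {
    uint32_t polls;
    uint32_t joins_after;
    uint32_t slept_ms;
} fake_radio_t;

bool fake_is_joined(void *ctx)
{
    fake_radio_t *r = (fake_radio_t *)ctx;
    r->polls++;
    return r->polls >= r->joins_after;
}

void fake_delay(uint32_t ms, void *ctx)
{
    fake_radio_t *r = (fake_radio_t *)ctx;
    r->slept_ms += ms; /* record the requested delay instead of sleeping */
}
```

On the ESP32, the delay callback would simply wrap vTaskDelay(pdMS_TO_TICKS(ms)), and a false return value would be the place to call the proposed abort function.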

In my use case, the TTN provisioning is done by writing keys to the ESP over Bluetooth from a mobile app. If the user writes incorrect keys, ttn_join() is eventually called but never returns (because the join will never succeed). If the user then updates the provisioning keys over the Bluetooth connection, I can't find a practical way to cancel the active ttn_join() and try again with the new keys.

Nightroamer commented 1 year ago

Hi All,

I have implemented the Hello World test code and also get an infinite join loop. Occasionally I see an accepted join request on TTN but never any payload data. The serial monitor shows:

I (33376) ttn: event EV_TXSTART
I (38716) ttn: event EV_RXSTART
I (39716) ttn: event EV_RXSTART
I (39826) ttn: event EV_JOIN_TXCOMPLETE
I (40736) ttn: event EV_TXSTART

I am using AS923 on both my gateway and my node (TTN setup), and I have also selected AS923 in the code via the settings menu.

Has anyone been able to get around this?

Nightroamer commented 1 year ago

I also managed to fix the issue by inserting the code below into TheThingsNetwork.cpp:

bool TheThingsNetwork::joinCore()
{
    if (!provisioning.haveKeys())
    {
        ESP_LOGW(TAG, "Device EUI, App EUI and/or App key have not been provided");
        return false;
    }

manuelbl commented 1 year ago

So the problem has been solved?

BTW: If the file TheThingsNetwork.cpp contains the method joinCore(), you are using an old version of the library. This method was removed more than a year ago.

Nightroamer commented 1 year ago

Yes, it is solved, but only if I add the above code to TheThingsNetwork.cpp.

I downloaded the source code from here, so how could I have ended up with the old library? In my ignorance (I'm new to this), I thought the library was supplied within.

manuelbl commented 1 year ago

You have probably downloaded the code from the Releases. I have indeed not updated this for some time. Now it's up-to-date again.

You can either download it from the release page or with the green "Code" button on the home page.

Nightroamer commented 1 year ago

Excellent, I will try this later today.

I had blindly followed the download link in the Getting Started guide (it is the same for PlatformIO, which I use):

https://github.com/manuelbl/ttn-esp32/archive/master.zip

jpalumbo1981 commented 6 months ago

Hi, I encountered the same issue of an infinite loop when testing the 'Hello World' example on a Heltec Wireless Bridge with an ESP32 and SX1276 transceiver. I've tried all the suggestions written in this forum, but without success. I receive random join requests, but they are not successful. I noticed that the RSSI is -110, but when I compile the code in Arduino with the Heltec library, the RSSI is -40. Thanks for your assistance.