jpmeijers / RN2483-Arduino-Library

Arduino C++ code to communicate with a Microchip RN2483 module
Apache License 2.0
84 stars 60 forks source link

Prevent unneeded join requests & added some get/set functions #66

Closed TD-er closed 2 years ago

TD-er commented 4 years ago

If the module remains powered, the joined status values remain in tact. This allows to continue using the same session and just continue. This is possible using either OTAA or ABP. So the default assumption is now that there is a "joined" state, and only perform a true join when the result of a TX attempt returns "not_joined".

This does save a number of unneeded join requests and thus the number of downlink packets.

In order to keep using an ABP session, one must call setLastUsedJoinMode before calling an init function. Right after calling the constructor. (so it should be part of the constructor?)

For ABP there is also a function added to set the frame counters, in order to continue the session.

The internal status is now also monitored, but it is not yet clear if that's useful right now. The "joined" flag for instance is never set to '1'.

Added

TD-er commented 4 years ago

Maybe for these changes, the examples should also be updated.

Alex9779 commented 3 years ago

I think this PR is on the right track regarding what I wrote in #80 but still the overall joining process is still not best practice IMHO. I have to get more in the details but what I consider best practice is that a device should join a network only once upon provisioning. As I already outline the RN2483 has some problems providing a smooth way to do that because triggering an OTAA join will always regenerate a new device address in the network and send one uplink transmission and has to receive one downlink transmission. So this has to be done only once and not every reboot of the device. But for this to work I found that you have to initialize the ABP key stores so that the keys retrieved when joining by OTAA are stored properly and get saved by "mac save". If that is done correctly a reboot of the module would not need to join by OTAA again but just to call an ABP join to load the keys. As it is not possible to get the "key" values from the module (because of obvious reasons) like "appkey", "appskey" and "nwkskey" I think the best approach could be to just have one "join" method or modify the OTAA join in a way that it tries an ABP join every time it gets called, if the needed keys are there, that works with "accepted", if not we get a "keys_not_init" which then means we have not joined previously and do the OTAA to get them. I will review this PR and as soon as I get to it maybe provide an update with my proposed changes (if not anyone else is faster :P)...

TD-er commented 3 years ago

Just keep in mind you also need to keep track of the message counter(s) and restore them. It usually is best to simply try to call ABP join and check the result. (also check the result when trying to send) For example, if the RN2483 was not power cycled, but the microcontroller did a reboot, you may still have a join and you need to get the uC in sync with the RN module. That's mainly the idea behind this PR.

But the other way around is also possible, where the RN module was reset, or somehow got stuck in an eternal "busy" state (replying "busy" to everything you send to it). Then it may need a new join and if that failed you should set the ABP data again and preferrably the counters. (otherwise you have to disable the counter check in the TTN console)

Alex9779 commented 3 years ago

Ok just about the same as mentioned in #29 for the counters. The problem to solve here is that normally the system is powered two "errors" can occur:

  1. the MCU gets stuck
  2. the RN2483 gets stuck

If 2. happens we can handle that in our code, we have to reset, and if we tracked the counters we can set them re-join by ABP and cotinue so no need to "mac save", maybe just once a day, no idea how robust the EEPROM is, didn't find anything just some post with the same concerns doing a save after every tx. If 1. happens we have a problem, either we have a watchdog or we have to reset the device manually but we reset which normally also resets the connection to the module but we have no idea about the values if we didn't store them in a persistent memory of the MCU which has the same drawbacks as doing a "mac save"

I consider a power cycle the same as 1.

An ideas?

TD-er commented 3 years ago

Having "no idea about the values" is not true. You can read the counter values and joined state. Also performing a tx does give a reply with some information.

So on start of the MCU, you must assume the RN module is still joined (as is done in this PR) and try to continue sending data. If that fails (which can fail if the RN is not joined or stuck), just base your action on the reply of the RN module.

Not joined (or other related error replies):

If the RN module replies indicate some "stuck" state, like only replying "busy", perform a reset and restore keys + counters (maybe you can still read the last counter value in this 'stuck state')

If the RN module replies to be joined and a tx is successful, then just consider the MCU to be crashed, not the RN.

If the RN module replies to be joined but its counter value is 'suspicious' (e.g. exact multiple of N, see other issue we discussed this), then add (N-1) to the counter and try to send.

N.B. when implementing the "hop by N" strategy, make sure to use 32 bit counters.

Alex9779 commented 3 years ago
  • Write ABP credentials

Hmmmm how can I if I initially joined with OTAA? I don't know these values... They should be stored in the RN... That's what I tried to explain. As far as I can see you check if you are joined but if you previously joined by OTAA you try again joining by OTAA which is not needed if you really stored the session keys and this didn't work for me unless I did a "mac set ... 0000000000000..." with all zeros to appskey and nwkskey.

TD-er commented 3 years ago

You can't... The only way is to fetch the resulting ABP credentials from the TTN console, but that's somewhat defeating the purpose.

It is either OTAA, or ABP.

If you are sure you have performed a successful OTAA before you can ignore the "write ABP credentials" and "join" parts. Just focus on the right counters.

I am not entirely sure what happens if your module receives network related replies to adapt its settings. If it then does also call a mac save, then the theory we had about checking the counter to be a multiple of N will not work anymore until the counter reaches a new multiple of N and we call a mac save ourselves.

But still, it is possible to get into some limbo state, of which the only way out is a new OTAA join. So if it is a module "in the field" I would advice to always use ABP as you don't have any guarantee out there you may receive the OTAA reply from the gateway. Also there is the problem where a new join may fail if you performed them regularly as you may run into some collision with an already used nonce. (TTN shows "Activation DevNonce not valid: already used") The chance of generating a random value twice increases on every OTAA join request.

TD-er commented 3 years ago

I also found this on the TTN forum: OTAA best practices: how not to join every time?

Alex9779 commented 3 years ago

Think we have to separate the terms of what TTN or the network talks about and what is needed for the device. From what I found here and my tests to verify that I do the following on my test device:

  1. sys factoryRESET
  2. mac reset 868
  3. mac set DR5
  4. mac mac set appkey …
  5. mac set appeui 0000000000000000
  6. mac set devaddr 00000000
  7. mac set nwkskey 00000000000000000000000000000000
  8. mac set appskey 00000000000000000000000000000000
  9. mac join otaa
  10. mac save

Then I can power down the device or reset it with "sys reset" and when it gets back up all I have to do is "mac join ABP" to be able to transmit again.

If I omit steps 7-8 the OTAA join happens but power cycling the device and then doing an ABP join gives me "keys_not_init". Obviously the keys you get back are not properly stored on the RN.

So basically, just to emphasize this again, an ABP is a soft join, no data is transmitted, just encryption keys are load for the communication. An OTAA join instead transfers a request with AppEUI (not used with ChirpStack) and an AppKey to authenticate, then we get a DeviceAddress, an ApplicationSessionKey and a NetworkSessionKey from the network back. Those have to be stored and above procedure ensures they are.

So what I want to say is that ABP from a networks point of view is a join but the network knows nothing about it. On the RN we HAVE to join in order to be able to transmit. So we either have to "mac join otaa" or "mac join abp" and the latter needs the keys to be stored which we in case of OTAA do not know from the device but got when we did it, but normally the are not saved properly. No idea if that is a firmware bug or something meant to be. I read the command reference up and down, but except that post on TTN forums I didn't find anything else. Maybe most don't care about "best practices" and just join and join and join and mostly this just creates traffic but no other problems so they have a reason to change that.

TD-er commented 3 years ago

I know the ABP "join" is just a state change in the RN module and nothing more. And also in my PR (not sure if it is the open one or one already merged) I also set the not used "ABP" values to all zero when I perform an OTAA join + mac save.

I'm not entirely sure I do call mac join abp to continue. That's a good one to check. So the usual flow should be:

  1. Assume it is joined, so try to tx
  2. If tx fails, try mac join abp (+ restore counters ! )
  3. if ABP join fails, try mac join otaa
Alex9779 commented 3 years ago

Sounds good to me, 1. I assume you mean if you reconnect with the MCU. Maybe there are better ways than just try, we could check "mac get status" when connecting to the module.

As already said I will work through this for my current project and share my findings and results, I want to adapt this lib for an Atmel AT328PB as an MCU, I will transfer the logic to this original lib then too. Just want to get this right not only in logic but also from a best practice point of view to keep the impact on the network and rf band as low as possible...

TD-er commented 3 years ago

Just want to get this right not only in logic but also from a best practice point of view to keep the impact on the network and rf band as low as possible...

That's always a good idea to keep in mind :)

Alex9779 commented 3 years ago

And also in my PR (not sure if it is the open one or one already merged) I also set the not used "ABP" values to all zero when I perform an OTAA join + mac save.

Yeah you do but are you sure this is working with the module? You set both to string "0", but according to the command ref it has to be an "8-byte hexadecimal number" so as a string "00000000000000000000000000000000" as I wrote earlier in my init steps. You call the command to set but you don't listen to the responses, not needed in that case, I agree, but did you actually try sending just one zero and monitor the result? Didn't check that myself yet but remembering the other command they are pretty sensitive to their parameters...

TD-er commented 3 years ago

8 byte hexadecimal sounds like "00000000"

This is part of my current code (should be part of my PR for async communication, not sure if it was already part of older PRs)

bool rn2xx3_handler::check_set_keys()
{
  // Strings are in HEX, so 1 character per 4 bits.
  // Identifiers:
  // - DevEUI - 64 bit end-device identifier, EUI-64 (unique)
  // - DevAddr - 32 bit device address (non-unique)
  // - AppEUI - 64 bit application identifier, EUI-64 (unique)
  //
  // Security keys: NwkSKey, AppSKey and AppKey.
  // All keys have a length of 128 bits.

  bool otaa_set =
    rn2xx3_helper::isHexStr_of_length(_deveui,  16) &&
    rn2xx3_helper::isHexStr_of_length(_appeui,  16) &&
    rn2xx3_helper::isHexStr_of_length(_appkey,  32);

  bool abp_set =
    rn2xx3_helper::isHexStr_of_length(_nwkskey, 32) &&
    rn2xx3_helper::isHexStr_of_length(_appskey, 32) &&
    rn2xx3_helper::isHexStr_of_length(_devaddr, 8);

  if (!otaa_set && !abp_set) {
    return false;
  }

  if (_otaa && otaa_set) {
    if (!abp_set) {
      if (!rn2xx3_helper::isHexStr_of_length(_nwkskey, 32)) {
        _nwkskey = F("00000000000000000000000000000000");
      }

      if (!rn2xx3_helper::isHexStr_of_length(_appskey, 32)) {
        _appskey = F("00000000000000000000000000000000");
      }

      if (!rn2xx3_helper::isHexStr_of_length(_devaddr, 8))
      {
        // The default address to use on TTN if no address is defined.
        // This one falls in the "testing" address space.
        _devaddr = F("03FFBEEF");
      }
    }
    return true;
  }

  if (!_otaa && abp_set) {
    if (!otaa_set) {
      if (!rn2xx3_helper::isHexStr_of_length(_deveui, 16))
      {
        // if you want to use another DevEUI than the hardware one
        // use this deveui for LoRa WAN
        _deveui = F("0011223344556677");
      }

      if (!rn2xx3_helper::isHexStr_of_length(_appeui, 16)) {
        _appeui = F("0000000000000000");
      }

      if (!rn2xx3_helper::isHexStr_of_length(_appkey, 32)) {
        _appkey = F("00000000000000000000000000000000");
      }
    }
    return true;
  }
  return false;
}

And the calling part:

bool rn2xx3_handler::init()
{
  if (!check_set_keys())
  {
    // FIXME TD-er: Do we need to set the state here to idle ???
    // or maybe introduce a new "not_started" ???
    setLastError(F("Not all keys are set"));
    return false;
  }

  bool mustInit =
    get_state() == RN_state::must_perform_init ||
    !Status.Joined;

  if (!mustInit) {
    // What should be returned? The joined state or whether there has been a join performed?
    return false;
  }

  if (!resetModule()) { return false; }

  // We set both sets of keys, as some reports on older firmware suggest the save
  // may not be successful after a factory reset if not all fields are set.

  // Set OTAA keys
  sendMacSet(F("deveui"),  _deveui);
  sendMacSet(F("appeui"),  _appeui);
  sendMacSet(F("appkey"),  _appkey);

  // Set ABP keys
  sendMacSet(F("nwkskey"), _nwkskey);
  sendMacSet(F("appskey"), _appskey);
  sendMacSet(F("devaddr"), _devaddr);

  // Set max. allowed power.
  // 868 MHz EU   : 1 -> 14 dBm
  // 900 MHz US/AU: 5 -> 20 dBm
  setTXoutputPower(_moduleType == RN2xx3_datatypes::Model::RN2903 ? 5 : 1);
  setSF(_sf);

  // TTN does not yet support Adaptive Data Rate.
  // Using it is also only necessary in limited situations.
  // Therefore disable it by default.
  setAdaptiveDataRate(false);

  // Switch off automatic replies, because this library can not
  // handle more than one mac_rx per tx. See RN2483 datasheet,
  // 2.4.8.14, page 27 and the scenario on page 19.
  setAutomaticReply(false);

  // Semtech and TTN both use a non default RX2 window freq and SF.
  // Maybe we should not specify this for other networks.
  // if (_moduleType == RN2xx3_datatypes::Model::RN2483)
  // {
  //   set2ndRecvWindow(3, 869525000);
  // }
  // Disabled for now because an OTAA join seems to work fine without.

  if (_asyncMode) {
    return prepare_join(_otaa);
  }
  return wait_command_accepted() ==  RN_state::join_accepted;
}
Alex9779 commented 3 years ago

Ahhh ok, didn't check the async code PR...