mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License
650 stars 212 forks

Saving / Loading state #587

Open runezor opened 4 years ago

runezor commented 4 years ago

I'm currently using this library on ESP devices, and when using deep sleep on these devices, the memory is cleared. It's often a hassle having to restore the entire state of this library, and I find myself writing structs in my code that just mirror the LMIC object.

Instead I think it'd be nice if the library supported this natively.

I'm currently working on mirroring the state in an Arduino sketch, but if the author feels like this is a worthwhile addition to the library, I wouldn't mind making a pull request. I can think of two approaches:

terrillmoore commented 4 years ago

Thank you for opening this discussion.

I would have done this long ago, but the LMIC implementation makes all of the contents of the LMIC structure part of the API. So we can't do this, alas, without a careful procedure (and some number of months while people prepare and adapt).

The first part of the long-term procedure is to write getters for all API LMIC elements, and setters where necessary. These can be static.

The second part is to rename everything (so that this won't have to happen again), and hide the LMIC structure contents from the clients. That step could do the reorganization.

The modest, not-so-disruptive step, along the lines of what you are discussing, is to add state-save and state-restore, and design these to externalize the data (at least far enough that different versions of the LMIC can deal with a common form of saved state).

Despite the need for code space efficiency, let's avoid adding structs that external clients can read; let's keep the saved state opaque to clients. (Remember that people on at32u4s are not using this now, and are surviving; putting all the mechanism in separate functions will ensure they are not penalized.) If clients need to know what's in a saved state blob, we can provide additional methods.
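As a rough illustration of the "opaque blob with a common form" idea above, here is a minimal sketch. Everything in it is an assumption made for illustration: `DemoSession`, `demoSaveState`, `demoRestoreState`, the magic value, and the field layout are hypothetical and are not part of arduino-lmic. The point is only that a magic number, a format version, and explicit field-by-field little-endian encoding let different library versions recognize (or reject) a saved state without clients ever seeing a struct.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical session fields; the real LMIC structure holds many more.
struct DemoSession {
    uint32_t devaddr;
    uint32_t seqnoUp;
    uint32_t seqnoDn;
};

constexpr uint32_t kBlobMagic = 0x4C4D4943u;   // arbitrary tag ("LMIC" in ASCII)
constexpr uint16_t kBlobVersion = 1;           // bump when the layout changes
constexpr size_t kBlobSize = 4 + 2 + 3 * 4;    // magic + version + three fields

// Explicit little-endian encoding keeps the blob independent of struct
// padding and of the MCU's byte order.
static void put32(uint8_t *p, uint32_t v) {
    p[0] = uint8_t(v);
    p[1] = uint8_t(v >> 8);
    p[2] = uint8_t(v >> 16);
    p[3] = uint8_t(v >> 24);
}

static uint32_t get32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Returns the number of bytes written, or 0 if the buffer is too small.
size_t demoSaveState(const DemoSession &s, uint8_t *buf, size_t cap) {
    if (buf == nullptr || cap < kBlobSize) return 0;
    put32(buf, kBlobMagic);
    buf[4] = uint8_t(kBlobVersion);
    buf[5] = uint8_t(kBlobVersion >> 8);
    put32(buf + 6, s.devaddr);
    put32(buf + 10, s.seqnoUp);
    put32(buf + 14, s.seqnoDn);
    return kBlobSize;
}

// Returns false (leaving s untouched) if the blob is short, has the wrong
// magic, or was written with a format version this code does not know.
bool demoRestoreState(DemoSession &s, const uint8_t *buf, size_t len) {
    if (buf == nullptr || len < kBlobSize) return false;
    if (get32(buf) != kBlobMagic) return false;
    const uint16_t version = uint16_t(buf[4] | (uint16_t(buf[5]) << 8));
    if (version != kBlobVersion) return false;
    s.devaddr = get32(buf + 6);
    s.seqnoUp = get32(buf + 10);
    s.seqnoDn = get32(buf + 14);
    return true;
}
```

A real format would probably also want an integrity check (for example a CRC) and a way to migrate older versions rather than reject them; both are left out here to keep the sketch short.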

runezor commented 4 years ago

Thank you for your response.

While I do agree that following the procedure you describe would result in a much nicer code base, is there anything preventing someone from writing a simple save/restore method without hiding the struct from clients? A bit like the savestate struct in this sketch https://github.com/Edzelf/LoRa/blob/master/ESP_lora_tracker/ESP8266_loratracker.ino#L64, but internal in the library rather than external.

terrillmoore commented 3 years ago

Rather than add the code in C, I've done this in C++ in the Arduino LoRaWAN library. Perhaps that would be enough?

lnlp commented 3 years ago

@terrillmoore

@terrillmoore terrillmoore added this to To do in v4.0 on Feb 22

It's currently August 14, 2021 and arduino-lmic v4.0 was released some while ago, but there is still no support for saving/restoring LMIC state.

Rather than add the code in C, I've done this in C++ in the Arduino LoRaWAN library. Perhaps that would be enough?

No, not at all.

We desperately need support for saving/restoring LMIC state in the LMIC library. See: https://forum.mcci.io/t/lmic-development-plans/19/6

Rather than add the code in C, I've done this in C++

Whether it is implemented in C or C++ is less of a concern; what matters more is having the functionality available in the LMIC library. Actually, I prefer C++ because it is a higher-level language, with the advantages that brings.

LMIC users should be able to use standard functions (e.g. saveLmicState(...) and restoreLmicState(...)) that are part of the LMIC library and then implement their own code to physically store and restore the provided state data to/from some kind of non-volatile memory. Actual implementation will depend on MCU architecture and additional hardware (e.g. availability of EEPROM, flash or FRAM) and therefore needs to be implemented by the user, but internal support to enable this needs to be provided by the LMIC library itself.
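The split of responsibilities described above could look roughly like the following. This is only a sketch under stated assumptions: `saveLmicState` and `restoreLmicState` are the names proposed in this comment and do not exist in arduino-lmic, and `StateStorage`, `DemoState`, and the fake backend are hypothetical stand-ins. The library side produces and consumes bytes; the application side supplies two callbacks for whatever non-volatile memory it has.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical interface: the application implements these for its own
// hardware (EEPROM, flash, FRAM, ...).
struct StateStorage {
    bool (*write)(const uint8_t *data, size_t len);
    size_t (*read)(uint8_t *data, size_t cap);  // bytes read, 0 on failure
};

// Stand-in for the state the library would externalize.
struct DemoState {
    uint32_t devaddr;
    uint32_t seqnoUp;
    uint32_t seqnoDn;
};
static DemoState g_state;

// "Library side": hands the state to the storage backend as raw bytes.
bool saveLmicState(const StateStorage &st) {
    uint8_t buf[sizeof(DemoState)];
    std::memcpy(buf, &g_state, sizeof buf);
    return st.write(buf, sizeof buf);
}

// "Library side": reloads the state, refusing anything of the wrong size.
bool restoreLmicState(const StateStorage &st) {
    uint8_t buf[sizeof(DemoState)];
    if (st.read(buf, sizeof buf) != sizeof buf) return false;
    std::memcpy(&g_state, buf, sizeof buf);
    return true;
}

// "User side": a RAM array standing in for real non-volatile memory.
static uint8_t g_fakeNvm[64];
static size_t g_fakeLen = 0;

static bool fakeWrite(const uint8_t *data, size_t len) {
    if (len > sizeof g_fakeNvm) return false;
    std::memcpy(g_fakeNvm, data, len);
    g_fakeLen = len;
    return true;
}

static size_t fakeRead(uint8_t *data, size_t cap) {
    if (g_fakeLen == 0 || g_fakeLen > cap) return 0;
    std::memcpy(data, g_fakeNvm, g_fakeLen);
    return g_fakeLen;
}
```

A sketch would then call `saveLmicState(StateStorage{fakeWrite, fakeRead})` before sleeping and `restoreLmicState(...)` after waking, swapping the fake callbacks for ones that talk to its actual storage.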

I've done this in C++ in the Arduino LoRaWAN library. Perhaps that would be enough?

No, support has to be provided (and documented) in the LMIC library itself.

Users should only have to implement the (standard, generic) saveLmicState() and restoreLmicState() functions. Users should not have to be bothered with having to dive into and trying to understand the arduino-lorawan library first and extract/adapt the parts responsible for saving and restoring LMIC state (or be required to use arduino-lorawan only because they need to save/restore LMIC state).

My assumption is that for the author of both MCCI LMIC (arduino-lmic) and arduino-lorawan it should be relatively straightforward to copy and adapt functionality for saving/restoring LMIC state from arduino-lorawan to arduino-lmic. For an outsider/the average user this will be much more difficult and error prone. It's much better and preferred to have this implemented in the LMIC library in a standard way (where this only needs to be done once and benefits all users).

Getting this essential functionality available in LMIC should not have to wait for, or depend on, other plans to 'improve LMIC in general' and/or 'integration of arduino-lmic and arduino-lorawan'.

I am frequently asked for advice about which Arduino LoRaWAN library to use because users need to save and restore LMIC state but MCCI LMIC does not currently support it. We would all welcome having this support available in the MCCI LMIC library as soon as possible.

lnlp commented 3 years ago

@terrillmoore Can you give an update on this? Thanks

terrillmoore commented 3 years ago

We have been getting field experience with the Arduino-LoRaWAN library. We're still having some issues, which prevent moving this to the LMIC.

  1. the nonce is not stored. Although we are not 1.0.4 compliant, it seems to me that if we are doing a storage format, we should store the nonce.
  2. there is still an unexplained problem when converting from v2 to v3 with saved state; it often happens that the device needs to be rebooted, once, or the v3 join won't receive downlinks (the network sees the uplinks). This suggests that there's something not fully understood about restoring state.

We expect to go through the conversion process with a large number of devices over the next few weeks, now that we have the bulk of the TTN NY and TTN Ithaca gateways moved to V3.

Sorry for the delay. In addition the delta thing has upended plans and projections again, so I've had to do corporate things.

Based on user problems, we have additional high priorities for the LMIC:

  1. adding a "getting started" sketch to help people bring up their radios. This should include diagnostics to test the radio and report problems.
  2. reverting the clock-error limiting thing (that looks like a mistake, because lots of devices actually need the wider range),
  3. adding joinrequest rate limiting
  4. adding proper logging, so that errors can be diagnosed without changing the speed of the LMIC
  5. rerunning compliance tests with the latest tests from RedwoodComm.

MCCI really needs me to allow for run-time region and network selection, instead of compile time. The power meter products really need this very much. MCCI also really needs me to finish our FUOTA implementation. (There's no need for multicast in lightly loaded networks and sparse device distribution - and even slow FUOTA is better than rolling the truck. I already have done the secure bootloader for STM32L0 and ed25519 code signing using TweetNaCl and it works very well. Adding this support to the LMIC would be really nice, and probably can be made to work even on AVR32 if you are willing to have an external SPI flash. Code is published at https://github.com/mcci-catena/bootloader, and we're using it -- with firmware over other media, not over LoRa -- with our power meters.)

Best regards, --Terry

lnlp commented 3 years ago

  1. the nonce is not stored. Although we are not 1.0.4 compliant, it seems to me that if we are doing a storage format, we should store the nonce.

Yes indeed it should. How is the nonce currently implemented in LMIC, as an incrementing counter?

  2. there is still an unexplained problem when converting from v2 to v3 with saved state; it often happens that the device needs to be rebooted, once, or the v3 join won't receive downlinks (the network sees the uplinks). This suggests that there's something not fully understood about restoring state.

Do you mean that the device saved its state while connected to V2, and that restoring this state causes issues once it is connected to (configured on) V3? Could this be a general V3 issue instead of being related to storing and restoring LMIC state?

Was storing/restoring LMIC state already tested with a device saving state while connected to V3 and then restoring the state to continue on V3?

cyberman54 commented 3 years ago

I can answer the last question. I built a simple store/restore mechanism at the application level by saving the whole lmic struct before the device goes to sleep, and restoring the struct after the device wakes up again. This is working with v3; so far I don't see any issues. But I did not do deep testing.

cnmicha commented 3 years ago

I did the same as @cyberman54, works for me as well. I did some testing with class A and LoRaWAN 1.0.3 only. You need to make sure both task scheduler stacks are empty before sleeping, if your microcontroller is clearing the RAM content after wakeup. Alternatively, you may use another sleep mode that does not clear the RAM content.
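The whole-struct approach described in the two comments above can be sketched as follows. This is a hedged illustration, not the commenters' actual code: `DemoLmic` stands in for the real `lmic_t`, `kTxRxPending` is a placeholder flag, and `trySaveBeforeSleep` / `restoreAfterWake` are hypothetical names. The caller passes in how many scheduler jobs are still queued, since how to obtain that count depends on the library version.

```cpp
#include <cstdint>

// Stand-in for lmic_t; the real structure is much larger.
struct DemoLmic {
    uint16_t opmode;
    uint32_t seqnoUp;
};

// Placeholder "transmit/receive pending" bit for this demo only.
constexpr uint16_t kTxRxPending = 0x0080;

static DemoLmic g_lmic;            // plays the role of the global LMIC object

// On an ESP32 these two would be placed in RTC memory (for example with
// RTC_DATA_ATTR) so that they survive deep sleep; here they are plain statics.
static DemoLmic g_retained;
static bool g_retainedValid = false;

// Copies the whole struct, but only when nothing is in flight: no queued
// scheduler jobs and no pending TX/RX. Returns false if it is not yet safe.
bool trySaveBeforeSleep(unsigned pendingJobs) {
    if (pendingJobs != 0 || (g_lmic.opmode & kTxRxPending) != 0) {
        return false;
    }
    g_retained = g_lmic;
    g_retainedValid = true;
    return true;
}

// Restores the struct after wake-up; returns false if nothing was saved
// (for example on a cold boot), in which case a normal join is needed.
bool restoreAfterWake() {
    if (!g_retainedValid) return false;
    g_lmic = g_retained;
    return true;
}
```

Because the copy is a raw image of the struct, it is only valid for the exact library build that wrote it; a firmware update that changes the struct layout would need to discard the retained copy.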

terrillmoore commented 3 years ago

How is nonce implemented?

Incrementing counter, seeded with a 2-byte random number in LMIC_reset(). It would be better to save and restore.
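To make the difference concrete, here is a minimal sketch of a nonce counter that is seeded once and then carried across resets through a saved record. `NonceRecord` and `takeDevNonce` are hypothetical names for illustration and do not correspond to anything in the LMIC; where the record is stored is left to the application.

```cpp
#include <cstdint>

// Hypothetical persisted record; valid is false until a nonce was ever stored.
struct NonceRecord {
    bool valid;
    uint16_t nextDevNonce;
};

// Returns the nonce to use for the next join request and advances the record.
// seedIfEmpty is used only when no nonce has been stored before, mirroring
// the random seeding that currently happens on every reset.
uint16_t takeDevNonce(NonceRecord &rec, uint16_t seedIfEmpty) {
    if (!rec.valid) {
        rec.nextDevNonce = seedIfEmpty;
        rec.valid = true;
    }
    const uint16_t nonce = rec.nextDevNonce;
    rec.nextDevNonce = uint16_t(nonce + 1);  // wraps at 16 bits
    return nonce;
}
```

The record would need to be written back to non-volatile storage after each call; otherwise a reset between the join request and the write would reuse a nonce.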

unexplained problem

It works for MCCI either at V2 or V3; we see a problem in transitioning from V2 to V3. But we are making a lot of other changes which is why I say "we need to understand it". I tend to think it's not an LMIC problem per se, but an error in how we're restoring state. The precise repro case is:

  1. join at v2
  2. save/restore (possibly multiple times, and send some data).
  3. force appeui to 1
  4. force rejoin
  5. uplinks are correct but downlinks are not received/processed/accepted (not enough info to know which of these is the case; uplinks are shown to be correct in the console and on the LoRaWAN analyzer).
  6. system reset (which saves / restores, but also does LMIC_reset()) works properly.