mcci-catena / arduino-lmic

LoraWAN-MAC-in-C library, adapted to run under the Arduino environment
https://forum.mcci.io/c/device-software/arduino-lmic/
MIT License

Sleeping with interrupts disabled? #849

Open Miceuz opened 2 years ago

Miceuz commented 2 years ago

I am using LMIC on nRF52 platform. I have successfully ported hal.cpp and now I want to switch to using interrupts. Mainly I want to sleep while waiting for RX as that's the majority of time the library spends busy-waiting. I have started by looking around the code base and noticed this place:

https://github.com/mcci-catena/arduino-lmic/blob/8d378ea410887d3fb08ea2c9acce95cc4c047788/src/lmic/oslmic.c#L153

The system is being put to sleep with interrupts disabled. If so, how will it wake up from the sleep? Isn't this a bug?

terrillmoore commented 2 years ago

Possibly; except that hal_sleep() is an artifact of the IBM code; it does nothing on Arduinos. The sleep has to be done outside the LMIC.

The way to do this is (outside the LMIC, with v4.1.1):

  1. determine that you'd like to try to sleep, and for how long (in ostime_t ticks).
  2. call os_queryTimeCriticalJobs(howLong). It's safe to sleep for howLong ticks, if the routine returns 0. If it returns non-zero, you must not sleep that long.

A little unwieldy, but it works. (It's on the list to have a more convenient API, but any API that returns the time to the next scheduled event also has to return an extra bit (no scheduled event), and then I worried about people not checking both pieces of information.) The real fix is to rework the lower part of the code entirely to allow for things like simple RTOS integration; see my other lengthy post about FreeRTOS integration from the last few days. I don't recommend trying to implement any kind of sleep in hal_sleep(). The IBM docs seem to suggest that it should set a flag that causes a sleep once you are out of the LMIC.
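The two steps above can be sketched as a small helper that asks for a sleep length and falls back to shorter ones when the answer is "not safe". This is only a sketch under assumptions: `pick_sleep_ticks`, `query_fn` and `demo_query` are hypothetical names, the halving strategy is an illustrative choice rather than anything the library prescribes, and `ostime_t` is redeclared here so the snippet builds on a host without the LMIC headers. On a real target, `query` would be a thin wrapper around os_queryTimeCriticalJobs().

```c
#include <stdint.h>

typedef int32_t ostime_t;                 /* assumption: mirrors LMIC's signed 32-bit tick type */
typedef int (*query_fn)(ostime_t ticks);  /* hypothetical wrapper shape for os_queryTimeCriticalJobs() */

/* Return the longest interval (starting at `wanted` and halving) for which
 * `query` reports 0, i.e. "safe to sleep". Returns 0 if nothing >= `minimum`
 * is safe, in which case the caller should keep running the loop instead. */
ostime_t pick_sleep_ticks(query_fn query, ostime_t wanted, ostime_t minimum) {
    for (ostime_t t = wanted; t > 0 && t >= minimum; t /= 2) {
        if (query(t) == 0) {
            return t;
        }
    }
    return 0;
}

/* Host-only simulation (not LMIC code): pretends a time-critical job is due
 * 1000 ticks from now, so any request of 1000 ticks or more is refused. */
static int demo_query(ostime_t ticks) {
    return ticks >= 1000;
}
```

With the simulated query, a request for 8000 ticks is reduced step by step until it fits in front of the pending job, while a request for 800 ticks is accepted as is.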

Remember that if you use interrupts you cannot call the LMIC from the ISR. You must only set a flag, record the time of the interrupt, and then schedule the LMIC to be called as a normal task. hal_processPendingIRQs() then has to pick up the information recorded by the ISR, and dispatch. Nothing else will work.
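A minimal sketch of that "flag plus timestamp" pattern follows. All names here (`record_radio_irq`, `take_pending_irq`) are hypothetical and not part of the LMIC; on a real target the ISR would pass in os_getTime(), and the pickup function would be called from the code path that services hal_processPendingIRQs().

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t ostime_t;              /* assumption: mirrors LMIC's signed 32-bit tick type */

static volatile bool     irq_pending;  /* set by the ISR, cleared by the main loop */
static volatile ostime_t irq_time;     /* tick count captured inside the ISR */

/* Body of the ISR: record the time, raise the flag, do nothing else.
 * No LMIC function is called from here. */
void record_radio_irq(ostime_t now) {
    irq_time = now;
    irq_pending = true;
}

/* Called from normal (non-interrupt) context. Returns true and the recorded
 * time if an interrupt happened since the last call. On real hardware, a
 * second interrupt arriving between the read and the clear would be lost
 * unless this short section is protected. */
bool take_pending_irq(ostime_t *when) {
    if (!irq_pending) {
        return false;
    }
    *when = irq_time;
    irq_pending = false;
    return true;
}
```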

The assumption in the original code, I believe, was that hal_sleep() would adjust interrupts before sleeping. But I'd have to check the history. At one time the LMIC tried to actually process interrupts while the LMIC was running, and that caused all kinds of problems, not least that the linked list could be corrupted. It may be that the interrupt wrangling was added to fix that. That was before my time.

Miceuz commented 2 years ago

I ended up doing it like this:

In oslmic.c I have added a function:

ostime_t os_getNextJobDeadline() {
  if(OS.scheduledjobs) {
    return OS.scheduledjobs->deadline;
  } else {
    return -1;
  }
}
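One caveat with a -1 sentinel: ostime_t is a signed tick counter that wraps around, so -1 can also be a legitimate deadline. This is the "extra bit" concern raised earlier in the thread. A hedged alternative is to report "no job" separately from the value. In this sketch `struct job` and `next_job_deadline` are hypothetical stand-ins, not the library's types or API; like the function above, it assumes the head of the scheduled list carries the earliest deadline.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t ostime_t;   /* assumption: mirrors LMIC's signed 32-bit tick type */

/* Stand-in for the two osjob_t fields used here. */
struct job {
    struct job *next;
    ostime_t    deadline;
};

/* Returns false when nothing is scheduled; otherwise stores the deadline of
 * the first scheduled job in *deadline and returns true. */
bool next_job_deadline(const struct job *scheduled, ostime_t *deadline) {
    if (scheduled == NULL) {
        return false;
    }
    *deadline = scheduled->deadline;
    return true;
}
```

The caller then has to check the return value before using the deadline, which keeps "no job" and "job due at tick -1" distinct.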

In hal.cpp I have added a hal_sleep() function:

extern bool lora_can_sleep;

void hal_sleep () {
    lora_can_sleep = true;
}

Also, I don't enable/disable interrupts in the HAL: I am running on an nRF52840 and can't mess with interrupts, as the BLE code would break.

Then, in the event handler, I have added an on_tx_complete handler for the TX_COMPLETE event:

void on_tx_complete(uint8_t dataLen, uint8_t dataBeg, uint8_t *frame) {
  lorawan_tx_in_progress = false;
}

I have also added this function to detect when the LMIC stack is done with all its tasks:

bool lorawan_is_opmode() {
  return (LMIC.opmode & OP_POLL);
}

Then, in my high-level code responsible for sending the message, I do it like this:

lorawan_tx_in_progress = true;
lorawan_set_payload(packet.data, packet.len);

while(lorawan_tx_in_progress || lorawan_is_opmode()) {
  lora_can_sleep = false;
  os_runloop_once();

  if(lora_can_sleep) {
    next_job_time = lora_get_next_job_time();

    if(next_job_time > 0) {
      int64_t ticks_to_job = next_job_time - app_timer_cnt_get64();
      if(ticks_to_job > APP_TIMER_TICKS(1)) {
        sleep_for_ticks(ticks_to_job);
      }
    } else if(next_job_time == -1) {
      // maybe we are done with transmission here and there are no jobs left?
    }
  }
}
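If the deadline is carried as a 32-bit ostime_t rather than a 64-bit count, the `next_job_time > 0` test and the plain subtraction stop being reliable once the tick counter wraps. A common way around that is to take the difference modulo 2^32 and read it back as a signed value. The helper name `ticks_until` below is hypothetical, and the sketch assumes both arguments use the same tick unit.

```c
#include <stdint.h>

typedef int32_t ostime_t;   /* assumption: mirrors LMIC's signed 32-bit tick type */

/* Signed distance from `now` to `deadline`, correct across counter wraparound
 * as long as the two are less than 2^31 ticks apart. The subtraction is done
 * in unsigned arithmetic to avoid signed overflow; the cast back to a signed
 * type relies on two's-complement conversion, as on all common compilers.
 * A positive result means the deadline is still in the future. */
ostime_t ticks_until(ostime_t deadline, ostime_t now) {
    return (ostime_t)((uint32_t)deadline - (uint32_t)now);
}
```

For example, a deadline just after the wrap point, compared with a "now" just before it, still comes out as a small positive number instead of a huge negative one.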

I also switch power to the RFM95 through an external transistor, and for that I save and restore the LMIC state:

lmic_t lmic_state;
bool lmic_state_saved = false;

void lora_save_state() {
  memcpy(&lmic_state, &LMIC, sizeof(lmic_t));
  lmic_state_saved = true;
}

void lora_load_state() {
  if(lmic_state_saved) {
    memcpy(&LMIC, &lmic_state, sizeof(lmic_t));
  }
}

So far it works very reliably with no surprises. I understand that interrupts might mess up LMIC timing, but all the standard BLE stack interrupts are really short and I don't do any long processing in my interrupts. The high CPU speed also helps.

dajtxx commented 2 years ago

I'm trying to figure out how to do this on the Adafruit Feather M0, so thanks for the notes above. I also noted hal_sleep() being called while interrupts are disabled and wondered how that would work!

What I'm also wondering about is how to allow LMIC to 'do its own thing' between my uplinks. See the thread https://www.thethingsnetwork.org/forum/t/mac-commands-being-sent-each-time-after-join-process/55569/14, where I see LMIC responding to config commands from TTN between the uplinks initiated by me. As far as I can see these will only happen if I busy loop LMIC's os_runloop_once().

Would uplinks like that get included in job queue queries?

What would happen if TTN sends config downlinks and my node just ignores them?

Our solution so far has been to completely ignore LMIC's job scheduling and (after joining) only busy-loop os_runloop_once() between initiating an uplink and getting the TX_COMPLETE event, then go to sleep and use an alarm to wake the M0 the next time we want to measure/uplink. It feels like a hack, but seems to work to some extent. It means LMIC never gets to respond to requests from TTN after the join, which seems bad.

d-a-v commented 2 years ago

> I'm trying to figure out how to do this on the AdaFruit Feather M0, so thanks for the notes above. I also noted hal_sleep() being called while interrupts are disabled and wondered how that would work!

I am using #756 for targets having their clock stopped during sleep.

terrillmoore commented 2 years ago

@dajtxx you need to keep calling the runloop until os_queryTimeCriticalJobs(ostime_t desiredSleepTime) returns 0. (If it returns non-zero, you can't sleep for desiredSleepTime ticks.) This requires that you not use the osjob_t queue for your own delays, but it's workable.
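As a rough sketch of that advice, the loop below keeps running one pass at a time until the query says the desired sleep is safe. The names `drain_until_sleepable`, `runloop_fn`, `query_fn` and the `sim_*` functions are hypothetical; on a real target the two callbacks would wrap os_runloop_once() and os_queryTimeCriticalJobs(). The pass limit is an added safety net, not something the library requires.

```c
#include <stdint.h>

typedef int32_t ostime_t;                 /* assumption: mirrors LMIC's signed 32-bit tick type */
typedef void (*runloop_fn)(void);         /* hypothetical wrapper for os_runloop_once() */
typedef int  (*query_fn)(ostime_t ticks); /* hypothetical wrapper for os_queryTimeCriticalJobs() */

/* Run the loop until `query(desired)` returns 0 or `max_passes` is reached.
 * Returns the number of passes executed. If the limit was hit, the caller
 * must query again before deciding to sleep. */
unsigned drain_until_sleepable(runloop_fn run_once, query_fn query,
                               ostime_t desired, unsigned max_passes) {
    unsigned passes = 0;
    while (passes < max_passes && query(desired) != 0) {
        run_once();
        passes++;
    }
    return passes;
}

/* Host-only simulation (not LMIC code): three jobs are pending, and each
 * runloop pass completes one of them. */
static int  sim_pending_jobs = 3;
static void sim_run_once(void) { if (sim_pending_jobs > 0) sim_pending_jobs--; }
static int  sim_query(ostime_t ticks) { (void)ticks; return sim_pending_jobs > 0; }
```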

The HAL logic that seems to disable interrupts does not really disable them. The HAL code disables LMIC interrupts, not system interrupts. LMIC interrupts are synchronized at the edge of the job loop; disabling interrupts only prevents the ISR from being invoked during the next job loop call. (Note that, if interrupts are enabled, this has nothing to do with preventing SX1276 interrupts. These always happen as fast as possible, and the current time is recorded. Later, when the runloop decides to allow interrupts, the time is delivered to the "ISR".) The naming is unfortunate, but historical.
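The distinction can be illustrated with a small model: the hardware event is always latched with its timestamp, and the "disable" call only postpones delivery. This is a simplified illustration with hypothetical names (`hw_event`, `mask_events`, `unmask_events`), not the HAL's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t ostime_t;               /* assumption: mirrors LMIC's signed 32-bit tick type */

static volatile bool     event_latched; /* a radio event happened and is waiting */
static volatile ostime_t event_time;    /* when it happened */
static unsigned          mask_depth;    /* nesting count of "disable" calls */
static unsigned          delivered_count;
static ostime_t          last_delivered_time;

/* Hand the latched event to the handler, but only while nothing is masked. */
static void deliver_if_allowed(void) {
    if (mask_depth == 0 && event_latched) {
        event_latched = false;
        last_delivered_time = event_time;
        delivered_count++;
    }
}

/* The real interrupt: always runs and always records the time. */
void hw_event(ostime_t now) {
    event_time = now;
    event_latched = true;
}

/* "Disabling interrupts" in this model only defers delivery. */
void mask_events(void) {
    mask_depth++;
}

/* Delivery happens when the outermost mask is released. */
void unmask_events(void) {
    if (mask_depth > 0 && --mask_depth == 0) {
        deliver_if_allowed();
    }
}
```

An event that arrives while masked keeps its original timestamp and is delivered once the mask is released, which is why the timing information survives the deferral.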