sandeepmistry / arduino-LoRa

An Arduino library for sending and receiving data using LoRa radios.
MIT License
1.61k stars 620 forks source link

singleTransfer(...) aborts sporatically #224

Open brettbeeson opened 5 years ago

brettbeeson commented 5 years ago

Hello, thanks for a very useful library. I'm using a ESP32 (TTGO 2.1v1.6 LoRa) to receive packets at 10s intervals. At random intervals (1min to 10 hours), I get an abort due to a failed exception. The trace is:

4008d138: invoke_abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c line 155 0x4008d369: abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c line 170 0x40090003: xQueueGenericReceive at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/queue.c line 1445 0x400dd50d: spiTransaction at /home/bbeeson/.arduino15/packages/esp32/hardware/esp32/1.0.1/cores/esp32/esp32-hal-spi.c line 722 0x400d79c1: SPIClass::beginTransaction(SPISettings) at /home/bbeeson/.arduino15/packages/esp32/hardware/esp32/1.0.1/libraries/SPI/src/SPI.cpp line 135 0x400d42c0: LoRaClass::singleTransfer(unsigned char, unsigned char) at /tmp/arduino_build_610178/sketch/LoRa.cpp line 638 0x400d42f4: LoRaClass::readRegister(unsigned char) at /tmp/arduino_build_610178/sketch/LoRa.cpp line 624 0x400d4633: LoRaClass::handleDio0Rise() at /tmp/arduino_build_610178/sketch/LoRa.cpp line 598 0x400d4692: LoRaClass::onDio0Rise() at /tmp/arduino_build_610178/sketch/LoRa.cpp line 650](url)

This line (in queue.c) is:

configASSERT( !( ( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) && ( xTicksToWait != 0 ) ) );

Unfortunately it is a rare fault and I can't reliably reproduce it. I've tried wrapping singleTransfer in a critical section. I've checked the interrupt and main thread are both on CPU 1.

Do you have any ideas on what the problem might be, and/or how to further debug it?

morganrallen commented 5 years ago

I'm curious what source you're using?

0.5.0 from the Arduino Library is only ~618 lines long but you're getting errors on LoRa.cpp line 638... which isn't in the function 'LoRaClass::singleTransfer' in current development head.

brettbeeson commented 5 years ago

Good catch! I am indeed using 0.5.0 (via Arduino Library Manager). After the errors, I added some debugging and criticalSection calls. These didn't change the faults.

morganrallen commented 5 years ago

OK, can you do one of two things.

1) Tell me which line numbers those correlate to in the actual 0.5.0 release (just for LoRa.cpp) 2) Run this test with an unmodified copy of the LoRa library to get the correct line numbers

brettbeeson commented 5 years ago

I'm still getting these problems, so did #2 (i.e. grabbed the latest LoRa.cpp and LoRa.h from github). The exception report is now:


Decoding stack results
0x4008d138: invoke_abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c line 155
0x4008d369: abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c line 170
0x40090003: xQueueGenericReceive at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/queue.c line 1445
0x400e52d1: spiTransaction at /home/bbeeson/.arduino15/packages/esp32/hardware/esp32/1.0.1/cores/esp32/esp32-hal-spi.c line 722
0x400d8141: SPIClass::beginTransaction(SPISettings) at /home/bbeeson/.arduino15/packages/esp32/hardware/esp32/1.0.1/libraries/SPI/src/SPI.cpp line 135
0x400d48ac: LoRaClass::singleTransfer(unsigned char, unsigned char) at /home/bbeeson/Dropbox/IT/Arduino/libraries/Blobs/LoRa.cpp line 671
0x400d48e0: LoRaClass::readRegister(unsigned char) at /home/bbeeson/Dropbox/IT/Arduino/libraries/Blobs/LoRa.cpp line 657
0x400d4c89: LoRaClass::handleDio0Rise() at /home/bbeeson/Dropbox/IT/Arduino/libraries/Blobs/LoRa.cpp line 631
0x400d4ce2: LoRaClass::onDio0Rise() at /home/bbeeson/Dropbox/IT/Arduino/libraries/Blobs/LoRa.cpp line 683
0x4008132d: __onPinInterrupt at /home/bbeeson/.arduino15/packages/esp32/hardware/esp32/1.0.1/cores/esp32/esp32-hal-gpio.c line 220

One possible cause is the call of xQueueGenericReceive from a ISR (I assume LoRaClass::onDio0Rise() is an ISR?), but I'm not sure if this is the case, or how to test/correct it.

brettbeeson commented 5 years ago

Further to this, I've hacked up a alternative which moves the processing outside the ISR.

The new ISR is ....

void LoraReaderPacketAvailableISR() {
  static const int flag = 1;
  xQueueOverwriteFromISR(LoraReaderPollSingleton->_packetsAvailable, &flag, &xHigherPriorityTaskWoken
// error checking, etc omitted

.... while a FreeRTOS Task handles the processing:

void ProcessorTask(void*) {
   while ( xQueueReceive(l->_packetsAvailable, &flag, portMAX_DELAY) == pdPASS) {
      // processing as for Lora.cpp's handleDioRise()
   }
}

This works without crashing, as least for the last 24 hours. I'm not suggesting this is the right way to do things, as I don't know nearly enough about LoRa or IRQs. However, it may assist in understanding the original problem.

rfx77 commented 3 years ago

Same Problem here. I created an issue in esp-idf. Seems like a FreeRTOS Problem to me: https://github.com/espressif/esp-idf/issues/6492

IoTThinks commented 3 years ago

@rfx77 Is the below confirmed? Normally, I create a FreeRTOS task to read LoRa message inside ISR. Hence, my LoRa reading is normally outside ISR.

"Probably you are doing some reading from LoRa in ISR function. I am recommending to use this library:"

rfx77 commented 3 years ago

The Lora driver uses the SPI bus in the ISR before my custom code gets handled in the ISR. Thats the problem. This is confirmed by the esp-idf team. the solution from brettbeeson should work but i have not tested it yet. maybe someone could change the ISR of the LoRa driver to do the same think and avoid the problem.

From my unterstanding of the LoRa documentation someone should use the ISR approch so it maybe worth to fix it.

IoTThinks commented 3 years ago

@rfx77 Do you have any simple code to replicate the issue?

rfx77 commented 3 years ago

No sorry. From what i understand from the esp-idf team the handleDio0Rise in https://github.com/sandeepmistry/arduino-LoRa/blob/c1bb134755b538f0e195d5d6784b949cd3382167/src/LoRa.cpp#L692 must not be run in the ISR because it uses the SPI.

the problem could be reproduced better when you have a LoRa sender which sends in a very short interval and a display loop on the receiver which displays without delay. but there is no standalone code which i can give you. to test on one device.

IoTThinks commented 3 years ago

@rfx77 If I understand correctly, read/writeRegister (which trigger SPITransaction) inside ISR is not supported from expressif. https://github.com/espressif/esp-idf/issues/6492#issuecomment-771889745

image

Then even spin off a FreeRTOS on onReceiveCallback won't help.

IoTThinks commented 3 years ago

@brettbeeson @rfx77 Will it help if we move everything inside handleDio0Rise to a FreeRTOS task? handleDio0Rise will only create the FreeRTOS task

void LoRaClass::handleDio0Rise()
{
// Create a FreeRTOS task A
}

FreeRTOStaskA
{
// Put the code of existing handleDio0Rise here.
}
rfx77 commented 3 years ago

Yes. i think this is the solution. Here is the comment from esp-idf: https://github.com/espressif/esp-idf/issues/6492#issuecomment-771889745

rfx77 commented 3 years ago

i would create the task in the onReceive function if it does not already run so that you dont create a task on every interrupt. if NULL is passed to onReceive you could kill it .

IoTThinks commented 3 years ago

@rfx77 The problem is in line 697. It happens before onReceive() on line 712. Hence, the FreeRTOS in onReceive() is not called yet.

Unless you move everything inside handleDio0Rise to a task.

In my current code, I create a task to read and process LoRa message inside onReceive(). No issue yet.

IoTThinks commented 3 years ago

Arduino seems to support SPI transaction inside ISR? Expressif's implementation seems to be weak at this?

If your program will perform SPI transactions within an interrupt, call this function to register the interrupt number or name with the SPI library. This allows SPI.beginTransaction() to prevent usage conflicts. Note that the interrupt specified in the call to usingInterrupt() will be disabled on a call to beginTransaction() and re-enabled in endTransaction().

https://www.arduino.cc/en/Reference/SPIusingInterrupt

rfx77 commented 3 years ago

I also create a Task now to read from Lora but i have to delay in this task so it doesnot consume all cpu. this ist nor a very nice implementation in my opinion. working with interrupts would be better.