thelsing / knx

KNX stack (TP, IP and RF) for Arduino and Linux. Can be configured with ETS.
GNU General Public License v3.0

Stack stuck trying to load/unload application #291

Closed. embed-me closed this issue 1 week ago.

embed-me commented 3 weeks ago

Hello,

When I try to program the device the stack gets completely stuck. I use an RP2040 with an attached external flash (128Mbit) on QSPI.

platformio.ini is based on the example provided in knx/examples/knx-demo and looks like this:

[platformio]
libdeps_dir = .pio/libs
src_dir = .
default_envs = rp2040

[env:rp2040]
framework = arduino
platform = https://github.com/maxgerhardt/platform-raspberrypi.git#60d6ae8
platform_packages = framework-arduinopico @ https://github.com/earlephilhower/arduino-pico/releases/download/3.9.3/rp2040-3.9.3.zip
board = rpipico
board_build.core = earlephilhower

upload_protocol = mbed

lib_deps =
  knx=https://github.com/thelsing/knx.git  ; using master because it seems to support rp2040 and OpenKNX uses it as well
  SPI

monitor_port = COM4
monitor_speed = 115200

build_flags =
  -DMASK_VERSION=0x07B0        ; Normal Device TP1 = Twisted Pair (cable)
  -DKNX_FLASH_SIZE=0x8000      ; default = 0x1000 = 4KiB; must be multiple of 4096
  ;-DKNX_FLASH_OFFSET=0xF4000  ; default = 0x180000 = 1.5MiB; must be multiple of 4096
  -DUSE_TP_RX_QUEUE            ; Improved performance on TP by using a queue instead of immediate frame processing
  -DPIO_FRAMEWORK_ARDUINO_ENABLE_RTTI  ; according to rp2040_arduino_platform.cpp: RTTI must be set to enabled in the board options
  -DDEBUG_TP_FRAMES
  -Wuninitialized
  -Wunused-variable
  -Wno-unknown-pragmas

main.cpp is trivial atm and contains only the bare minimum:

#include <Arduino.h>
#include <knx.h>

#define PROG_LED 6
#define PROG_BTN 7
#define BUZZER_PIN 8

void foo(GroupObject &go) {
    Serial.println("foo");
}

void setup_KNX() {
  Serial.println("Setup Prog Button and Prog LED");
  pinMode(PROG_LED, OUTPUT);
  pinMode(PROG_BTN, INPUT_PULLUP);

  digitalWrite(PROG_LED, LOW);

  knx.buttonPin(PROG_BTN);
  knx.ledPin(PROG_LED);
  knx.ledPinActiveOn(HIGH);

  // read address table, association table, group object table and parameters from EEPROM
  knx.readMemory();

  if (knx.individualAddress() == 0) {
    Serial.println("KNX - Individual Adress is NOT set!");
    knx.bau().deviceObject().individualAddress(1); //65535
  } else {
    Serial.println("KNX - Individual Adress is set!");
  }

  if (knx.configured()) {
    Serial.println("KNX - Is Configured");

    // Attach a callback function to the group object
    knx.getGroupObject(0).callback(foo);
    knx.getGroupObject(0).dataPointType(DPT_Trigger);

  } else {
    Serial.println("KNX - Is NOT Configured");
  }

  knx.start();
}

void waitConsoleSession() 
{
    while (!Serial) {;}
}

void setup() {
  Serial.begin(115200);
  ArduinoPlatform::SerialDebug = &Serial;
  waitConsoleSession();

  setup_KNX();
}

void loop() {
  static uint64_t i = 0;
  knx.loop();

  if (!knx.configured()) {
    return;
  }

  i++;
  if ((i % 200000) == 0)
  {
      print(".");   // indicator if app is still alive
  }
}

Output on serial line:

readMemory
RESTORED 00 01 00 FA 00 00 00 00 00 00 00 03 11 11 00 01 04 00 01 60 00 01 00 00 00 00 00 00 00 FA 00 00 0A 01 00 00 00 02 00 00 00 54 00 01 00 FA 00 00 0A 00 04 01 00 00 00 02 00 00 00 58 00 04 01 00 00 00 04 00 00 00 5C 00 04 01 00 00 00 06 00 00 00 60 00 08
restoring data from flash...
saverestores 2
-12
.
-28
.
restored saveRestores
tableObjs 4
-28
.
4
-51
.
4
-62
.
4
-73
.
8
restored Tableobjects
KNX - Individual Adress is set!
KNX - Is Configured
TP is connected
.....progmode on
...
Basic restart requested
save saveRestores 2
save tableobjs   <------ stuck here, only reset helps

Sometimes it goes through, but most of the time it does not. Also, the location where it gets stuck is slightly different each time.

Any ideas on how to continue?

thelsing commented 3 weeks ago

I have no idea what the problem could be. I would try adding more output in the Memory class and in the flash methods of the RP2040 platform. The save code is not doing anything particularly magical, so I can't say what could go wrong there.

embed-me commented 1 week ago

Debugging this problem led me down a rabbit hole, however, I think I found the issue.

During registration of callbacks, getGroupObject is called.

GroupObject& getGroupObject(uint16_t goNr)
{
    return _bau.groupObjectTable().get(goNr);
}

Then the groupObjectTable object is returned.

GroupObjectTableObject& BauSystemBDevice::groupObjectTable()
{
    return _groupObjTable;
}

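Since the input in my example was 0, get() ends up indexing with asap - 1 = -1. (The uint16_t operand is promoted to int before the subtraction, so on the RP2040 the index is -1 rather than the wrapped value 0xffff = 65535; either way it lies outside the array.)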

GroupObject& GroupObjectTableObject::get(uint16_t asap)
{
    return _groupObjects[asap - 1]; 
}
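
To make the index arithmetic concrete, here is a minimal standalone sketch (not library code; the helper names indexAsWritten, indexNarrowed and showIssueCase are made up for illustration). It assumes a platform where int is wider than 16 bits, which is the case on the RP2040:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A uint16_t operand is promoted to int before the subtraction
// (on platforms where int is wider than 16 bits), so the result is an int.
static_assert(std::is_same<decltype(std::uint16_t{0} - 1), int>::value,
              "uint16_t - 1 is computed as int");

// Index as computed by the expression `asap - 1`: a signed int.
inline int indexAsWritten(std::uint16_t asap)
{
    return asap - 1;
}

// Index if the result were explicitly narrowed back to uint16_t.
inline std::uint16_t indexNarrowed(std::uint16_t asap)
{
    return static_cast<std::uint16_t>(asap - 1);
}

// The asap == 0 case from this issue: both variants are out of bounds.
inline void showIssueCase()
{
    assert(indexAsWritten(0) == -1);     // one element before the array
    assert(indexNarrowed(0) == 65535);   // far past the end of the array
}
```

Whichever of the two values applies, the element that gets accessed is not part of the array.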

Since the group objects are held through a raw pointer...

GroupObject* _groupObjects = 0;

... and they reside on the heap, this can't go well once you access your "GroupObject" to register your callback or similar.

bool GroupObjectTableObject::initGroupObjects()
{
    //...
    _groupObjects = new GroupObject[goCount];
    //...

In my case, while walking the blocks on the heap during store/free, one of the next pointers in the linked list had been corrupted and pointed to an address that was NOT 32-bit aligned (which the RP2040 requires!), and that triggered an exception. The code even contains a check for freeing a block that was never allocated, which could have pointed me in the right direction from the very beginning; however, the exception happened before that part of the code was reached 😓
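
For anyone debugging something similar, a check along these lines could flag a corrupted link before it is dereferenced. This is only a sketch: the Block struct, isWordAligned and walk are hypothetical and do not reflect the actual allocator's data layout, which is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the address is a multiple of 4 (nullptr counts as aligned).
inline bool isWordAligned(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 0x3u) == 0;
}

// Hypothetical free-list node; the real allocator's header looks different.
struct Block
{
    Block* next;
};

// Counts the nodes reachable from head. Stops and sets `corrupt`
// before following a next pointer that is not word-aligned.
inline std::size_t walk(const Block* head, bool& corrupt)
{
    assert(isWordAligned(head)); // the caller is expected to pass a valid head
    corrupt = false;
    std::size_t visited = 0;
    for (const Block* b = head; b != nullptr; b = b->next)
    {
        ++visited;
        if (!isWordAligned(b->next))
        {
            corrupt = true; // do not dereference the bad link
            break;
        }
    }
    return visited;
}
```

An alignment check only catches one kind of corruption; a pointer that is aligned but wrong would still slip through.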

I assume checking goNr in getGroupObject(uint16_t goNr) to avoid such scenarios in the future could be helpful.
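
Something like the following is what I have in mind. It is a rough sketch only: CheckedGroupObjectTable and tryGet are invented names, the GroupObject here is a stand-in so the snippet compiles on its own, and it assumes the numbering is 1-based, as the asap - 1 in get() suggests.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's GroupObject, only here to make the sketch compile.
struct GroupObject
{
    int value = 0;
};

// Hypothetical wrapper, not the library's GroupObjectTableObject.
class CheckedGroupObjectTable
{
  public:
    CheckedGroupObjectTable(GroupObject* objects, std::uint16_t count)
        : _groupObjects(objects), _count(count)
    {
        assert(objects != nullptr || count == 0);
    }

    // Returns nullptr instead of indexing outside the array.
    // Valid numbers are 1.._count (assumed 1-based).
    GroupObject* tryGet(std::uint16_t asap)
    {
        if (asap == 0 || asap > _count)
            return nullptr;
        return &_groupObjects[asap - 1];
    }

  private:
    GroupObject* _groupObjects;
    std::uint16_t _count;
};
```

With a check like this, getGroupObject(0) from my example would be caught at the call site rather than silently touching memory in front of the array. How the library should report the error (null pointer, dummy object, log message) is a separate design question.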

thelsing commented 1 week ago

Thank you for the analysis. As the root cause has been found, I'll close this issue.