m5stack / Core2-for-AWS-IoT-Kit

Accompanying code for use with AWS IoT Kit content. Works with PlatformIO and ESP-IDF v4.2.
https://m5stack.com/collections/m5-core/products/m5stack-core2-esp32-iot-development-kit-for-aws-iot-edukit
MIT License

Memory Allocation Failure in LVGL library in BSP_Dev branch #98

Open pmumby opened 1 year ago

pmumby commented 1 year ago

Summary:

We are experiencing a memory allocation failure in the LVGL library when it tries to allocate DMA-capable heap for SPI use. Under normal circumstances this error is non-fatal (unless the build config is customized so that it throws a fatal error). The end result is that the UI task hangs, and our UI and touchscreen become completely locked up (all other functions of the device continue without interruption).

Background:

We are implementing this component as a hardware abstraction library for our use of the M5Stack Core2 for an IoT Product. To that end, we've forked this repo to: https://github.com/Flow-Coffee-Limited/Core2-for-AWS-IoT-EduKit

In order to integrate the component into our ESP-IDF 4.4 project, we've created a branch called flow_fgv3_active, which is pinned to the same commit hash as the template project at: https://github.com/aws-iot-edukit/Project_Template-Core2_for_AWS

We are building with ESP-IDF 4.4. We have a dockerized build environment that pre-installs IDF and its prerequisites into a Docker container, and we use scripts to run the build within the container.

We were previously on ESP-IDF 4.2, running an older AWS MQTT lib and an older Core2 for AWS lib. Unfortunately, we had another problem with the MQTT stack that required us to update the MQTT library, which in turn required upgrading to IDF 4.4, and subsequently to the BSP_Dev branch of the Core2 for AWS lib.

It should be noted that we did not encounter this bug on the previous version of the Core2 for AWS lib. However, that was an older version and not the refactored BSP_Dev branch. Much else has also changed with the move to IDF 4.4 and the new MQTT lib, so this is hardly a clean test. Unfortunately, these components can't easily be swapped in isolation.

Docker Build Container:

The Dockerfile builds ESP-IDF branch release/v4.4 and installs all other prerequisites on a base of ubuntu:22.04 (Dockerfile attached): esp_idf_dockerfile.zip

Here is an example of how to use the Dockerfile:

    docker run -it -v $(pwd):"/project" esp-idf-build-container $(id -u) $(id -g) /bin/bash

This provides a bash console inside the container, mounting the current working directory into the /project volume in the container (assuming your current working directory is the root of an IDF project). Alternatively, you could replace /bin/bash with an idf command such as idf.py build, since the IDF environment is automatically sourced inside the build container.

Stack Trace of the error we're seeing:

(note paths have been sanitized, but everything else is intact)

Memory allocation failed

Backtrace: 0x400823b5:0x3ffddba0 0x400972f9:0x3ffddbc0 0x400d6267:0x3ffddbe0 0x4008270f:0x3ffddc00 0x40082d6a:0x3ffddc20 0x40082e25:0x3ffddc60 0x4008da7a:0x3ffddc80 0x40137b2f:0x3ffddca0 0x400ec5ed:0x3ffddce0 0x400ec376:0x3ffddd50 0x400ebfe1:0x3ffddd90 0x400dd586:0x3ffdddb0 0x400dd673:0x3ffdde20 0x400ddc25:0x3ffddeb0 0x400e6bc5:0x3ffddf00 0x400e6cc8:0x3ffddf20 0x400d8dcd:0x3ffddf50
0x400823b5: panic_abort at /esp/esp-idf-4.4-release/components/esp_system/panic.c:402

0x400972f9: esp_system_abort at /esp/esp-idf-4.4-release/components/esp_system/esp_system.c:128

0x400d6267: heap_caps_match at /esp/esp-idf-4.4-release/components/heap/heap_caps.c:90

0x4008270f: heap_caps_malloc at /esp/esp-idf-4.4-release/components/heap/heap_caps.c:177

0x40082d6a: trace_malloc at /esp/esp-idf-4.4-release/components/heap/include/heap_trace.inc:93

0x40082e25: __wrap_heap_caps_malloc at /esp/esp-idf-4.4-release/components/heap/include/heap_trace.inc:182

0x4008da7a: setup_priv_desc at /esp/esp-idf-4.4-release/components/driver/spi_master.c:771 (discriminator 15)

0x40137b2f: spi_device_queue_trans at /esp/esp-idf-4.4-release/components/driver/spi_master.c:828
 (inlined by) spi_device_queue_trans at /esp/esp-idf-4.4-release/components/driver/spi_master.c:786

0x400ec5ed: disp_spi_transaction at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lv_port/disp_spi.c:268

0x400ec376: disp_spi_add_device_config at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lv_port/disp_spi.c:106

0x400ebfe1: ili9341_send_data at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lvgl_tft/ili9341.c:182

0x400dd586: lv_refr_vdb_flush at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:962

0x400dd673: lv_refr_area_part at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:507 (discriminator 1)

0x400ddc25: lv_refr_area at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:469
 (inlined by) lv_refr_areas at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:400
 (inlined by) _lv_disp_refr_task at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:199

0x400e6bc5: lv_task_exec at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:394
 (inlined by) lv_task_exec at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:380

0x400e6cc8: lv_task_handler at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:135 (discriminator 1)

0x400d8dcd: _gui_task at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/core2foraws_display.c:83

What have we tried so far?

We have tried the following to troubleshoot/resolve the issue:

Happy to provide any further info as needed.

Any assistance would be greatly appreciated.

rashedtalukder commented 1 year ago

Have you set an increased LV_MEM_SIZE?

pmumby commented 1 year ago

Have you set an increased LV_MEM_SIZE?

Hi Rashed, I have not changed LV_MEM_SIZE from the default set in this library (32 KB).

Is that size being used for the DMA buffers for SPI though?

If you look at the stack trace of the error, you can see that the allocation failure happens specifically in the code that allocates buffers for the SPI-connected display, which require DMA capability.

Obviously DMA is the most constrained memory, but the code runs fine for anywhere from 30 minutes to several hours, with the UI working perfectly, and then fails at somewhat random intervals. During this time, monitoring heap usage and fragmentation shows that we have between 20 KB and 35 KB of free DMA memory, depending on the circumstances. So if the issue were the allocation of a 32 KB block of DMA memory, one would think it would throw the error immediately, or at least far more often.

For example, here is a dump of heap telemetry we are sending from the device. This was taken from a running device that had been online for nearly 1 hour and on which the UI had not yet locked up (it had not yet thrown the allocation exception):

    "heap": {
        "internal": {
          "allocatedBlocks": 428,
          "totalFreeBytes": 40027,
          "minimumFreeBytes": 16199,
          "largestFreeBlock": 19456,
          "freeBlocks": 17,
          "totalBlocks": 445
          "freeBlocks": 16,
          "totalBlocks": 444
        },
        "dram": {
          "allocatedBlocks": 553,
          "totalFreeBytes": 3614854,
          "minimumFreeBytes": 3566798,
          "largestFreeBlock": 3538944,
          "freeBlocks": 25,
          "totalBlocks": 578
          "freeBlocks": 24,
          "totalBlocks": 577
        },
        "iram": {
          "allocatedBlocks": 0,
          "totalFreeBytes": 0,
          "minimumFreeBytes": 0,
          "largestFreeBlock": 0,
          "freeBlocks": 0,
          "totalBlocks": 0
        },
        "dma": {
          "allocatedBlocks": 428,
          "totalFreeBytes": 38435,
          "minimumFreeBytes": 14623,
          "largestFreeBlock": 19456,
          "freeBlocks": 16,
          "totalBlocks": 444
          "freeBlocks": 15,
          "totalBlocks": 443
        },
        "spiram": {
          "allocatedBlocks": 125,
          "totalFreeBytes": 3576419,
          "minimumFreeBytes": 3552175,
          "largestFreeBlock": 3538944,
          "freeBlocks": 9,
          "totalBlocks": 134
        }
      },
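
A rough way to read the dma figures above: an allocation succeeds only if a single contiguous free block is large enough, so largestFreeBlock matters more than totalFreeBytes. The sketch below is plain C that runs on a host, not ESP-IDF code. The names heap_stats_t, fragmentation_percent, can_satisfy and DMA_SAMPLE are hypothetical and introduced only for illustration; the numbers are copied from the dma section of the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct; the fields mirror three keys of the telemetry dump. */
typedef struct {
    size_t total_free_bytes;   /* "totalFreeBytes"   */
    size_t minimum_free_bytes; /* "minimumFreeBytes" */
    size_t largest_free_block; /* "largestFreeBlock" */
} heap_stats_t;

/* Values copied from the "dma" section of the dump. */
const heap_stats_t DMA_SAMPLE = {
    .total_free_bytes   = 38435,
    .minimum_free_bytes = 14623,
    .largest_free_block = 19456,
};

/* Integer percentage of the free memory that lies outside the largest
 * contiguous free block (0 means all free memory is in one block). */
unsigned fragmentation_percent(const heap_stats_t *s)
{
    assert(s != NULL);
    if (s->total_free_bytes == 0) {
        return 0;
    }
    return (unsigned)(100 - (s->largest_free_block * 100) / s->total_free_bytes);
}

/* A single request can only succeed if it fits in one contiguous block. */
bool can_satisfy(const heap_stats_t *s, size_t request_bytes)
{
    assert(s != NULL);
    return request_bytes <= s->largest_free_block;
}
```

With these numbers, fragmentation_percent(&DMA_SAMPLE) gives 50, and can_satisfy(&DMA_SAMPLE, 32 * 1024) is false even though 38435 bytes are free in total. This only illustrates how to interpret the telemetry; it does not say what size the failing SPI allocation actually requested.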