vroland / epdiy

EPDiy is a driver board for affordable e-Paper (or E-ink) displays.
https://vroland.github.io/epdiy-hardware/
GNU Lesser General Public License v3.0
1.25k stars 178 forks source link

Deinit function issue #269

Open aceisace opened 7 months ago

aceisace commented 7 months ago

Re-initialising after a de-init causes the following error: render.c -> assert(render_context.line_queues[i].buf != NULL); -> assert failed: epd_renderer_init render.c:262 (render_context.line_queues[i].buf != NULL)

My code runs in the following way:

<first init with 1K buffer>
I (10618) esp_temperature: Characterized using eFuse Vref
E (10619) epd: lut size: 1024
free heap: 53 KB (16.56% free)
<download image into epdiy buffer>
<de-init epdiy>
free heap: 168 KB (52.50% free)
I (85649) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (87654) esp_temperature: Characterized using eFuse Vref
<second init with 64K buffer>
E (87654) epd: lut size: 65536

Due to internal RAM constrictions, I cannot simply init epdiy with a 64K buffer and then download the image. As a workaround, I had to init with 1K first prior to the image download, then de-init to free up memory before initialising again with 64K.

As epdiy only allows setting the board once, I removed the board setting function from the epd_init function and am setting the board manually beforehand.

May I ask for a clue on what is causing this error? This error has not appeared before using an epdiy version from around last year, but after the upgrade to the latest main branch, this error causes the esp to reboot each time at this exact place.

vroland commented 6 months ago

Hm, apparently there is a race condition where it deinitializes the buffers before stopping the driver threads... Looks like a bug to me.

vroland commented 6 months ago

Could you provide a working example to reproduce the issue?

aceisace commented 6 months ago

Thank you for your feedback @vroland . I had the same hunch, using rtos and threads are useful features, but also prone to race conditions.

This is the relevant snippet:

// setting the board in advance and removing the board argument from epd_init as it would otherwise fail
epd_set_board(&epd_board_v5);
// init epd with 1K buffer. No rendering will be done here, just using the framebuffer to store the image
epd_init(EPD_DISPLAY, EPD_LUT_1K);
hl = epd_hl_init(EPD_BUILTIN_WAVEFORM);
framebuffer = epd_hl_get_framebuffer(&hl);  

<download image into framebuffer, confirmed to work>

// de-init to free internal RAM
epd_deinit();

<image download completed, more internal RAM available>
// we can now init epdiy with 64K as we have more internal RAM after the download
epd_init(EPD_DISPLAY, EPD_LUT_64K); ----> Always fails at this line

// main rendering starts here
epd_poweron();
// As we want to render with 16 grayscales, init epdiy again with 64K 
epd_hl_update_screen(&hl, MODE_GC16, 25);

// rendering complete, de-init epdiy to free RAM and power off
epd_deinit();
epd_poweroff();
aceisace commented 6 months ago

Any progress on this so far? If you need any additional information, please let me know 👍

aceisace commented 5 months ago

@vroland Is it possible to revert this function to a previous state if you don't have any time to look into this bug?

vroland commented 5 months ago

Not that easily, that would be a version before V7 support. Sorry, I haven't had so much time to put towards this lately, but if you want to investigate yourself: There are rendering threads started in render.c that wait for task notifications. These need to exit before the memory deinitialization. Otherwise, you could use an old version (epdiy 1.0.2) until then.

aceisace commented 5 months ago

Thank you very much for your feedback! May I ask you provide a link to the section creating/exiting these treads? As C++ isn't my strength, finding the section responsible for this is pretty difficult. Thank you very much in advance!

vroland commented 4 months ago

I'm onto implementing some tests also for this on my vector instructions branch. You're using the V5 board?

aceisace commented 4 months ago

Thank you very much for taking the time to look into this issue! Yes, I'm using the v5 board!

vroland commented 4 months ago

@aceisace Can you check again with the vector-extension-experimentation branch? Looking at it again it may also be an out of memory issue in your program.

martinberlin commented 4 months ago

Hello Ace can you please give feedback about this?

aceisace commented 4 months ago

@vroland Apologies for the late reply. As I have already switched to an older version since several months, I had to do some refactoring before the new code could be tested. I've had time today to test it, but it still doesn't seem to work:

--init--
free heap: 159 KB (49.69% free)
I (12823) esp_temperature: Characterized using eFuse Vref
I (12825) epd: Space used for waveform LUT: 64K
assert failed: lq_init line_queue.c:23 (queue.bufs != NULL)

Backtrace: 0x40082469:0x3ffc2770 0x400926ed:0x3ffc2790 0x40099ee5:0x3ffc27b0 0x400f7201:0x3ffc28d0 0x400f65a5:0x3ffc2920 0x400f62e5:0x3ffc2970 0x400e2b19:0x3ffc2990 0x400e481a:0x3ffc2aa0 0x400e6dca:0x3ffc30a0 0x40095e01:0x3ffc30c0
0x40082469: panic_abort at /.../esp-idf/esp-idf/components/esp_system/panic.c:408

0x400926ed: esp_system_abort at /.../esp-idf/esp-idf/components/esp_system/esp_system.c:137

0x40099ee5: __assert_func at /.../esp-idf/esp-idf/components/newlib/assert.c:85

0x400f7201: lq_init at /.../esp-idf/v9-neo/components/epdiy/src/output_common/line_queue.c:23 (discriminator 1)

0x400f65a5: epd_renderer_init at /.../esp-idf/v9-neo/components/epdiy/src/render.c:319

0x400f62e5: epd_init at /.../esp-idf/v9-neo/components/epdiy/src/epdiy.c:491

After this it reboots and gets stuck at the same place before rebooting again

amadeok commented 2 months ago

@vroland Apologies for the late reply. As I have already switched to an older version since several months, I had to do some refactoring before the new code could be tested. I've had time today to test it, but it still doesn't seem to work:

--init--
free heap: 159 KB (49.69% free)
I (12823) esp_temperature: Characterized using eFuse Vref
I (12825) epd: Space used for waveform LUT: 64K
assert failed: lq_init line_queue.c:23 (queue.bufs != NULL)

Backtrace: 0x40082469:0x3ffc2770 0x400926ed:0x3ffc2790 0x40099ee5:0x3ffc27b0 0x400f7201:0x3ffc28d0 0x400f65a5:0x3ffc2920 0x400f62e5:0x3ffc2970 0x400e2b19:0x3ffc2990 0x400e481a:0x3ffc2aa0 0x400e6dca:0x3ffc30a0 0x40095e01:0x3ffc30c0
0x40082469: panic_abort at /.../esp-idf/esp-idf/components/esp_system/panic.c:408

0x400926ed: esp_system_abort at /.../esp-idf/esp-idf/components/esp_system/esp_system.c:137

0x40099ee5: __assert_func at /.../esp-idf/esp-idf/components/newlib/assert.c:85

0x400f7201: lq_init at /.../esp-idf/v9-neo/components/epdiy/src/output_common/line_queue.c:23 (discriminator 1)

0x400f65a5: epd_renderer_init at /.../esp-idf/v9-neo/components/epdiy/src/render.c:319

0x400f62e5: epd_init at /.../esp-idf/v9-neo/components/epdiy/src/epdiy.c:491

After this it reboots and gets stuck at the same place before rebooting again

hey, i was having the same assert error message, it turns out the board was just out of internal sram in a project that used both wifi and the epdiy driver. try putting the epd_init() at the start of app_main(), and nothing else, see if it throws the error. if it doesn't start initializing the other things like wifi one at a time, if does throw the error, it could be you have some settings in menuconfig that use too much sram, in that case you could start from the demo example and just start adding what you really need. esp_get_free_internal_heap_size() will give you the remainder free internal sram. use it to see how much you have left at each step

i also can confirm that i was able to use both wifi and epd_init() at the same without problems in v4-v5 firmware, but when i moved to v7, same project, i could no longer since i faced this issue, it seems v7 uses quite a lot more sram.

aceisace commented 2 months ago

@amadeok Thanks for your input, however, as I have been using epdiy for several years now, I recognised the mentioned issue about epd_init() failing when there is too little RAM and have already implemented thorough debugging at each step, logging stack and heap. On the wrover, which has around 320KB of RAM, attempting to init epdiy with less than 30% free stack results in this error, however, in my case it is well above this threshold, so I can confirm it is not related to this problem in any way. Furthermore, as I am using a v5 based board for this test, stack and heap should not be an issue.

The problem is not that I cannot init epdiy, but rather that the deinit function does not undo everything from the init function, leading to the error caused. It boils down to be an issue with the threads not finishing or coming out clean. You can reproduce this yourself by initialising epdiy, then de-initialising and then initialising again.

amadeok commented 2 months ago

@amadeok Thanks for your input, however, as I have been using epdiy for several years now, I recognised the mentioned issue about epd_init() failing when there is too little RAM and have already implemented thorough debugging at each step, logging stack and heap. On the wrover, which has around 320KB of RAM, attempting to init epdiy with less than 30% free stack results in this error, however, in my case it is well above this threshold, so I can confirm it is not related to this problem in any way. Furthermore, as I am using a v5 based board for this test, stack and heap should not be an issue.

The problem is not that I cannot init epdiy, but rather that the deinit function does not undo everything from the init function, leading to the error caused. It boils down to be an issue with the threads not finishing or coming out clean. You can reproduce this yourself by initialising epdiy, then de-initialising and then initialising again.

why are you deinitializing and initializing multiple times? also a cheap solution could be to just restart the board

aceisace commented 2 months ago

So basically, as my setup requires downloading an image from a server (non-jpeg, lossless), I cannot initialise epdiy at the beginning as the download would fail due to lack of RAM, required for SSL, Wi-Fi, temporary buffers and operation buffers. If I try to initialise epdiy and then download the image, it fails.

To tackle this issue, I found a workaround, initialise epdiy with a smaller footprint of internal RAM (mode 1K instead of 64K). That way, 63Kb of RAM of internal RAM are still free, plenty to download and process the image without any memory issues and loading it in the external RAM section used by epdiy (hence the need to initialise). As the download is now complete, I can free up the RAM used by ssl, wi-fi etc., then call deinit to free up internal RAM and allocate the free memory to initialise epdiy again, now with 64K for the highest possible quality.

In either case, the issue in question is that if I initialise epdiy and de-initialise, it should revert back to the state it was in before calling init and cleanup buffers, thread and RAM. However, this is clearly not the case so this is a bug in the library, which should be fixed.

Consider a use-case where a user needs to initialise and de-init epdiy several times to e.g. make use of the faster mono rendering while also supporting high-quality rendering at the cost of speed and more internal RAM.

If I were to restart the board, all the data in the external RAM would be lost since it does not survive reboots. Only RTC or file-based memory is able to retain information after reboots (RTC only for soft reboots, but is very limited in memory). RTC memory does not survive a hard-reboot and file-based memory is way too slow and inefficient.

amadeok commented 2 months ago

it can be reproduced, i went to the demo example and pasted this at the beggining fo app_main()

   while (1)  {

    vTaskDelay(1000 / portTICK_PERIOD_MS);

    epd_init(&DEMO_BOARD, &ED097TC2, EPD_LUT_1K); 

    vTaskDelay(1000 / portTICK_PERIOD_MS);

    epd_deinit();

  ESP_LOGI("SRAM", "esp_get_free_internal_heap_size: %d bytes", esp_get_free_internal_heap_size());

  }

this is the output:

I (25957) epd: Space used for waveform LUT: 1K I (26967) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (26967) SRAM: esp_get_free_internal_heap_size: 263795 bytes W (27967) epdiy: EPD board can only be set once! I (27967) esp_temperature: Characterized using eFuse Vref

I (27967) epd: Space used for waveform LUT: 1K I (28977) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (28977) SRAM: esp_get_free_internal_heap_size: 261387 bytes W (29977) epdiy: EPD board can only be set once! I (29977) esp_temperature: Characterized using eFuse Vref

I (29977) epd: Space used for waveform LUT: 1K I (30987) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (30987) SRAM: esp_get_free_internal_heap_size: 258979 bytes W (31987) epdiy: EPD board can only be set once! I (31987) esp_temperature: Characterized using eFuse Vref

I (31987) epd: Space used for waveform LUT: 1K I (32997) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (32997) SRAM: esp_get_free_internal_heap_size: 256571 bytes W (33997) epdiy: EPD board can only be set once! I (33997) esp_temperature: Characterized using eFuse Vref

I (33997) epd: Space used for waveform LUT: 1K I (35007) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (35007) SRAM: esp_get_free_internal_heap_size: 254163 bytes W (36007) epdiy: EPD board can only be set once! I (36007) esp_temperature: Characterized using eFuse Vref

I (36007) epd: Space used for waveform LUT: 1K I (37017) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (37017) SRAM: esp_get_free_internal_heap_size: 251755 bytes W (38017) epdiy: EPD board can only be set once! I (38017) esp_temperature: Characterized using eFuse Vref

eventually it crashes

aceisace commented 2 months ago

You are using the latest version or at least a much more recent version of epdiy. Without modifying the epd_init function in the more recent version, you cannot reproduce this error. Just split the setting of the board from the other parts in epd_init and you'll be able to get the same result.

For the error mentioned in the first comment, I was using a version back when v6 was the most recent one.

martinberlin commented 2 months ago

Hello @vroland assigned this to you but I would also like to research this issue and help discovering what is the problem. Lately I’m using only v7 and I must say at this point all my v5 boards are gone but still would be great to be able to init and deinit epdiy without having this issue described by Ace

amadeok commented 2 months ago

You are using the latest version or at least a much more recent version of epdiy. Without modifying the epd_init function in the more recent version, you cannot reproduce this error. Just split the setting of the board from the other parts in epd_init and you'll be able to get the same result.

For the error mentioned in the first comment, I was using a version back when v6 was the most recent one.

yes the test i did above was with the latest firmware, v4 board, and there's a memory leak or something going there, as you can see the memory drops by about 3kb at each iteration

i just did the same test with firmware tag v6 from 3 years ago, board v5, and the memory leak is much more severe:

I (3823) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (3823) SRAM: esp_get_free_internal_heap_size: 252676 bytes I (4827) esp_temperature: Characterized using eFuse Vref

I (5827) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (5827) SRAM: esp_get_free_internal_heap_size: 203912 bytes I (6831) esp_temperature: Characterized using eFuse Vref

I (7831) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (7831) SRAM: esp_get_free_internal_heap_size: 155148 bytes I (8835) esp_temperature: Characterized using eFuse Vref

I (9835) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (9835) SRAM: esp_get_free_internal_heap_size: 106384 bytes I (10839) esp_temperature: Characterized using eFuse Vref

I (11839) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (11839) SRAM: esp_get_free_internal_heap_size: 57620 bytes I (12843) esp_temperature: Characterized using eFuse Vref

I (13843) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (13843) SRAM: esp_get_free_internal_heap_size: 47344 bytes I (14847) esp_temperature: Characterized using eFuse Vref

I (15847) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (15847) SRAM: esp_get_free_internal_heap_size: 37068 bytes I (16851) esp_temperature: Characterized using eFuse Vref

I (17851) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (17851) SRAM: esp_get_free_internal_heap_size: 26792 bytes I (18855) esp_temperature: Characterized using eFuse Vref

I (19855) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (19855) SRAM: esp_get_free_internal_heap_size: 16516 bytes I (20859) esp_temperature: Characterized using eFuse Vref

I (21859) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 I (21859) SRAM: esp_get_free_internal_heap_size: 6240 bytes I (22863) esp_temperature: Characterized using eFuse Vref

aceisace commented 2 months ago

Indeed @amadeok . It's more than just a memory leak and/or heap/stack poisoning. While I only have access to v5 and v7 from @martinberlin , this issue happens on both boards, on both esp32's and even on the more recent as well as the older version of epdiy. @martinberlin In particular, I believe some of the RTOS operations to be linked to this issue used by the init and more importantly, de-init function. But with my limited understanding and expertise with RTOS and cpp, I cannot figure out more by myself

vroland commented 2 months ago

From the logs you posted it looks a lot like a memory leak. You think it is also something else? I think this is part of why I need to do some refactoring work, I'm not super confident the error and memory management is clean in unexpected paths like continuous initializing / deinitializing. We can have a look if there is something that is easy to find, but the origin of this is a structural issue.

aceisace commented 2 months ago

@vroland Yes, indeed, I think this is not only related to a memory leak, but also RTOS threads in particular after de-init:

assert failed: epd_renderer_init render.c:262 (render_context.line_queues[i].buf != NULL)

Yes, sure, from time to time, a software may need to be refactored and I'm willing to help with testing too. Is there a development branch that is being developed on for the refactoring? I'm sure that some users can help you with refactoring. I can also help with pretty much anything related to python.

vroland commented 2 months ago

No branch yet, but I'm trying to get build some test cases and then continuously improve from there.

aceisace commented 2 months ago

Concerning the test cases, apart from testing if epdiy works with a specific release of esp-idf, wouldn't it make things a lot easier for you and all other contributors if there were unittests that run on each commit via GitHub actions? On Inkycal, it's even possible to simulate a display for running the automated tests which are part of the integration test

vroland commented 2 months ago

That would be the end goal. Unfortunately to do that exhaustively you'd need full emulation in CI or a hardware farm somewhere, which is a lot of complexity. There are esp32 emulators so, it could be worth to look into.

aceisace commented 2 months ago

True enough, a CI will get pretty complex, although it will ensure quality and stability of all core functions throughout all branches. Meanwhile, wouldn't it be easier then to use unittests for all core functions at least which do not require an emulator? Another variant is to have a board just for testing with a test-sketch which does nothing but perform all unittests on-device. This is kind of what kindle did back when it launched.

vroland commented 1 month ago

Yep, now we just need all the unit tests ;) I intend to add them over time during the course of some refactoring, but it will be a process. Then finding a way to make as much of it work in CI as possible is the next step.

western-hoolock commented 1 month ago

Hm, apparently there is a race condition where it deinitializes the buffers before stopping the driver threads... Looks like a bug to me.

render.c -> assert(render_context.line_queues[i].buf != NULL); doesn't necessarily mean a race condition though? It just means the allocation failed.

@aceisace You download an image and then show the heap to be free heap: 168 KB (52.50% free) but that doesn't paint a complete picture. It could very well be that the networking code to download the image fragmented the heap. So while enough free memory is available, there now isn't a single contiguous block large enough left. You can check the largest free block available by calling: heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL).

vroland commented 1 month ago

@western-hoolock I thought it was the case that some memory was freed while a task on another core could still be running. But here it was just not freed.

western-hoolock commented 4 weeks ago

@western-hoolock I thought it was the case that some memory was freed while a task on another core could still be running. But here it was just not freed.

I'm not convinced that (at least in 2.0.0) it's not freed properly. In epd_renderer_deinit I don't see a code path where this line wouldn't execute if epd_deinit is called? https://github.com/vroland/epdiy/blob/1b089e60a90d861d9a61ce1b3334c3646028013c/src/render.c#L292

My best guess is still heap fragmentation caused by user code that ran before initialization. Which should be easy to check for by calling heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL) before running said code and then again right before initializing epdiy.