vroland / epdiy

EPDiy is a driver board for affordable e-Paper (or E-ink) displays.
https://vroland.github.io/epdiy-hardware/
GNU Lesser General Public License v3.0

Use ESP32-S3 vector extensions for LUT processing and diffing #281

Closed vroland closed 2 months ago

vroland commented 4 months ago

Adds optimized versions of the 1bpp difference LUT lookup, high-level framebuffer diffing, and output line masking. Combined with 120 MHz PSRAM (enable it in the experimental options), we now get sub-second updates for a 1872x1404 display using epdiy V7.
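
For readers unfamiliar with the term, high-level framebuffer diffing means comparing the previous and the new framebuffer to find the region that actually changed. The following is a minimal scalar sketch of that idea, assuming one byte per pixel for simplicity. The `DirtyRect` type and `diff_bounding_box` function are hypothetical names for illustration only; they are not the epdiy API and not the vectorized code from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rectangle type for this sketch (not the epdiy EpdRect). */
typedef struct {
    int x, y, width, height;
} DirtyRect;

/*
 * Scalar reference: compare two framebuffers with one byte per pixel
 * (an assumption of this sketch) and return the bounding box of all
 * pixels that differ. Width and height are 0 if nothing changed.
 */
static DirtyRect diff_bounding_box(const uint8_t *prev, const uint8_t *next,
                                   int fb_width, int fb_height) {
    assert(prev != NULL && next != NULL);
    int min_x = fb_width, min_y = fb_height, max_x = -1, max_y = -1;

    for (int y = 0; y < fb_height; y++) {
        const uint8_t *a = prev + (size_t)y * fb_width;
        const uint8_t *b = next + (size_t)y * fb_width;
        for (int x = 0; x < fb_width; x++) {
            if (a[x] != b[x]) {
                if (x < min_x) min_x = x;
                if (x > max_x) max_x = x;
                if (y < min_y) min_y = y;
                if (y > max_y) max_y = y;
            }
        }
    }

    DirtyRect r = {0, 0, 0, 0};
    if (max_x >= 0) {
        /* At least one pixel changed: convert min/max to x/y/width/height. */
        r.x = min_x;
        r.y = min_y;
        r.width = max_x - min_x + 1;
        r.height = max_y - min_y + 1;
    }
    return r;
}
```

A vectorized version would compare many pixels per instruction, but the result it has to produce is the same kind of changed-area rectangle.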

vroland commented 4 months ago

@martinberlin @schuhumi would you mind trying the branch and checking whether things work for you?

schuhumi commented 4 months ago

@vroland I got it to work! Only my waveform has lots of ghosting now, I'll need to tweak it again.

Does the epd_draw_base() function use the drawn_columns after it returns? I had many problems with artifacts when modifying drawn_columns directly after calling epd_draw_base(), and could only remedy most of it by giving the function a copy of the drawn columns.

Also, are there any speed gains to be expected from using drawn_columns in small updates? At least by eye I couldn't observe any.

martinberlin commented 4 months ago

For the first test I took my 9.7" proto-board with v7 and updated it to this branch. The only changes I made were to enable: [*] Make experimental features visible

and to set the PSRAM speed to 120 MHz. After flashing the dragon example I got this over serial:

I (696) epdiy: using resolution 300x832  -> 1200*825 resolution using default ED097TC2 display
I (706) gpio: GPIO[45]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (716) epdiy: pclk freq: 22000000 Hz
I (716) epdiy: line width: 14us, 308 cylces
I (726) epdiy: LCD init done.
I (726) epd: Space used for waveform LUT: 64K
assert failed: 0x420078bd
0x420078bd: epd_difference_image_base at /home/martin/Documents/github/epdiy/src/render.c:424 (discriminator 1)

Switching back to the PSRAM speed of 80 MHz, I got exactly the same error, so it seems the PSRAM speed is not what triggers this. Just to make sure it is not trying to send data too fast, I lowered the speed in displays.c from 22 to 12 and still got the same error. Do I need any special setting to make this work?

vroland commented 4 months ago

@schuhumi It should only be used when drawing is initiated, I can have a look again.

@martinberlin Indeed, looks like there is some alignment problem with some display resolutions, I'll look into it :)

martinberlin commented 4 months ago

New test with a 6" display: epd_init(&epd_board_v7, &ED060XC3, EPD_LUT_64K);

I (2167) epdiy: highlevel diff area: x: 0, y: 0, w: 1024, h: 768
I (2167) epdiy: starting update, phases: 30
actual draw took 367ms.  // PSRAM at 80 Mhz
actual draw took 361ms.  // PSRAM at 120 Mhz

But independent of the PSRAM speed, I cannot see any better performance with this display. Or is the time difference only noticeable when you have an existing framebuffer and update it with a new image? @schuhumi can you tell me how you did your test?

schuhumi commented 3 months ago

@martinberlin Sorry for the late response, life had me quite busy..

Initially I only tried to get it to work and did not do any benchmarks. But you got me curious, so I just had a closer look. I wrapped my draw_base function this way:

uint32_t t1 = esp_timer_get_time() / 1000;
epd_draw_base(
    epd_full_screen(),
    fb,
    epd_full_screen(),
    MODE_DU | MODE_PACKING_1PPB_DIFFERENCE,
    temperature,
    NULL, // drawn_lines,
    NULL, // drawn columns (only when testing vector extensions)
    epd_get_display()->default_waveform
);
uint32_t t2 = esp_timer_get_time() / 1000;
printf("[<With/No> vector extensions] actual draw took %ldms.\n", t2 - t1);

For a fair comparison I made sure that in both cases:

Without vector extensions:

I (65260) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (65490) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (65830) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (66120) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (66860) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (67140) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (67950) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.
I (68230) epdiy: starting update, phases: 4
[No Vector extensions] actual draw took 95ms.

With vector extensions:

[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 93ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 93ms.
[With vector extensions] actual draw took 92ms.
[With vector extensions] actual draw took 93ms.

So for me it's hardly any difference too... (this is on the ED133UT2 display)

martinberlin commented 3 months ago

Hello @vroland, glad that you are back! With this buffer alignment fix, is it now possible to test with the 9.7"? I will try this weekend and add more feedback by editing this comment.

Test in main branch (9.7" default settings):

epdiy: highlevel diff area: x: 0, y: 0, w: 1200, h: 825
epdiy: starting update, phases: 30
actual draw took 392ms.

Test in vector-extension branch:

epdiy: Using optimized vector implementation on the ESP32-S3, only 1k of 65536 LUT in use!
epdiy: diff: 24ms, draw: 373ms, buffer update: 11ms, total: 408ms

The draw phase is about 20 ms faster (392 ms vs. 373 ms), which is roughly a 5% speed increase. It would be interesting to test it with a bigger epaper as well. Nice optimization!

Now, on to testing with other display sizes: I'm afraid this check won't work for color epapers like WT-F DES or Eink Kaleido:

EpdRect epd_difference_image_base(
    int fb_width, [...] ) {

    printf("fb_width:%d\n", fb_width);
    assert(fb_width % 16 == 0); // --> Not all display widths are multiples of 16

Two examples might be the last 2 definitions you can find in s3_color_implementation branch:

display GDEW101C01: 2232 modulo 16 = 8
display EC060KH3: 1448 modulo 16 = 8

vroland commented 3 months ago

Hi, yes I'm back and un-jetlagged again ;) The branch should now work with the 9.7" display. Afaik, displays with width % 16 != 0 never really worked before either. I think as a workaround we have to virtually increase resolution. I have no such display to test though, with your color display it just worked?
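
The idea of virtually increasing the resolution can be pictured as rounding the width up to the next multiple of 16 and treating the extra pixels as padding. The sketch below only illustrates that arithmetic; `align_up` and `padding_pixels` are hypothetical helper names, not functions from epdiy.

```c
#include <assert.h>

/* Round width up to the next multiple of alignment (must be a power of two). */
static int align_up(int width, int alignment) {
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (width + alignment - 1) & ~(alignment - 1);
}

/* Number of padding pixels needed to reach the aligned width. */
static int padding_pixels(int width, int alignment) {
    return align_up(width, alignment) - width;
}
```

With an alignment of 16, a width of 2232 would become 2240 and a width of 1448 would become 1456, i.e. 8 padding pixels in both cases, while 1200 stays unchanged.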

@schuhumi @martinberlin Regarding the speed: With the LCD peripheral the output speed is fixed, and the computation has to keep up with whatever is set. To actually see a faster speed you have to increase the bus speed by calling epd_set_lcd_pixel_clock_MHz(<something>); or modifying displays.c. Then with the vector extensions you should be able to increase it further without seeing errors.
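
To make the "computation has to keep up" point concrete, here is a rough sketch of the time available per output line at a given pixel clock. It reuses the 308 cycles per line and 22 MHz from the 9.7" log earlier in this thread and assumes the cycle count per line stays the same when the clock changes; `line_time_ns` is a hypothetical helper, not part of epdiy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time to clock out one line, in nanoseconds.
 * cycles_per_line: bus clock cycles needed per line (308 in the log above).
 * pclk_mhz: pixel clock in MHz.
 */
static uint32_t line_time_ns(uint32_t cycles_per_line, uint32_t pclk_mhz) {
    assert(pclk_mhz > 0);
    return (cycles_per_line * 1000u) / pclk_mhz;
}
```

Under these assumptions, 308 cycles at 22 MHz give 14000 ns per line, matching the "line width: 14us" log entry, and at 30 MHz the budget shrinks to about 10.3 us. The line preparation has to finish within that window, which is where faster LUT processing helps.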

martinberlin commented 3 months ago

Hi @vroland great will try to increase the speed.

Afaik, displays with width % 16 != 0 never really worked before either

About this: I can confirm that the two models mentioned, whose width is in fact not a multiple of 16, work perfectly with the main branch but do not work with this PR branch. I tested this with the GDEW101C01, which has a row width of 2232 pixels. If you comment out the assert(fb_width % 16 == 0), it runs but displays a ghost image along the X axis. I will add a photo later to show my point. So I'm quite sure there is a use case that is missing, since this happens with both of the color filter displays I can test.

schuhumi commented 3 months ago

Oh I see, now that's impressive! I was able to go from 23MHz to 30, at 32 I sometimes get line buffer underruns. That decreases the time for epd_draw_base from 95ms to ~70ms!

vroland commented 2 months ago

@martinberlin I added support for unaligned diffing and LUT lookup, i.e., all displays with width % 8 == 0 should now work. Can you test again?
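
A common way to support lengths that are not a multiple of the vector width is to process all full blocks first and then handle the remainder separately. The sketch below shows that pattern in plain C with 16-byte blocks standing in for vector operations. It is only an illustration of the general technique under that assumption, not the ESP32-S3 implementation in this branch, and `row_differs` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16

/* Returns true if the two rows differ anywhere in the first len bytes. */
static bool row_differs(const uint8_t *a, const uint8_t *b, size_t len) {
    assert(a != NULL && b != NULL);
    size_t full = len - (len % BLOCK_BYTES);

    /* Main loop: whole 16-byte blocks, each standing in for one vector op. */
    for (size_t i = 0; i < full; i += BLOCK_BYTES) {
        uint64_t a0, a1, b0, b1;
        memcpy(&a0, a + i, 8);
        memcpy(&a1, a + i + 8, 8);
        memcpy(&b0, b + i, 8);
        memcpy(&b1, b + i + 8, 8);
        if (((a0 ^ b0) | (a1 ^ b1)) != 0) {
            return true;
        }
    }

    /* Tail: remaining bytes when len is not a multiple of the block size. */
    for (size_t i = full; i < len; i++) {
        if (a[i] != b[i]) {
            return true;
        }
    }
    return false;
}
```

For a 24-byte row, the first 16 bytes go through the block loop and the last 8 through the tail loop, so a change in either part is detected.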

martinberlin commented 2 months ago

I added support for unaligned diffing and LUT lookup, i.e., all displays with width % 8 == 0 should now work. Can you test again?

This looks very good @vroland. I will check it tomorrow with the displays I have, including the Kaleido 6" that has .width = 1448, and let you know the results (including timings for main vs. the vector branch).

martinberlin commented 2 months ago

Testing with this display Kaleido 6"

const EpdDisplay_t EC060KH5 = {
    .width = 1448,  .height = 1072,
    .bus_width = 8,  .bus_speed = 20,
    .default_waveform = &epdiy_ED097TC2,
    .display_type = DISPLAY_TYPE_GENERIC,
    .display_color_filter = DISPLAY_CFA_KALEIDO
};

Using the main branch, these are the timings to draw the dragon:

I (3119) epdiy: highlevel diff area: x: 0, y: 0, w: 1448, h: 1072
I (3119) epdiy: starting update, phases: 30
actual draw took 664ms.

(Note I have a big dragon that is: 1600x1100 since I made it to test 13.3" displays)

I have now merged the vector branch into my s3_color_implementation branch, so I can later test color as well and rule out a problem with width % 8 == 0 displays.

epdiy: Using optimized vector implementation on the ESP32-S3, only 1k of 65536 LUT in use!
epdiy: diff: 37ms, draw: 645ms, buffer update: 17ms, total: 699ms

The speed is actually quite similar, but now I can tune up the display clock. In both cases PSRAM runs at 120 MHz. The display clock can now be set to up to 30 MHz, which is actually what the datasheet specifies (I think).

epdiy: diff: 36ms, draw: 450ms, buffer update: 17ms, total: 503ms

Now the total time is about 200 ms faster (699 ms vs. 503 ms). I will do some additional color tests later to confirm that the color part still works as expected.
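
The arithmetic behind these numbers can be checked with a small sketch that sums the three phases from the log lines and computes the relative saving. `UpdateTiming`, `total_ms` and `saving_permille` are hypothetical names for illustration; they do not exist in epdiy.

```c
#include <assert.h>

/* Hypothetical container for the three phases printed in the log line. */
typedef struct {
    int diff_ms;
    int draw_ms;
    int buffer_update_ms;
} UpdateTiming;

/* Sum of all phases, matching the "total" field of the log line. */
static int total_ms(UpdateTiming t) {
    return t.diff_ms + t.draw_ms + t.buffer_update_ms;
}

/* Time saved relative to the baseline, in tenths of a percent (rounded down). */
static int saving_permille(UpdateTiming base, UpdateTiming tuned) {
    int base_total = total_ms(base);
    assert(base_total > 0);
    return (base_total - total_ms(tuned)) * 1000 / base_total;
}
```

With the values from the two log lines above, {37, 645, 17} sums to 699 ms and {36, 450, 17} sums to 503 ms, a saving of 196 ms or roughly 28% of the total. Nearly all of it comes from the draw phase, which is the part that scales with the display clock.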

martinberlin commented 2 months ago

[Image: IMG_KH5 Kaleido EINK]

Confirming that the Kaleido display works well after your updates, with the timings shown in the last test.

vroland commented 2 months ago

Nice, thanks for testing! Once you approve I'll merge.

martinberlin commented 2 months ago

Awesome, I'm just checking file by file before approving. Keep this URL to test with your DES color epaper (also a %8 width): http://img.cale.es/jpg/fasani/5e5ff140694ee It just delivers a random JPG image. This PR is now also merged into my s3_color_implementation branch, so you could try it at a 30 MHz clock too.