nopnop2002 / esp-idf-st7789

ST7789 Driver for esp-idf
MIT License
234 stars 56 forks

Improve performance with frame buffer and DMA #23

Open bjorndm opened 3 years ago

bjorndm commented 3 years ago

I enjoy using this library, but drawing performance could be improved. Instead of drawing directly to the display, I suggest drawing to a frame buffer in memory and then sending the frame buffer to the display using DMA. Here is an example of how this could work:

https://esp32.com/viewtopic.php?t=20108
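A minimal host-side sketch of the suggestion, assuming a 240x240 panel and RGB565 pixels: every drawing call only touches RAM, and a single flush hands the finished frame to the transfer. `flush_fn`, `count_pixels` and the `fb_*` helpers are hypothetical names, not part of this library; on real hardware the callback would start the SPI transfer (for example a DMA-backed transaction queued with ESP-IDF's `spi_device_queue_trans()`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical panel size; adjust to the actual display. */
#define FB_WIDTH  240
#define FB_HEIGHT 240

/* Stand-in for the transfer to the panel (SPI + DMA on real hardware). */
typedef void (*flush_fn)(const uint16_t *pixels, size_t count, void *ctx);

typedef struct {
    uint16_t pixels[FB_WIDTH * FB_HEIGHT]; /* RGB565, one entry per pixel */
} framebuffer_t;

/* Drawing only writes RAM; nothing is sent to the display here. */
static void fb_draw_pixel(framebuffer_t *fb, int x, int y, uint16_t color)
{
    if (x < 0 || y < 0 || x >= FB_WIDTH || y >= FB_HEIGHT)
        return; /* clip to the screen */
    fb->pixels[y * FB_WIDTH + x] = color;
}

/* Inclusive corners, like a typical fill-rect API. */
static void fb_fill_rect(framebuffer_t *fb, int x0, int y0, int x1, int y1,
                         uint16_t color)
{
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            fb_draw_pixel(fb, x, y, color);
}

/* One call sends the finished frame, so intermediate states never show. */
static void fb_flush(const framebuffer_t *fb, flush_fn send, void *ctx)
{
    send(fb->pixels, (size_t)FB_WIDTH * FB_HEIGHT, ctx);
}

/* Off-target stand-in for flush_fn: only counts the pixels it receives. */
static void count_pixels(const uint16_t *pixels, size_t count, void *ctx)
{
    (void)pixels;
    *(size_t *)ctx += count;
}
```

Because nothing reaches the panel until `fb_flush()`, an erase followed by a redraw is never visible. Note that the ST7789 expects each RGB565 pixel high byte first over SPI, so a real buffer on the little-endian ESP32 usually stores byte-swapped values.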

nopnop2002 commented 3 years ago

Thank you for the comment.

This example uses a lot of memory to display JPEG and PNG.

To use a framebuffer, I would first need to reduce the memory consumed by JPEG and PNG display.

bjorndm commented 3 years ago

Yes, looking at it in more detail, it is true that a full-screen framebuffer takes too much DRAM.
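To put numbers on "too much DRAM": a buffer needs width x height x bits per pixel / 8 bytes. The helper below is a trivial sketch; the resolutions in the comment are common ST7789 panel sizes used only as examples.

```c
#include <stdint.h>

/* Bytes needed for a w x h pixel buffer at the given color depth. */
static uint32_t buffer_bytes(uint32_t w, uint32_t h, uint32_t bits_per_pixel)
{
    return (w * h * bits_per_pixel + 7u) / 8u; /* round up to whole bytes */
}

/*
 * buffer_bytes(240, 240, 16) == 115200   full 240x240 frame, RGB565
 * buffer_bytes(240, 320, 16) == 153600   full 240x320 frame, RGB565
 * buffer_bytes(240, 320, 24) == 230400   same frame at 3 bytes per pixel
 * buffer_bytes(32, 32, 16)   ==   2048   one 32x32 tile, RGB565
 */
```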

However, it would also be possible to use small DMA buffers, for example 32x32 pixels (2 KB of DRAM at 16 bits per pixel), to speed up font drawing and other graphic operations. These small buffers could also be used as "sprites" or "tiles". Smaller fonts could then also be loaded as DMA-accessible sprites.
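A sketch of the 32x32 tile idea, under the same assumptions (RGB565, host-side, hypothetical names): drawing calls use screen coordinates and are clipped to the tile, and the flush sends the tile's window and its pixels in one call. On the ST7789 the window would be set with the column/row address commands (CASET/RASET) before the pixel data is streamed; here a callback that just records the window stands in for that.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_W 32
#define TILE_H 32

/* One small buffer covering a 32x32 region of the screen (RGB565). */
typedef struct {
    int x, y;                     /* top-left corner in screen coordinates */
    uint16_t px[TILE_W * TILE_H]; /* 32 * 32 * 2 bytes = 2 KB */
} tile_t;

/* Receives one rectangular window (inclusive corners) plus its pixels. */
typedef void (*window_write_fn)(int x0, int y0, int x1, int y1,
                                const uint16_t *px, size_t count, void *ctx);

static void tile_clear(tile_t *t, uint16_t bg)
{
    for (size_t i = 0; i < (size_t)TILE_W * TILE_H; i++)
        t->px[i] = bg;
}

/* Screen coordinates; pixels outside the tile are silently dropped. */
static void tile_draw_pixel(tile_t *t, int x, int y, uint16_t color)
{
    int lx = x - t->x;
    int ly = y - t->y;
    if (lx < 0 || ly < 0 || lx >= TILE_W || ly >= TILE_H)
        return;
    t->px[ly * TILE_W + lx] = color;
}

/* Send the whole tile as one window write instead of pixel by pixel. */
static void tile_flush(const tile_t *t, window_write_fn sink, void *ctx)
{
    sink(t->x, t->y, t->x + TILE_W - 1, t->y + TILE_H - 1,
         t->px, (size_t)TILE_W * TILE_H, ctx);
}

/* Off-target stand-in for window_write_fn: records the last window. */
typedef struct { int x0, y0, x1, y1; size_t count; } window_log_t;

static void log_window(int x0, int y0, int x1, int y1,
                       const uint16_t *px, size_t count, void *ctx)
{
    window_log_t *rec = ctx;
    (void)px;
    rec->x0 = x0; rec->y0 = y0; rec->x1 = x1; rec->y1 = y1;
    rec->count = count;
}
```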

Perhaps there are other non-DMA techniques to improve performance as well? EDIT: like this? https://ioprog.com/2020/04/11/performance-improvement-for-stm32f030-st7789-graphics-library/

nopnop2002 commented 3 years ago

> Perhaps there are other non-DMA techniques to improve performance as well?

Yes.

I know that writing the GPIO registers directly runs about 5 times faster than calling gpio_set_level().

On the ESP32, the output registers are split into one register for GPIO0 to GPIO31 and another for GPIO32 to GPIO39. On the ESP32-S2, they are split into one for GPIO0 to GPIO31 and another for GPIO32 to GPIO53.

However, it may not have much impact on overall performance.
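A host-side model of the register split described above, as a sketch only: pins 0 to 31 live in one 32-bit output register and higher pins in a second one, and writing a 1 bit to a "write 1 to set" (W1TS) or "write 1 to clear" (W1TC) register changes only that pin. The model names are hypothetical; on the chip the registers are reached through the `GPIO` struct (`GPIO.out_w1ts`, `GPIO.out_w1tc`, and the `out1_*` counterparts for higher pins), whose exact field layout should be checked in `soc/gpio_struct.h` for the target.

```c
#include <stdint.h>

/* Host-side model of the two GPIO output registers. */
typedef struct {
    uint32_t out;  /* GPIO0..GPIO31 */
    uint32_t out1; /* GPIO32 and up */
} gpio_out_model_t;

/* Effect of writing (1 << bit) to W1TS: only that output goes high. */
static void model_w1ts(gpio_out_model_t *g, int pin)
{
    if (pin < 32)
        g->out |= (uint32_t)1u << pin;
    else
        g->out1 |= (uint32_t)1u << (pin - 32);
}

/* Effect of writing (1 << bit) to W1TC: only that output goes low. */
static void model_w1tc(gpio_out_model_t *g, int pin)
{
    if (pin < 32)
        g->out &= ~((uint32_t)1u << pin);
    else
        g->out1 &= ~((uint32_t)1u << (pin - 32));
}

/* Current modeled level of a pin (0 or 1). */
static int model_level(const gpio_out_model_t *g, int pin)
{
    uint32_t reg = pin < 32 ? g->out : g->out1;
    int bit = pin < 32 ? pin : pin - 32;
    return (int)((reg >> bit) & 1u);
}
```

Because the set and clear registers exist separately, the fast path is a single store, with no read-modify-write of the output register; the model's `|=` and `&=` only reproduce the effect.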


Try this.

I (13816) MAIN: diff(gpio_set_level)=75
I (13936) MAIN: diff(register)=12
I (14216) MAIN: diff(func)=28
#include <inttypes.h>

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

#include "driver/gpio.h"
#include "soc/gpio_struct.h" // GPIO register block ("GPIO"), ESP32 layout
#include "esp_log.h"

// Direct register writes: a 1 bit written to out_w1ts drives the pin high,
// a 1 bit written to out_w1tc drives it low. Valid for GPIO0..GPIO31 only.
#define _gpio_set_level(pin) (GPIO.out_w1ts = (1UL << (pin)))
#define _gpio_clear_level(pin) (GPIO.out_w1tc = (1UL << (pin)))

// The same register writes wrapped in ordinary functions.
void func_gpio_set_level(int pin) {
        GPIO.out_w1ts = (1UL << pin);
}

void func_gpio_clear_level(int pin) {
        GPIO.out_w1tc = (1UL << pin);
}

#define GPIO_PIN 2

#define TAG "MAIN"

void app_main(void)
{
        gpio_reset_pin( GPIO_PIN ); // reset the pad to its default GPIO function
        gpio_set_direction( GPIO_PIN, GPIO_MODE_OUTPUT );
        gpio_set_level( GPIO_PIN, 0 );

        // Visual check (e.g. an LED on GPIO_PIN): each method should toggle the pin.
        gpio_set_level( GPIO_PIN, 1 );
        vTaskDelay(pdMS_TO_TICKS(1000));
        gpio_set_level( GPIO_PIN, 0 );
        vTaskDelay(pdMS_TO_TICKS(1000));

        GPIO.out_w1ts = (1UL << GPIO_PIN);
        vTaskDelay(pdMS_TO_TICKS(1000));
        GPIO.out_w1tc = (1UL << GPIO_PIN);
        vTaskDelay(pdMS_TO_TICKS(1000));

        _gpio_set_level( GPIO_PIN );
        vTaskDelay(pdMS_TO_TICKS(1000));
        _gpio_clear_level( GPIO_PIN );
        vTaskDelay(pdMS_TO_TICKS(1000));

        // Time 1,000,000 set/clear pairs per method.
        // The differences are FreeRTOS ticks, not milliseconds.
        TickType_t start;
        TickType_t end;
        TickType_t diff;
        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                gpio_set_level( GPIO_PIN, 1 );
                gpio_set_level( GPIO_PIN, 0 );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(gpio_set_level)=%" PRIu32, (uint32_t)diff);

        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                _gpio_set_level( GPIO_PIN );
                _gpio_clear_level( GPIO_PIN );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(register)=%" PRIu32, (uint32_t)diff);

        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                func_gpio_set_level( GPIO_PIN );
                func_gpio_clear_level( GPIO_PIN );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(func)=%" PRIu32, (uint32_t)diff);
}
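The logged `diff` values are FreeRTOS tick counts. Assuming the default `CONFIG_FREERTOS_HZ` of 100 (one tick = 10 ms), a small helper turns them into time per loop pass; the function name is hypothetical and the tick rate should be checked in the project's sdkconfig.

```c
#include <stdint.h>

/* Convert a tick difference measured over `iterations` loop passes into
 * nanoseconds per pass. tick_hz is the FreeRTOS tick rate (assumed 100). */
static uint64_t ns_per_iteration(uint32_t ticks, uint32_t tick_hz,
                                 uint32_t iterations)
{
    return (uint64_t)ticks * 1000000000ull / tick_hz / iterations;
}
```

With the logged 75, 12 and 28 ticks this gives about 750 ns, 120 ns and 280 ns per set-and-clear pair for gpio_set_level(), the register macros and the function wrappers. Each figure is only good to about one tick, i.e. roughly 10 ns per pair here.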
bjorndm commented 3 years ago

OK, I will try it out, and if I see a performance improvement, I'll try to apply it to this esp-idf-st7789 project.

randyfan commented 1 year ago

Hi, I was just wondering whether using register operations resulted in a noticeable performance improvement.

This library is awesome, but the only thing holding me back from using it for a project is the refresh rate. I'm trying to get text to refresh without a wasted frame where a black rectangle is drawn over it.

nopnop2002 commented 1 year ago

> I'm trying to get text to refresh without a wasted frame where a black rectangle is drawn over it.

I don't know what kind of drawing you want.

randyfan commented 1 year ago

Thanks for the reply. When I use lcdDrawString() with dev->_font_fill enabled, the rectangle drawing method makes it a partial refresh (https://github.com/nopnop2002/esp-idf-st7789/blob/master/main/st7789.c#L763), which is cool; however, I can see the frame in which the rectangle is drawn over the string. Is there any method that goes straight from one string to another string?

Also, I noticed that if I uncomment and use https://github.com/nopnop2002/esp-idf-st7789/blob/master/main/st7789.c#L784 instead of the rectangle drawing method, the rectangle disappears, but the refresh becomes noticeably sequential (the characters of the string update from left to right).

Edit: Perhaps I should have posted here instead: https://github.com/nopnop2002/esp-idf-st7789/issues/20. Basically, I want to see whether there's a faster approach than using lcdDrawPixel() and lcdDrawFillRect() for partial refreshes.
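One way to avoid the visible black rectangle is to build each character cell completely in a small buffer, writing the background color for 0 bits instead of skipping them, and then send the whole line of text with a single window write. The sketch assumes a simplified glyph format (one byte per row, most significant bit leftmost, at most 8 pixels wide); the library's FONTX fonts would need their own bit extraction, and `render_glyph` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Render one 1-bit-per-pixel glyph into an RGB565 line buffer.
 * rows:   h bytes, one per glyph row, MSB = leftmost pixel (w <= 8)
 * buf:    line buffer, `stride` pixels wide, at least h rows tall
 * x0:     column in the buffer where this glyph starts
 * Every pixel of the cell is written (fg or bg), so old text is replaced
 * in the same pass and no separate erase step is ever sent. */
static void render_glyph(const uint8_t *rows, int w, int h,
                         uint16_t fg, uint16_t bg,
                         uint16_t *buf, int stride, int x0)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int on = (rows[y] >> (7 - x)) & 1;
            buf[y * stride + x0 + x] = on ? fg : bg;
        }
    }
}
```

Calling this once per character with increasing `x0` fills a buffer of (text width) x (font height) pixels, which can then be sent in one transfer, so the display goes straight from the old string to the new one.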

nopnop2002 commented 1 year ago

@randyfan

> Is there any method that goes straight from one string to another string?

// dev, fx, xpos and ypos are assumed to be set up as in the demo code.
uint8_t ascii[40];
lcdFillScreen(dev, BLACK);
strcpy((char *)ascii, "ABC");
lcdDrawString(dev, fx, xpos, ypos, ascii, WHITE); // Display ABC
vTaskDelay(1000);
lcdDrawString(dev, fx, xpos, ypos, ascii, BLACK); // Erase ABC by redrawing it in the background color
strcpy((char *)ascii, "abc");
lcdDrawString(dev, fx, xpos, ypos, ascii, WHITE); // Display abc at the same position
DaveDavenport commented 9 months ago

I made a framebuffer version (on the ESP32-S3 I had enough memory for this) that uses large SPI transfers to redraw the screen in one go (the documentation indicates it should use DMA to do this; at the moment the transfer is still blocking). With this, I tested redraws at up to 15 fps and did not notice 'glitches' (there will be some, but they are rare) on the ESP.
I tested it with internal memory and with SPIRAM. I also changed it to use 18-bit (RGB666) color instead of 16-bit (RGB565).

If there is interest, I can upload this code; it's very hacky for now.
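ESP-IDF's SPI master limits a single transaction to the bus's `max_transfer_sz` (a field of `spi_bus_config_t`), so redrawing a whole frame "in one go" may still mean several back-to-back transactions. A sketch of the splitting, with a hypothetical `tx` callback standing in for the real transaction call and an arbitrary 32 KB chunk size used only as an example: a 240x240 frame at 3 bytes per pixel (172,800 bytes) then goes out as 6 transfers.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*send_fn)(const uint8_t *data, size_t len, void *ctx);

/* Send `len` bytes as consecutive transfers of at most `max_chunk` bytes.
 * Returns the number of transfers issued. */
static size_t send_in_chunks(const uint8_t *data, size_t len, size_t max_chunk,
                             send_fn tx, void *ctx)
{
    size_t transfers = 0;
    if (max_chunk == 0)
        return 0; /* avoid an endless loop on a bad limit */
    while (len > 0) {
        size_t part = len < max_chunk ? len : max_chunk;
        tx(data, part, ctx);
        data += part;
        len -= part;
        transfers++;
    }
    return transfers;
}

/* Off-target stand-in for send_fn: totals the bytes it is given. */
static void count_bytes(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}
```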

nopnop2002 commented 9 months ago

@DaveDavenport

Can you change your repository to public?

DaveDavenport commented 9 months ago

It's not on GitHub, so there is no repository to set to 'public'. I could share the code (in a very rough state) if there is interest.

DaveDavenport commented 9 months ago

I quickly cloned your repo and started adding my code: https://git.sr.ht/~qball/esp-idf-st7789.git

What is done:

Things I need to port back:

DaveDavenport commented 9 months ago

Output: MJPEG (on a 25 MHz SPI bus).

DaveDavenport commented 9 months ago

Drawing text on screen without (much) glitching, with the framebuffer and background in SPIRAM. Backlight dimmed to 20%.
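A sketch of one way to redraw text over a picture without glitches, assuming (as described above) that both the framebuffer and a copy of the background image are kept in memory with the same row stride: before redrawing a text area, copy that rectangle back from the background copy, draw the new text into the framebuffer, and only then send the frame. `restore_background` is a hypothetical helper, not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h rectangle at (x0, y0) from the stored background into the
 * framebuffer. Both buffers are RGB565 and `stride` pixels wide. The caller
 * is responsible for keeping the rectangle inside the buffers. */
static void restore_background(uint16_t *fb, const uint16_t *bg, int stride,
                               int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++)
        memcpy(&fb[y * stride + x0], &bg[y * stride + x0],
               (size_t)w * sizeof fb[0]);
}
```

Since the old text disappears and the new text appears in the same flush, the panel never shows the intermediate "erased" state.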

nopnop2002 commented 9 months ago

Thank you. I've cloned your code.

I'll take a closer look this weekend.

> Conversion to RGB (666) (it's 24-bit; the lower 2 bits are ignored).

Probably ESP32S2/C2 causes memory overflow when displaying JPEG and PNG
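A sketch of the RGB666 conversion quoted above, assuming the 18-bit mode is fed 3 bytes per pixel with the panel using only the upper 6 bits of each byte. The low bits are filled by replicating the top bits so full-scale values stay full-scale; since the panel ignores the lowest 2 bits anyway, a plain shift would also work. The function name is hypothetical.

```c
#include <stdint.h>

/* Expand one RGB565 pixel to 3 bytes (R, G, B) for 18-bit RGB666 mode. */
static void rgb565_to_rgb666_bytes(uint16_t c, uint8_t out[3])
{
    uint8_t r5 = (uint8_t)((c >> 11) & 0x1F);
    uint8_t g6 = (uint8_t)((c >> 5) & 0x3F);
    uint8_t b5 = (uint8_t)(c & 0x1F);
    out[0] = (uint8_t)((r5 << 3) | (r5 >> 2)); /* 5 -> 8 bits */
    out[1] = (uint8_t)((g6 << 2) | (g6 >> 4)); /* 6 -> 8 bits */
    out[2] = (uint8_t)((b5 << 3) | (b5 >> 2)); /* 5 -> 8 bits */
}
```

The cost is 50% more data per frame than RGB565 (3 bytes instead of 2 per pixel), both in the framebuffer and on the SPI bus.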

DaveDavenport commented 9 months ago
> Probably ESP32S2/C2 causes memory overflow when displaying JPEG and PNG

When not using the framebuffer, it should be fine. Removing or reducing the large static buffers in the code is on my to-do list.

DaveDavenport commented 9 months ago

Another todo:

DaveDavenport commented 9 months ago

> Probably ESP32S2/C2 causes memory overflow when displaying JPEG and PNG

Just tested my branch on an esp32c3 and this works (with and without framebuffer). However, I have JPG/PNG disabled.


nopnop2002 commented 9 months ago

> Just tested my branch on an esp32c3 and this works

ESP32C3: 384 KB ROM, 400 KB SRAM

ESP32C2: 576 KB ROM, 272 KB SRAM (too small)

ESP32S2: 128 KB ROM, 320 KB SRAM

DaveDavenport commented 9 months ago

I've never used the C2, and the S2 is NRND (not recommended for new designs). Anyway, in my patch the framebuffer is optional.

nopnop2002 commented 9 months ago

> S2 is NRND

No. The S2 is in mass production. Please check here: https://products.espressif.com/#/product-selector?names=

DaveDavenport commented 9 months ago

Oh, good to know, because I really liked it (and keep some in stock). There were some peripheral clocking things (in combination with DVFS) that I did not manage to get working on the C3.

Mouser indicated that Espressif had marked it NRND; I see now that it comes in another form factor.

DaveDavenport commented 9 months ago

For smaller memory usage, we could probably use a smaller framebuffer: first draw part of what we want to show into the buffer, and then push that to the screen in one go. This should help with text, for example if we push one line of text in one go.
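A sketch of the partial-framebuffer idea: the screen is rendered in horizontal bands that reuse one small buffer, and each band is pushed in one go before the next is drawn. With 16-row bands on a 240x320 panel the buffer is 7,680 bytes instead of 153,600 for a full RGB565 frame. The callback types and the example callbacks are hypothetical, used here for off-target testing.

```c
#include <stdint.h>

typedef void (*band_draw_fn)(uint16_t *band, int y0, int rows, int width,
                             void *ctx);
typedef void (*band_send_fn)(const uint16_t *band, int y0, int rows,
                             int width, void *ctx);

/* Draw and send the screen band by band; returns the number of bands. */
static int render_in_bands(uint16_t *band, int band_rows, int width,
                           int height, band_draw_fn draw, band_send_fn send,
                           void *ctx)
{
    int bands = 0;
    if (band_rows <= 0)
        return 0;
    for (int y = 0; y < height; y += band_rows) {
        int rows = (height - y < band_rows) ? height - y : band_rows;
        draw(band, y, rows, width, ctx); /* render rows y .. y+rows-1 */
        send(band, y, rows, width, ctx); /* push them in one transfer */
        bands++;
    }
    return bands;
}

/* Example draw callback: paint each pixel with its screen row number. */
static void paint_rows(uint16_t *band, int y0, int rows, int width, void *ctx)
{
    (void)ctx;
    for (int r = 0; r < rows; r++)
        for (int x = 0; x < width; x++)
            band[r * width + x] = (uint16_t)(y0 + r);
}

/* Example send callback: count the pixels that would be transferred. */
static void count_sent(const uint16_t *band, int y0, int rows, int width,
                       void *ctx)
{
    (void)band;
    (void)y0;
    *(long *)ctx += (long)rows * width;
}
```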

nopnop2002 commented 9 months ago

JPEG and PNG display did not become faster even after changing to FrameBuffer.

This is because image analysis takes time.

Without frame buffer (RGB565):

I (2734) FillTest: elapsed time[ms]:1150
I (6784) ColorBarTest: elapsed time[ms]:50
I (11064) ArrowTest: elapsed time[ms]:280
I (17254) LineTest: elapsed time[ms]:2190
I (23194) CircleTest: elapsed time[ms]:1940
I (29174) RoundRectTest: elapsed time[ms]:1980
I (39554) RectAngleTest: elapsed time[ms]:6380
I (50564) TriangleTest: elapsed time[ms]:7010
I (55014) DirectionTest: elapsed time[ms]:450
I (60094) HorizontalTest: elapsed time[ms]:1070
I (65164) VerticalTest: elapsed time[ms]:1070
I (69354) FillRectTest: elapsed time[ms]:190
I (73614) ColorTest: elapsed time[ms]:260
I (78684) CodeTest: elapsed time[ms]:1070
I (84374) CodeTest: elapsed time[ms]:1690
I (95534) BMPTest: elapsed time[ms]:7160
I (102084) JPEGTest: elapsed time[ms]:2550
I (108934) PNGTest: elapsed time[ms]:2850
I (113154) QRTest: elapsed time[ms]:220

With frame buffer (RGB565):

I (2735) FillTest: elapsed time[ms]:1150
I (6805) ColorBarTest: elapsed time[ms]:70
I (10865) ArrowTest: elapsed time[ms]:60
I (14915) LineTest: elapsed time[ms]:50
I (18975) CircleTest: elapsed time[ms]:60
I (23025) RoundRectTest: elapsed time[ms]:50
I (27095) DirectionTest: elapsed time[ms]:70
I (31165) HorizontalTest: elapsed time[ms]:70
I (35235) VerticalTest: elapsed time[ms]:70
I (39295) FillRectTest: elapsed time[ms]:60
I (43355) ColorTest: elapsed time[ms]:60
I (47455) CodeTest: elapsed time[ms]:100
I (51545) CodeTest: elapsed time[ms]:90
I (62605) BMPTest: elapsed time[ms]:7060
I (69155) JPEGTest: elapsed time[ms]:2550
I (75995) PNGTest: elapsed time[ms]:2840
I (80115) QRTest: elapsed time[ms]:120

DaveDavenport commented 9 months ago

That is to be expected (the big bunny video used a different JPEG decoder on non-ESP hardware, where drawing was the bottleneck). For me, the visible drawing of text was the main reason to update: it looked odd, and I could not update all the text fast enough.

nopnop2002 commented 9 months ago

If your main purpose is to display text, it's well worth using FrameBuffer.

If your main purpose is to display images, there is no value in using FrameBuffer.

I'll publish it after some more testing.

Thank you.

DaveDavenport commented 9 months ago

> If your main purpose is to display images, there is no value in using FrameBuffer.

I think this depends on the situation. For me, the image updating in one go, instead of seeing it build up while the decoder runs, looks better. There are some use cases where perceived speed (as opposed to actual speed) can make a difference. In a small internet radio, this helped to give a better experience.

I have some ideas to improve things further (keeping track of the exposed region so that only the needed parts are redrawn; in the radio above, I now have a status bar at the top that updates more often). But I'm not sure if or when I'll have time.
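A sketch of the "redraw only the exposed region" idea, using the simplest form of dirty-region tracking: a single bounding box that grows to cover every changed rectangle and is flushed (and reset) once per frame. The names are hypothetical, and this is one common technique rather than anything already in the library.

```c
#include <stdbool.h>

/* Bounding box of everything drawn since the last flush (inclusive). */
typedef struct {
    int x0, y0, x1, y1;
    bool empty;
} dirty_rect_t;

static void dirty_reset(dirty_rect_t *d)
{
    d->x0 = d->y0 = d->x1 = d->y1 = 0;
    d->empty = true;
}

/* Grow the box to include the rectangle (x0, y0)-(x1, y1). */
static void dirty_add(dirty_rect_t *d, int x0, int y0, int x1, int y1)
{
    if (d->empty) {
        d->x0 = x0; d->y0 = y0; d->x1 = x1; d->y1 = y1;
        d->empty = false;
        return;
    }
    if (x0 < d->x0) d->x0 = x0;
    if (y0 < d->y0) d->y0 = y0;
    if (x1 > d->x1) d->x1 = x1;
    if (y1 > d->y1) d->y1 = y1;
}

/* Number of pixels the next flush would have to send. */
static long dirty_pixels(const dirty_rect_t *d)
{
    if (d->empty)
        return 0;
    return (long)(d->x1 - d->x0 + 1) * (d->y1 - d->y0 + 1);
}
```

For a status bar 240x20 pixels, only 4,800 pixels are sent instead of a full frame. A single box can grow large when two distant areas change in the same frame; keeping a short list of rectangles avoids that at the cost of more bookkeeping.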

Note: I am getting some oddness in the BMP test if I loop over it repeatedly. free() complains that a block has already been freed, so I might have mixed something up. It's all a bit of a rush job done in the few minutes I have here and there.

DaveDavenport commented 9 months ago

Thanks again for your library; it's been very useful.