lvgl display buffer to internal RAM

epikao commented 1 year ago

Hello

Is it possible to move the LVGL display buffer to internal RAM, as in the code below? What else do I have to change? (Changing only this does not work for me.)

    static lv_color_t buf1_1[480 * 61]; //for internal RAM
    static lv_color_t buf1_2[480 * 61]; //for internal RAM
    lv_disp_draw_buf_init(&disp_buf, buf1_1, buf1_2, Lcd_Ctx[LCD_INSTANCE].XSize * Lcd_Ctx[LCD_INSTANCE].YSize);

thank you
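
One thing worth ruling out in the snippet above: each array holds 480 * 61 pixels, but the size passed to lv_disp_draw_buf_init() is the full screen (XSize * YSize). In LVGL v8 that last argument is the buffer size in pixels, so a value larger than the arrays would let LVGL render past their end. A minimal sketch of deriving the count from the array itself, assuming a 16-bit color depth; color_t is a hypothetical stand-in for lv_color_t so the fragment builds without LVGL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for lv_color_t, assuming a 16-bit color depth. */
typedef uint16_t color_t;

#define DRAW_BUF_WIDTH 480u
#define DRAW_BUF_LINES 61u

static color_t buf1_1[DRAW_BUF_WIDTH * DRAW_BUF_LINES];
static color_t buf1_2[DRAW_BUF_WIDTH * DRAW_BUF_LINES];

/* Number of pixels one draw buffer can hold, derived from the array itself. */
static uint32_t draw_buf_px_count(void)
{
    /* One count is passed for both buffers, so they must be the same size. */
    assert(sizeof(buf1_1) == sizeof(buf1_2));
    return (uint32_t)(sizeof(buf1_1) / sizeof(buf1_1[0]));
}

/* Bytes of internal RAM taken by both draw buffers together. */
static size_t draw_buf_total_bytes(void)
{
    return sizeof(buf1_1) + sizeof(buf1_2);
}
```

With LVGL present, the call would then read lv_disp_draw_buf_init(&disp_buf, buf1_1, buf1_2, draw_buf_px_count());.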

kisvegabor commented 1 year ago

You should set the address manually. See the driver as reference: https://github.com/lvgl/lv_port_stm32h745i_disco/blob/88e342bb24e9f84e4f9183d0e6ebbae13506f170/CM7/Core/Src/Lvgl_Porting/lvgl_port_lcd.c#L69-L71

epikao commented 1 year ago

You should set the address manually. See the driver as reference:

That's what I tried, see my code above. Unfortunately, this causes the program to crash or freeze, see attached picture.

In addition I have a problem with the lv_demo_music, see following link: https://forum.lvgl.io/t/lvgl-port-to-stm32h750b-disco/10073

[attached picture: frozen]

kisvegabor commented 1 year ago

It seems like a missing caching invalidation.

epikao commented 1 year ago

It seems like a missing caching invalidation.

What would be the procedure to solve this? I would appreciate any hints; I'm a beginner here.

Thank you

kisvegabor commented 1 year ago

I'm also not an expert but here is a recent discussion about it: https://github.com/lvgl/lvgl/issues/3714#issuecomment-1287261251

tdjastrzebski commented 1 year ago

@epikao See the sample code for my complete STM32F769I-DISCO LVGL demo.
The key to success is to clean d-cache before starting DMA(2D) transfer.
Note that in this case the cache does not need to be invalidated after the DMA2D transfer, because the frame buffer is read by the LTDC interface, which does not use the d-cache anyway; only the Cortex core does. In that respect, #3714 is a slightly different case. HTH

epikao commented 1 year ago

The key to success is to clean d-cache before starting DMA(2D) transfer.

OK, hm, but it looks like this is already done that way; see the code below.

lv_port_stm32h745i_disco/CM7/Core/Src/Lvgl_Porting/lvgl_port_lcd.c

static void disp_flush(lv_disp_drv_t *drv, const lv_area_t *area,
        lv_color_t *color_p)
{
    /*Return if the area is out the screen*/
    if (area->x2 < 0)
        return;
    if (area->y2 < 0)
        return;
    if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
        return;
    if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
        return;
    //BSP_LED_Toggle(LED2);
    SCB_CleanInvalidateDCache();
    SCB_InvalidateICache();

    uint32_t address =
            hlcd_ltdc.LayerCfg[Lcd_Ctx[LCD_INSTANCE].ActiveLayer].FBStartAdress
                    + (((Lcd_Ctx[LCD_INSTANCE].XSize * area->y1) + area->x1)
                            * Lcd_Ctx[LCD_INSTANCE].BppFactor);

    CopyImageToLcdFrameBuffer((void*) color_p, (void*) address,
            lv_area_get_width(area), lv_area_get_height(area));

    lv_disp_flush_ready(&disp_drv);
    return;
}

tdjastrzebski commented 1 year ago

True, copying in-memory image area to another memory area, or directly to a device interface, can be done in many ways. E.g.
disp_flush() without using DMA2D
That is why implementation of disp_flush() method is left to the LVGL user. Choose the method which works for you.

epikao commented 1 year ago

True, copying in-memory image area to another memory area, or directly to a device interface, can be done in many ways. E.g. https://github.com/tdjastrzebski/STM32F769I-DISCO-LVGL/blob/master/master.cpp#L226 That is why implementation of disp_flush() method is left to the LVGL user. Choose the method which works for you.

I think your way is pretty much the same... so I still don't know where the issues mentioned above come from :-( ...

tdjastrzebski commented 1 year ago

I do not know either, but calling SCB_InvalidateICache() probably makes no sense: the "I" stands for instruction.
SCB_CleanInvalidateDCache() cleans and invalidates the ENTIRE data cache. It probably does not make sense either; it will only cause a big performance hit all over the place.
If CopyImageToLcdFrameBuffer() does NOT use DMA(2D), call SCB_CleanDCache_by_Addr() afterwards.
If CopyImageToLcdFrameBuffer() does use DMA(2D), call SCB_CleanDCache_by_Addr() before.
It is really simple once you understand the cache management concept correctly. Recommended reading: DMA is not working on STM32H7 devices

tdjastrzebski commented 1 year ago

The core issue is this: DMA(2D) and the LTDC interface do not use the DCache, so you want to make sure physical memory holds the current byte values, either after the bytes are copied (if the CPU does the copy) or before the transfer starts (if you copy with DMA). That is why you need to flush (clean) the cached values.
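
Cleaning by address works on whole cache lines, which are 32 bytes on the Cortex-M7. The sketch below shows the arithmetic for widening an arbitrary buffer range to line boundaries. cache_range_t and cache_range_for() are hypothetical helper names, and depending on the CMSIS version SCB_CleanDCache_by_Addr() may already do this rounding internally, so treat it as an illustration of the concept and not as a required step:

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

/* Hypothetical helper type: a byte range widened to whole cache lines. */
typedef struct {
    uintptr_t start;  /* address rounded down to a cache-line boundary */
    uint32_t  length; /* byte count covering every touched cache line */
} cache_range_t;

static cache_range_t cache_range_for(uintptr_t addr, uint32_t size_bytes)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    cache_range_t r;

    assert(size_bytes > 0u); /* an empty range has nothing to clean */

    r.start = addr & mask;
    /* Round the end address up, then measure from the rounded-down start. */
    r.length = (uint32_t)(((addr + size_bytes + DCACHE_LINE_SIZE - 1u) & mask) - r.start);
    return r;
}
```

A buffer that starts 16 bytes into a cache line and is 100 bytes long therefore touches four lines (128 bytes), not the three that 100 bytes alone would suggest.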

epikao commented 1 year ago

Thank you for your explanation. I changed the code as shown below, but I still get the same issue as shown in the link below. https://forum.lvgl.io/t/lvgl-port-to-stm32h750b-disco/10073

Interestingly, I don't have this problem with the other LVGL demos.

static void disp_flush(lv_disp_drv_t *drv, const lv_area_t *area,
        lv_color_t *color_p)
{
    /*Return if the area is out the screen*/
    if (area->x2 < 0)
        return;
    if (area->y2 < 0)
        return;
    if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
        return;
    if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
        return;
    //BSP_LED_Toggle(LED2);
    //SCB_CleanInvalidateDCache();

    uint32_t bufferLength = Lcd_Ctx[LCD_INSTANCE].XSize * Lcd_Ctx[LCD_INSTANCE].YSize * 16;
    SCB_CleanDCache_by_Addr((uint32_t*)color_p, bufferLength);

    //SCB_InvalidateICache();

    uint32_t address =
            hlcd_ltdc.LayerCfg[Lcd_Ctx[LCD_INSTANCE].ActiveLayer].FBStartAdress
                    + (((Lcd_Ctx[LCD_INSTANCE].XSize * area->y1) + area->x1)
                            * Lcd_Ctx[LCD_INSTANCE].BppFactor);

    CopyImageToLcdFrameBuffer((void*) color_p, (void*) address,
            lv_area_get_width(area), lv_area_get_height(area));

    lv_disp_flush_ready(&disp_drv);
    return;
}

and here the copy function:

static uint8_t CopyImageToLcdFrameBuffer(void *pSrc, void *pDst, uint32_t xSize,
        uint32_t ySize)
{
    HAL_StatusTypeDef hal_status = HAL_OK;
    uint8_t lcd_status = BSP_ERROR_BUS_DMA_FAILURE; /* stays a failure unless the transfer completes */

    /* Configure the DMA2D Mode, Color Mode and output offset */
    hlcd_dma2d.Init.Mode = DMA2D_M2M_PFC; //PFC
    hlcd_dma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888; /* Output color out of PFC */
    hlcd_dma2d.Init.AlphaInverted = DMA2D_REGULAR_ALPHA; /* No Output Alpha Inversion*/
    hlcd_dma2d.Init.RedBlueSwap = DMA2D_RB_REGULAR; /* No Output Red & Blue swap */

    /* Output offset in pixels == nb of pixels to be added at end of line to come to the  */
    /* first pixel of the next line : on the output side of the DMA2D computation         */
    hlcd_dma2d.Init.OutputOffset = LCD_DEFAULT_WIDTH - xSize;

    hlcd_dma2d.Instance = DMA2D;

    /* DMA2D Initialization */
    if (HAL_DMA2D_Init(&hlcd_dma2d) == HAL_OK)
    {
        if (HAL_DMA2D_Start(&hlcd_dma2d, (uint32_t) pSrc, (uint32_t) pDst,
                xSize, ySize) == HAL_OK)
        {
            /* Polling For DMA transfer */
            hal_status = HAL_DMA2D_PollForTransfer(&hlcd_dma2d, 20);
            if (hal_status == HAL_OK)
            {
                /* return good status on exit */
                lcd_status = BSP_ERROR_NONE;
            }
            else
            {
                lcd_status = BSP_ERROR_BUS_DMA_FAILURE;
            }
        }
    }

    return (lcd_status);
}
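
The address and offset arithmetic used by disp_flush() and CopyImageToLcdFrameBuffer() above can be checked in isolation. This is a minimal sketch with hypothetical helper names (fb_pixel_address(), line_output_offset()); the frame buffer start, screen width and bytes per pixel are passed in as plain numbers instead of being read from Lcd_Ctx or hlcd_ltdc:

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of pixel (x1, y1) in a frame buffer that is screen_width pixels wide. */
static uint32_t fb_pixel_address(uint32_t fb_start, uint32_t screen_width,
                                 uint32_t x1, uint32_t y1, uint32_t bytes_per_pixel)
{
    return fb_start + ((screen_width * y1) + x1) * bytes_per_pixel;
}

/* Pixels to skip at the end of each copied line to reach the start of the next one. */
static uint32_t line_output_offset(uint32_t screen_width, uint32_t area_width)
{
    assert(area_width <= screen_width); /* the area must fit on one screen line */
    return screen_width - area_width;
}
```

For a 480-pixel-wide screen, a 100-pixel-wide area leaves an output offset of 380 pixels per line.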

tdjastrzebski commented 1 year ago

Make sure you pass SCB_CleanDCache_by_Addr() the length of the input buffer, compare with my example.

epikao commented 1 year ago

Make sure you pass SCB_CleanDCache_by_Addr() the length of the input buffer, compare with my example.

However, I am porting the H745 port to the H750... and with the exception of the internal flash memory there are not many differences, so I'm a little surprised that this port and the music player seem to work with the H745 but not with the H750...

I adjusted the code of the H745 port as shown below, but without success. The H750 is a very popular IC (value line), but unfortunately there is still no LVGL port...

Adjusted code according to the F769 example:

    uint32_t width = lv_area_get_width(area);
    uint32_t height = lv_area_get_height(area);
    uint32_t bufferLength = width * height * 16;
    /*Return if the area is out the screen*/
    if (area->x2 < 0)
        return;
    if (area->y2 < 0)
        return;
    if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
        return;
    if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
        return;
    //BSP_LED_Toggle(LED2);
    //SCB_CleanInvalidateDCache();

    //uint32_t bufferLength = Lcd_Ctx[LCD_INSTANCE].XSize * Lcd_Ctx[LCD_INSTANCE].YSize * 16;
    SCB_CleanDCache_by_Addr((uint32_t*)color_p, bufferLength);

your code:

    uint32_t bufferLength = width * height * LCD_BPP;
    uint16_t x = area->x1;
    uint16_t y = area->y1;
    // copy buffer using DMA2D without Pixel Format Conversion (PFC) or Blending
    uint32_t destination = LCD_FB_START_ADDRESS + LCD_BPP * (y * SCREEN_WIDTH + x);
    SCB_CleanDCache_by_Addr((uint32_t*)buffer, bufferLength);  // flush d-cache to SRAM before starting DMA transfer

epikao commented 1 year ago

see video link below about the strange effect (only with music demo): https://drive.google.com/file/d/1G6J_IgGbrLBJw-goaXh7OrXLqIUoXdb6/view?usp=share_link

tdjastrzebski commented 1 year ago

uint32_t bufferLength = width * height * 16;
Why do you multiply by 16?
Have you tried simply using both my disp_flush() implementations?
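
The length given to SCB_CleanDCache_by_Addr() is a byte count. Assuming the 16 in the code above was meant as the color depth in bits, the factor would need to be the number of bytes per pixel (2 for 16-bit color); multiplying by 16 asks for eight times more memory than the area occupies. A minimal sketch of the calculation, with area_size_bytes() as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Byte length of a flushed area; bits_per_pixel must be a whole number of bytes. */
static uint32_t area_size_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8u == 0u);
    return width * height * (bits_per_pixel / 8u);
}
```

With LVGL available, sizeof(lv_color_t) gives the bytes per pixel of the draw buffer directly.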

epikao commented 1 year ago

Why do you multiply by 16?

You do the same, or not? => LCD_BPP. I tried 16 and 24 bit... even without it, nothing changes.

Have you tried simply using both my disp_flush() implementations?

You mean just copy your whole FlushBufferStart function?

tdjastrzebski commented 1 year ago

you do the same, or not? => LCD_BPP

Not correct, see how LCD_BPP is defined and commented

You mean just copy your whole FlushBufferStart function?

Correct, try both FlushBufferStart() versions, with and without DMA. Make sure the DMA layer is initialized correctly.
I can give you some hints, which I already did, but I cannot assist you in debugging the code.
Good luck :)

tdjastrzebski commented 1 year ago

It seems that only some parts of the screen draw incorrectly so maybe one more thing:
test whether (uint32_t*)color_p and (void*) address (the source and destination addresses) are always a multiple of 4, as required by DMA2D.
That is why I suggest trying both FlushBufferStart() / disp_flush() versions - it may reveal the problem.

epikao commented 1 year ago

your function without using DMA2D works :-)

test whether (uint32_t*)color_p and (void*) address (the source and destination addresses) are always a multiple of 4, as required by DMA2D.

phuu, how can I test this?

tdjastrzebski commented 1 year ago

Google it or search Stack Overflow. One last thing: it would not hurt to make sure your buffer is 32-byte (not bit) aligned (ALIGN_32BYTES() in my code) to avoid cleaning an extra cache line in case of misalignment.

epikao commented 1 year ago

Google it or search Stack Overflow. One last thing: it would not hurt to make sure your buffer is 32-byte (not bit) aligned (ALIGN_32BYTES() in my code) to avoid cleaning an extra cache line in case of misalignment.

I don't understand much. BTW, only your function without DMA2D works fine for me, and I have no idea how to use ALIGN_32BYTES() to align the (uint32_t*)color_p buffer... The buffer addresses are set as follows:

#define LVGL_BUFFER_ADDR_AT_SDRAM   (0xD007F810)
#define LVGL_BUFFER_2_ADDR_AT_SDRAM (0xD00FF020)
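
An alignment test is a mask on the low address bits. The sketch below (is_aligned() and align_up() are hypothetical helper names) can be applied to the two addresses above: 0xD007F810 is a multiple of 4 but not of 32, while 0xD00FF020 is a multiple of both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of alignment; alignment must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr & (alignment - 1u)) == 0u;
}

/* Rounds addr up to the next multiple of alignment (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}
```

For a pointer such as color_p, cast it first: is_aligned((uintptr_t)color_p, 4). Rounding the first address up to a 32-byte boundary gives 0xD007F820; whether that still fits the intended SDRAM layout is not clear from the thread.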

tdjastrzebski commented 1 year ago

Just find ALIGN_32BYTES phrase in my example.
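
For a buffer that the linker places (as opposed to one at a fixed SDRAM address), alignment is requested on the declaration. As far as I can tell, ALIGN_32BYTES in ST's HAL headers wraps a compiler alignment attribute; the sketch below uses the C11 alignas specifier instead so it builds without the HAL, and assumes 16-bit pixels. draw_buf and the helper functions are hypothetical names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define DRAW_BUF_PIXELS (480u * 61u)

/* 16-bit pixels assumed; alignas(32) places the array on a cache-line boundary. */
static alignas(32) uint16_t draw_buf[DRAW_BUF_PIXELS];

/* Nonzero if the buffer starts on a 32-byte boundary. */
static int draw_buf_is_line_aligned(void)
{
    return ((uintptr_t)draw_buf % 32u) == 0u;
}

/* Size of the buffer in bytes; ideally also a multiple of 32. */
static uint32_t draw_buf_bytes(void)
{
    return (uint32_t)sizeof(draw_buf);
}
```

With 480 * 61 pixels at 2 bytes each the buffer is 58560 bytes, which is itself a multiple of 32, so a clean of the whole buffer touches no neighbouring data.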

tdjastrzebski commented 1 year ago

I just found an interesting issue in some other code that resulted in a similar screen glitch:

When using DMA(2D), make sure any previous transfer has finished before starting a new one, or even before modifying its settings.

It seems it is actually possible to start setting up DMA2D for a new transfer while the previous one is still running.
That leads to strange visual effects. I am afraid I cannot help more.
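
The guard described above can be sketched independently of the HAL. Everything here is hypothetical (transfer_engine_t, engine_try_start() and engine_on_complete() are illustration names, not DMA2D or HAL APIs); on real hardware the busy state would come from the peripheral itself or from the HAL's polling call, as in the CopyImageToLcdFrameBuffer() code earlier in the thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a transfer engine; not a real HAL type. */
typedef struct {
    volatile bool busy; /* set on start, cleared when the transfer completes */
    uint32_t      src;  /* settings that must not change mid-transfer */
    uint32_t      dst;
} transfer_engine_t;

/* Refuses to touch the settings while a transfer is still running. */
static bool engine_try_start(transfer_engine_t *e, uint32_t src, uint32_t dst)
{
    if (e->busy) {
        return false;
    }
    e->src = src;
    e->dst = dst;
    e->busy = true;
    return true;
}

/* Would be called from the transfer-complete interrupt or after polling. */
static void engine_on_complete(transfer_engine_t *e)
{
    e->busy = false;
}
```

The point of the design is that configuration and start happen only behind the busy check, so a second flush cannot overwrite the source, destination or size of a transfer that is still in flight.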

epikao commented 1 year ago

Just find ALIGN_32BYTES phrase in my example.

just tried, this makes it worse.

I do not have the H745 board, @ahmetalperenbulut , can you confirm whether the music demo runs properly? If so, I do not understand this, because I think the H750 board is identical except for the internal flash memory...

tdjastrzebski commented 1 year ago

That is probably not causing a problem in your case, but I have to admit that I found a bug in my code and posted a fix yesterday. If ALIGN_32BYTES makes it worse then, paradoxically, you may be on the right track. Another thing to check is that the buffer length is 32-byte (not bit) aligned as well. That is, the calculated length may need to be increased by several extra bytes. The H750 should not be much different from the F769 I often use; it is still a Cortex-M7 core. I also did many tests with the H7A3.

epikao commented 1 year ago

sorry for the late reply: With the files from https://github.com/lvgl/lv_port_stm32h7b3i_disco I could solve the problem with DMA2D.

tdjastrzebski commented 1 year ago

Several days ago I modified my example significantly. I am wondering if it would work as well. I also proposed significant improvements to LVGL's DMA2D support; see https://github.com/lvgl/lvgl/pull/3904

epikao commented 1 year ago

@tdjastrzebski I'm using your driver code now, and so far it works quite well, except for the frame rate... (Previously I used the h7b3i driver code with ALIGN_32BYTES and had the same problem.) In the video linked below you can see that for animations the frame rate is only in the 20 FPS range. Do you know what could be the reason for this?

https://drive.google.com/file/d/1opsjLjmEKbEB-URVqQ8h3O4TYHjki96O/view?usp=share_link

EDIT: I did not implement your MPU config or the timer calls for task_handler and lv_tick (task_handler is still in the main loop, lv_tick is driven by SysTick, and the MPU config is from the H750 ST examples).

tdjastrzebski commented 1 year ago

@epikao So the problem is solved? Until quite recently I did not realize myself that the proper IRQ priority for the LvglTick and LvglTask timers was critical. See here. I have tried using the MPU for the draw buffer, but it seems I am missing something. See here.

epikao commented 1 year ago

Yes, basically the problem is solved; LVGL works with the STM32H750 DK now. But no matter which driver I use, the frame rate is always at 7-20 FPS (during animation); with the ported h7b3 LVGL driver the frame rate is between 14 and 20 FPS. Do you think this could be solved by triggering LvglTick and Task from a separate timer? Another question: how can I use SRAM1 through SRAM3 (total 800 KByte) as one contiguous RAM for the LVGL draw buffer?

tdjastrzebski commented 1 year ago

I would give it a try, but there can be other factors too, e.g. CPU clock speed. DMA2D does not help with animations, only with blending. I do not know about the SRAM config. Doesn't the STM32H750XB have 864k of SRAM in addition to 192k of TCM RAM (64k ITCM RAM + 128k DTCM RAM)? But if external 4MB SDRAM is to be used, I am quite certain FMC can be configured the way you need it. I would expect such a draw buffer config to be the default one for this board.

epikao commented 1 year ago

OK, I will try it next week, but I don't really believe that this is the problem.

In the following video (certified LVGL board, stm32h7b3i) the frame rate with the music demo is always >30 FPS: https://blog.lvgl.io/2022-12-08/stm32h7b3i-review

max. fCPU STM32H750 is 480MHz, and H7b3i is 280MHz... so I think CPU clock speed can't be the reason... (I use standard/stm example system clock setup)

I do not know about the SRAM config. Doesn't the STM32H750XB have 864k of SRAM

Yes, but these 864 KByte are split, not contiguous, so my question is how to make them contiguous... I think an LVGL draw buffer in internal RAM is faster than one in external RAM...

tdjastrzebski commented 1 year ago

This is an interesting question because I thought of using H750 MPU myself. It seems that LTDC can only use AXI SRAM (512k); see RM0433, Table 3. Bus-master-to-bus-slave interconnect. SRAM 1-3 cannot be used by LTDC. But why do you need more than 382k RAM for a 480×272 display? If you do, then maybe FMC can help to shift addresses, but I do not know that. Internal RAM is probably faster in general, but I would not bet on it without checking.

epikao commented 1 year ago

SRAM 1-3 cannot be used by LTDC. But why do you need more than 382k RAM for a 480×272 display?

Hmm, are you sure it cannot? See the reply from user JPeac.1 in the following link: https://community.st.com/s/question/0D53W00001tMWi9SAG/stm32h750-linkerscript-internal-ram-definition

The reason is that I want to use the STM32H750 on a custom board for bigger displays, for example 1024x600, or maybe also 1280x800...

So my question is, what is better for a high frame rate? A big internal AXI SRAM as on the H7b3i (but lower fCPU), or a higher fCPU as on the H750 combined with external SDRAM because the internal AXI SRAM is small...

tdjastrzebski commented 1 year ago

I am not sure; to be sure I would have to test it, but that is what RM0433, p.104, Table 3. Bus-master-to-bus-slave interconnect seems to be clearly saying. See 2.1.6 Bus master peripherals as well. As for the performance, I do not know, but please post the answer once you find out. If your goal is a high-res display and high FPS, then maybe one of the new STM32U5 MCUs would be a better choice. See here. However, using external SDRAM or an MCU with more RAM like the H7A3VI may not be a bad choice either.

epikao commented 1 year ago

STM32U5 MCUs would be a better choice

Yes, I think up to a resolution of 1024x600 with 16-bit color depth the U-variant could make sense, because then you can store the two GUI drawing buffers plus the frame buffer all in internal RAM. But with a 32-bit pixel format I think you have no chance; there you need external SDRAM at least for the frame buffer. I guess the U-variant can only show its frame rate advantages when all buffers, including the frame buffer, are in internal RAM?...
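
The memory question comes down to simple arithmetic that can be checked against the RAM sizes in the datasheet of whichever part is being considered. A minimal sketch, with framebuffer_bytes() and fits_in_ram() as hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for one full frame at the given resolution and pixel size. */
static uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel >= 1u && bytes_per_pixel <= 4u);
    return width * height * bytes_per_pixel;
}

/* True if the buffer fits into a RAM region of ram_bytes. */
static bool fits_in_ram(uint32_t buffer_bytes, uint32_t ram_bytes)
{
    return buffer_bytes <= ram_bytes;
}
```

A 480x272 frame takes 261120 bytes at 16 bit and 522240 bytes at 32 bit, which only just fits the 512k (524288 bytes) of AXI SRAM mentioned in the thread. A 1024x600 frame takes 1228800 bytes at 16 bit and 2457600 bytes at 32 bit, before any draw buffers are added.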

BTW: Regarding 32bit pixel format - are the last 8bit (alpha) used at all? If not, I think a lot of memory is wasted here just to achieve a higher color depth...

tdjastrzebski commented 1 year ago

You are correct, the last Alpha byte is not used. I did not bother to optimize it since it is just a demo. For 1024x600 you likely need external RAM, probably unless you use STM32U5 and 16b or indexed color graphics is sufficient. Note that STM32U5 relies on Cortex-M33 core, less powerful than M7. The power of STM32U5 is not only more RAM, it is mainly the GPU2D drawing engine. DMA2D does blending only while GPU2D is capable of image transformations, vector graphics etc.

lvgl / lv_port_stm32h745i_disco

lvgl display buffer to internal RAM #3