epikao opened this issue 1 year ago
You should set the address manually. See the driver for reference: https://github.com/lvgl/lv_port_stm32h745i_disco/blob/88e342bb24e9f84e4f9183d0e6ebbae13506f170/CM7/Core/Src/Lvgl_Porting/lvgl_port_lcd.c#L69-L71
That's what I tried; see my code above. Unfortunately, this causes the program to crash or freeze (see the attached picture).
In addition, I have a problem with lv_demo_music; see the following link: https://forum.lvgl.io/t/lvgl-port-to-stm32h750b-disco/10073
It looks like missing cache invalidation.
What would be the procedure to solve this? I would appreciate any hints; I'm a beginner here.
Thank you
I'm also not an expert, but here is a recent discussion about it: https://github.com/lvgl/lvgl/issues/3714#issuecomment-1287261251
@epikao See the sample code for my complete STM32F769I-DISCO LVGL demo.
The key to success is to clean d-cache before starting DMA(2D) transfer.
Note that in this case, after the DMA2D transfer, the cache does not need to be invalidated, because the framebuffer is read by the LTDC interface, which does not use the D-cache anyway; only the Cortex core does. In that respect, #3714 is a somewhat different case.
HTH
The key to success is to clean d-cache before starting DMA(2D) transfer.
OK, hm, but it looks like this is already done that way; see the code below.
lv_port_stm32h745i_disco/CM7/Core/Src/Lvgl_Porting/lvgl_port_lcd.c
static void disp_flush(lv_disp_drv_t *drv, const lv_area_t *area,
                       lv_color_t *color_p)
{
    /*Return if the area is out of the screen*/
    if (area->x2 < 0)
        return;
    if (area->y2 < 0)
        return;
    if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
        return;
    if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
        return;
    //BSP_LED_Toggle(LED2);
    SCB_CleanInvalidateDCache();
    SCB_InvalidateICache();
    uint32_t address =
        hlcd_ltdc.LayerCfg[Lcd_Ctx[LCD_INSTANCE].ActiveLayer].FBStartAdress
        + (((Lcd_Ctx[LCD_INSTANCE].XSize * area->y1) + area->x1)
           * Lcd_Ctx[LCD_INSTANCE].BppFactor);
    CopyImageToLcdFrameBuffer((void*) color_p, (void*) address,
                              lv_area_get_width(area), lv_area_get_height(area));
    lv_disp_flush_ready(&disp_drv);
    return;
}
True, copying an in-memory image area to another memory area, or directly to a device interface, can be done in many ways, e.g. disp_flush() without using DMA2D: https://github.com/tdjastrzebski/STM32F769I-DISCO-LVGL/blob/master/master.cpp#L226
That is why the implementation of the disp_flush() method is left to the LVGL user. Choose the method that works for you.
I think your approach is pretty much the same... so I still don't know where the issues mentioned above come from :-( ...
I do not know either, but calling SCB_InvalidateICache() probably makes no sense: the I stands for instruction.
SCB_CleanInvalidateDCache() cleans and invalidates the ENTIRE data cache. That is probably unnecessary as well; it only causes a big performance hit all over the place.
IF CopyImageToLcdFrameBuffer() does NOT use DMA(2D), call SCB_CleanDCache_by_Addr() on the destination (framebuffer) area afterwards.
IF CopyImageToLcdFrameBuffer() does use DMA(2D), call SCB_CleanDCache_by_Addr() on the source buffer before starting the transfer.
It is really simple once you understand the cache management concept correctly. Recommended reading: DMA is not working on STM32H7 devices
The core issue is this: DMA(2D) and the LTDC interface do not go through the D-cache, so you must make sure physical memory holds the current byte values, either after the bytes are copied by the CPU or before the copy if you copy with DMA. That is why you need to flush (clean) the cached values.
Thank you for your explanation. I changed the code as shown below, but I still have the same issue, as shown in the link below. https://forum.lvgl.io/t/lvgl-port-to-stm32h750b-disco/10073
Interestingly, I don't have this problem with the other LVGL demos.
static void disp_flush(lv_disp_drv_t *drv, const lv_area_t *area,
                       lv_color_t *color_p)
{
    /*Return if the area is out of the screen*/
    if (area->x2 < 0)
        return;
    if (area->y2 < 0)
        return;
    if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
        return;
    if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
        return;
    //BSP_LED_Toggle(LED2);
    //SCB_CleanInvalidateDCache();
    uint32_t bufferLength = Lcd_Ctx[LCD_INSTANCE].XSize * Lcd_Ctx[LCD_INSTANCE].YSize * 16;
    SCB_CleanDCache_by_Addr((uint32_t*)color_p, bufferLength);
    //SCB_InvalidateICache();
    uint32_t address =
        hlcd_ltdc.LayerCfg[Lcd_Ctx[LCD_INSTANCE].ActiveLayer].FBStartAdress
        + (((Lcd_Ctx[LCD_INSTANCE].XSize * area->y1) + area->x1)
           * Lcd_Ctx[LCD_INSTANCE].BppFactor);
    CopyImageToLcdFrameBuffer((void*) color_p, (void*) address,
                              lv_area_get_width(area), lv_area_get_height(area));
    lv_disp_flush_ready(&disp_drv);
    return;
}
And here is the copy function:
static uint8_t CopyImageToLcdFrameBuffer(void *pSrc, void *pDst, uint32_t xSize,
                                         uint32_t ySize)
{
    HAL_StatusTypeDef hal_status = HAL_OK;
    /* Initialized so that a failing HAL_DMA2D_Init()/HAL_DMA2D_Start()
       does not return an uninitialized value */
    uint8_t lcd_status = BSP_ERROR_BUS_DMA_FAILURE;
    /* Configure the DMA2D Mode, Color Mode and output offset */
    hlcd_dma2d.Init.Mode = DMA2D_M2M_PFC; //PFC
    hlcd_dma2d.Init.ColorMode = DMA2D_OUTPUT_ARGB8888; /* Output color out of PFC */
    hlcd_dma2d.Init.AlphaInverted = DMA2D_REGULAR_ALPHA; /* No Output Alpha Inversion*/
    hlcd_dma2d.Init.RedBlueSwap = DMA2D_RB_REGULAR; /* No Output Red & Blue swap */
    /* Output offset in pixels == nb of pixels to be added at end of line to come to the */
    /* first pixel of the next line : on the output side of the DMA2D computation */
    hlcd_dma2d.Init.OutputOffset = LCD_DEFAULT_WIDTH - xSize;
    hlcd_dma2d.Instance = DMA2D;
    /* DMA2D Initialization */
    if (HAL_DMA2D_Init(&hlcd_dma2d) == HAL_OK)
    {
        if (HAL_DMA2D_Start(&hlcd_dma2d, (uint32_t) pSrc, (uint32_t) pDst,
                            xSize, ySize) == HAL_OK)
        {
            /* Polling For DMA transfer */
            hal_status = HAL_DMA2D_PollForTransfer(&hlcd_dma2d, 20);
            if (hal_status == HAL_OK)
            {
                /* return good status on exit */
                lcd_status = BSP_ERROR_NONE;
            }
            else
            {
                lcd_status = BSP_ERROR_BUS_DMA_FAILURE;
            }
        }
    }
    return (lcd_status);
}
Make sure you pass SCB_CleanDCache_by_Addr() the length of the input buffer in bytes; compare with my example.
However, I ported the H745 port to the H750... and apart from the internal flash memory there are not many differences, so I'm a little surprised that this port and the music player seem to work on the H745 but not on the H750...
I adjusted the code of the H745 port as shown below, but without success. The H750 is a very popular IC (value line), but unfortunately there is still no LVGL port for it...
Code adjusted according to the F769 example:
uint32_t width = lv_area_get_width(area);
uint32_t height = lv_area_get_height(area);
uint32_t bufferLength = width * height * 16;
/*Return if the area is out of the screen*/
if (area->x2 < 0)
    return;
if (area->y2 < 0)
    return;
if (area->x1 > Lcd_Ctx[LCD_INSTANCE].XSize - 1)
    return;
if (area->y1 > Lcd_Ctx[LCD_INSTANCE].YSize - 1)
    return;
//BSP_LED_Toggle(LED2);
//SCB_CleanInvalidateDCache();
//uint32_t bufferLength = Lcd_Ctx[LCD_INSTANCE].XSize * Lcd_Ctx[LCD_INSTANCE].YSize * 16;
SCB_CleanDCache_by_Addr((uint32_t*)color_p, bufferLength);
Your code:
uint32_t bufferLength = width * height * LCD_BPP;
uint16_t x = area->x1;
uint16_t y = area->y1;
// copy buffer using DMA2D without Pixel Format Conversion (PFC) or Blending
uint32_t destination = LCD_FB_START_ADDRESS + LCD_BPP * (y * SCREEN_WIDTH + x);
SCB_CleanDCache_by_Addr((uint32_t*)buffer, bufferLength); // flush d-cache to SRAM before starting DMA transfer
See the video at the link below showing the strange effect (only with the music demo): https://drive.google.com/file/d/1G6J_IgGbrLBJw-goaXh7OrXLqIUoXdb6/view?usp=share_link
uint32_t bufferLength = width * height * 16;
Why do you multiply by 16?
Have you tried simply using both my disp_flush() implementations?
Why do you multiply by 16?
You do the same, don't you? => LCD_BPP. I tried 16 and 24 bit... even without it, nothing changes...
Have you tried simply using both my disp_flush() implementations?
You mean just copy your whole FlushBufferStart function?
you do the same, or not? => LCD_BPP
Not correct; see how LCD_BPP is defined and commented (it is used as bytes per pixel in the destination address calculation, not bits).
You mean just copy your whole FlushBufferStart function?
Correct, try both FlushBufferStart() versions, with and without DMA. Make sure the DMA layer is initialized correctly.
I can give you some hints, which I already did, but I cannot help you with debugging your code.
Good luck :)
It seems that only some parts of the screen are drawn incorrectly, so maybe one more thing: test whether (uint32_t*)color_p and (void*) address (the source and destination addresses) are always multiples of 4, as required by DMA2D.
That is why I suggest trying both FlushBufferStart() / disp_flush() versions; it may reveal the problem.
Your function without DMA2D works :-)
test if (uint32_t*)color_p and (void*) address (source and destination address) are always multiple of 4 as required by DMA2D.
Phew, how can I test this?
Google it or look it up on Stack Overflow.
And one last thing: it would not hurt to make sure your buffer is 32-byte (not bit) aligned (ALIGN_32BYTES() in my code) to avoid cleaning an extra cache line in case of misalignment.
I don't quite understand. The buffer addresses are set as shown below.
And BTW, only your function without DMA2D works fine for me. I have no idea how to use the ALIGN_32BYTES() macro to align the (uint32_t*)color_p buffer...
#define LVGL_BUFFER_ADDR_AT_SDRAM (0xD007F810)
#define LVGL_BUFFER_2_ADDR_AT_SDRAM (0xD00FF020)
Just search for ALIGN_32BYTES in my example.
Since I just found an interesting issue in some other code that resulted in a similar screen glitch: when using DMA(2D), make sure any previous transfer has finished before starting a new one, or even before modifying the settings.
It seems it is actually possible to start setting up DMA2D for a new transfer while the previous one is still running.
That leads to strange visual effects. I am afraid I cannot help more.
Just search for ALIGN_32BYTES in my example.
I just tried it; this makes it worse.
I do not have the H745 board. @ahmetalperenbulut, can you confirm whether the music demo runs properly? If so, I do not understand this, because I think the H750 board is identical except for the internal flash memory...
That is probably not causing a problem in your case, but I have to admit that I found a bug in my code and posted a fix yesterday.
If ALIGN_32BYTES makes it worse, then, paradoxically, you may be on the right track. Another thing to check is that the buffer length is also a multiple of 32 bytes (not bits); that is, the calculated length may need to be increased by a few extra bytes. The H750 should not be much different from the F769 I often use; it is still a Cortex-M7 core. I also did many tests with the H7A3.
Sorry for the late reply: with the files from https://github.com/lvgl/lv_port_stm32h7b3i_disco I was able to solve the DMA2D problem.
Several days ago I modified my example significantly. I wonder whether it would work as well. I also proposed significant improvements to LVGL's DMA2D support; see https://github.com/lvgl/lvgl/pull/3904
@tdjastrzebski I'm now using your driver code, and so far it works quite well, except for the frame rate... (Previously I used the H7B3I driver code with ALIGN_32BYTES and had the same problem.) In the video at the link below you can see that during animations the frame rate is only around 20 FPS. Do you know what the reason could be?
https://drive.google.com/file/d/1opsjLjmEKbEB-URVqQ8h3O4TYHjki96O/view?usp=share_link
EDIT: I did not implement your MPU configuration or the timer calls for task_handler and lv_tick (task_handler is still called in the main loop, lv_tick is driven by SysTick, and the MPU configuration comes from the ST H750 examples).
Yes, basically the problem is solved and LVGL works with the STM32H750 DK now, but no matter which driver I use, the frame rate during animations is always 7-20 FPS; with the LVGL driver ported from the H7B3 it is between 14 and 20 FPS. Do you think this could be solved by triggering LvglTick and the task handler from a separate timer? Another question: how can I use SRAM1 through SRAM3 (800 KB in total) as one contiguous RAM region for the LVGL draw buffer?
I would give it a try, but there can be other factors too, e.g. CPU clock speed. DMA2D does not help with animations, only with blending. I do not know about the SRAM configuration. Doesn't the STM32H750XB have 864k of SRAM in addition to 192k of TCM RAM (64k ITCM RAM + 128k DTCM RAM)? But if the external 4MB SDRAM is to be used, I am quite certain the FMC can be configured the way you need it. I would expect such a draw buffer configuration to be the default for this board.
OK, I will try it next week, but I don't really believe that this is the problem.
In the following video (LVGL-certified board, STM32H7B3I) the frame rate of the music demo is always above 30 FPS: https://blog.lvgl.io/2022-12-08/stm32h7b3i-review
The maximum fCPU of the STM32H750 is 480 MHz, and of the H7B3I 280 MHz... so I think CPU clock speed can't be the reason... (I use the standard system clock setup from the ST examples.)
I do not know about the SRAM configuration. Doesn't the STM32H750XB have 864k of SRAM
Yes, but these 864 KB are split into separate regions, not contiguous, so my question is how to make them contiguous... I think an LVGL draw buffer in internal RAM is faster than one in external RAM...
This is an interesting question, because I thought of using the H750 MPU myself. It seems that the LTDC can only use the AXI SRAM (512k); see RM0433, Table 3. Bus-master-to-bus-slave interconnect. SRAM 1-3 cannot be used by the LTDC. But why do you need more than 382k of RAM for a 480×272 display? If you do, then maybe the FMC can help to remap addresses, but I do not know that. Internal RAM is probably faster in general, but I would not bet on it without checking.
SRAM 1-3 cannot be used by LTDC. But why do you need more than 382k RAM for 480×272 display?
Hmm, are you sure it cannot? See the reply from user JPeac.1 at the following link: https://community.st.com/s/question/0D53W00001tMWi9SAG/stm32h750-linkerscript-internal-ram-definition
The reason is that I want to use the STM32H750 on a custom board with bigger displays, for example 1024x600 or maybe even 1280x800...
So my question is: what is better for a high frame rate? A big internal AXI SRAM as on the H7B3I (but a lower fCPU), or the higher fCPU of the H750, which, because of its small internal AXI SRAM, has to use external SDRAM?
I am not sure; to be sure I would have to test it, but that is what RM0433, p.104, Table 3. Bus-master-to-bus-slave interconnect seems to clearly say. See 2.1.6 Bus master peripherals as well. As for performance, I do not know, but please post the answer once you find out. If your goal is a high-resolution display and high FPS, then maybe one of the new STM32U5 MPUs would be a better choice. See here. However, using external SDRAM, or an MPU with more RAM like the H7A3VI, may not be a bad choice either.
STM32U5 MPUs would be a better choice
Yes, I think that up to a resolution of 1024x600 with 16-bit color depth the U variant could make sense, because then the two GUI draw buffers and the framebuffer can all be stored in internal RAM. But with a 32-bit pixel format I think there is no chance; you need external SDRAM, at least for the framebuffer. I guess the U variant can only show its frame-rate advantage when all buffers, including the framebuffer, are in internal RAM?...
BTW, regarding the 32-bit pixel format: are the last 8 bits (alpha) used at all? If not, I think a lot of memory is wasted here just to achieve a higher color depth...
You are correct, the last (alpha) byte is not used. I did not bother to optimize it since it is just a demo. For 1024x600 you likely need external RAM, unless perhaps you use an STM32U5 and 16-bit or indexed color graphics are sufficient. Note that the STM32U5 is based on the Cortex-M33 core, which is less powerful than the M7. The strength of the STM32U5 is not only more RAM; it is mainly the GPU2D drawing engine. DMA2D does blending only, while GPU2D is capable of image transformations, vector graphics, etc.
Hello,
Is it possible to move the LVGL display buffer to internal RAM, as in the code below? What else do I have to change (just changing this does not work for me)?
Thank you