stm32-hotspot / STM32H7-LwIP-Examples

Ethernet examples using LwIP + FreeRTOS for STM32H7 Discovery and Nucleo boards

Cache coherency issue in the STM32H735-DK example (RX_POOL moved to AXI SRAM outside the MPU-protected area) #4

Open Johi-b opened 1 year ago

Johi-b commented 1 year ago

See the STM32 forum thread "LWIP/ V6.5.0. STMH32H735_DISCO (https://github.com/stm32-hotspot/STM32H7-LwIP-Examples) .Rx_PoolSection. (former .RxArraySection) seems not to be guarded by the MPU against cache coherency issues. Is this the right way?"

The README states: “For STM32H72x/H73x devices, the D2 SRAM is more limited (only 32kB). The RX buffers need to be placed in AXI SRAM, since they won't fit to D2 RAM, together with LwIP heap. The LwIP heap is reduced to fit the rest of D2 RAM together with DMA descriptors.”

| Variable | STM32H72x/H73x address | Size | Source file |
|---|---|---|---|
| DMARxDscrTab | 0x30000000 | 96 (256 max.) | ethernetif.c |
| DMATxDscrTab | 0x30000100 | 96 (256 max.) | ethernetif.c |
| memp_memory_RX_POOL_base | AXI SRAM (32-byte aligned) | 12*(1536 + 24) | ethernetif.c |
| LwIP heap | 0x30000200 | 32232 (32kB - 512 - 24) | lwipopts.h |

My issue: relocating the RX_POOL to AXI SRAM in the provided example means that the MPU no longer protects the RX_POOL from cache coherency problems (possible bug?). As I understand it, this analysis has been confirmed by experienced forum members.

I propose moving the Rx_Pool back to D2 RAM by changing the linker script:

[image]

The required .mx settings are: [image]

Note: The exact starting point of the LWIP_RAM_HEAP can be deduced from the map file.

[image]

In my example, RX_PoolSection ends at 0x30004b84, i.e. 19332 bytes from the start of D2 RAM; therefore 13804 bytes remain for the LwIP stack. In many cases this is sufficient. The statement “does not fit in ram” seems to be incorrect.

(One must make sure that the size of the heap is a multiple of 32 bytes. Otherwise, #define MEM_SIZE_ALIGNED LWIP_MEM_ALIGN_SIZE(MEM_SIZE) in mem.c rounds MEM_SIZE up, which can result in accesses to addresses above D2 RAM and therefore in a hardware fault. Your documentation also correctly indicates that one has to pay attention to this aspect.)
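The rounding effect described above can be illustrated with a small host-side sketch. The macro mirrors the rounding-up formula of lwIP's `LWIP_MEM_ALIGN_SIZE`, under the assumption that `MEM_ALIGNMENT` is set to 32. The helper names `heap_size_rounded_up` and `heap_size_safe` are hypothetical, and the sketch does not account for any bookkeeping overhead lwIP adds around the heap.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MEM_ALIGNMENT is 32 (one Cortex-M7 cache line). */
#define MEM_ALIGNMENT 32U

/* Same rounding-up formula that lwIP uses for LWIP_MEM_ALIGN_SIZE. */
#define LWIP_MEM_ALIGN_SIZE(size) \
    (((size) + MEM_ALIGNMENT - 1U) & ~(MEM_ALIGNMENT - 1U))

/* Hypothetical helper: the heap size lwIP ends up with for a given MEM_SIZE. */
static inline size_t heap_size_rounded_up(size_t mem_size)
{
    assert(mem_size > 0U);
    return LWIP_MEM_ALIGN_SIZE(mem_size);
}

/* Hypothetical helper: largest MEM_SIZE that fits into `available` bytes
 * and is not changed by the round-up (rounds DOWN to a multiple of 32). */
static inline size_t heap_size_safe(size_t available)
{
    return available & ~((size_t)MEM_ALIGNMENT - 1U);
}
```

With the 13804 bytes quoted above, `heap_size_rounded_up(13804)` gives 13824, which is 20 bytes more than is available, while `heap_size_safe(13804)` gives 13792, a value that the round-up leaves unchanged.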

ABESTM commented 1 year ago

Hello,

Thank you for pointing this out. Can you please share the link to the STM32 forum thread? I only see a link pointing to the repository README.

Cache coherence should be ensured by calling SCB_InvalidateDCache_by_Addr during RX in the ethernetif.c driver. However, this might not be 100% bulletproof. The CPU may evict a cache line that maps to the same address as the RX buffer. If this happens between the Ethernet DMA finishing its transfer and the driver invalidating the cache, the write-back could corrupt the received data. This scenario is quite unlikely, since it requires precise timing and the CPU to have written some data to RX_POOL. However, I understand that this can be a concern for many applications.
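As a rough illustration of the invalidate step, the sketch below computes the cache-line-aligned range that covers a receive buffer. It assumes the 32-byte data cache line of the Cortex-M7; the type `cache_range_t` and the helper `cache_range_for` are hypothetical names and are not taken from ethernetif.c. It also shows why the RX buffers are kept 32-byte aligned: an unaligned buffer shares a cache line with neighbouring data, and that data would be invalidated along with it.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M7 data cache line size in bytes. */
#define DCACHE_LINE_SIZE 32U

/* Hypothetical type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start;   /* rounded down to a line boundary */
    uint32_t  length;  /* multiple of DCACHE_LINE_SIZE */
} cache_range_t;

/* Hypothetical helper: smallest line-aligned range covering
 * the bytes [addr, addr + len). */
static inline cache_range_t cache_range_for(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1U;
    cache_range_t r;
    uintptr_t end;

    assert(len > 0U);
    r.start = addr & ~mask;               /* round start down */
    end = (addr + len + mask) & ~mask;    /* round end up */
    r.length = (uint32_t)(end - r.start);
    return r;
}

/* On the target, the RX path could then do something like:
 *   cache_range_t r = cache_range_for((uintptr_t)buff, length);
 *   SCB_InvalidateDCache_by_Addr((uint32_t *)r.start, (int32_t)r.length);
 * The pointer type of the first argument depends on the CMSIS version. */
```

For a buffer that is already 32-byte aligned and 1536 bytes long, the range is unchanged; for one that starts 16 bytes into a line, the range grows to 1568 bytes and touches memory outside the buffer.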

The best solution would be to configure the RX_POOL memory as write-through. When the CPU writes to write-through memory, the data is written to both the cache and the memory. So when a cache line is evicted, nothing needs to be written back to memory (the write-through already did that). Write-through memory can still benefit from the cache and improve performance, especially during the parsing of incoming packets.
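To make "write-through" concrete, here is a minimal host-side sketch that builds the ARMv7-M `MPU_RASR` attribute word for such a region: TEX=0, C=1, B=0 selects normal memory, write-through, no write allocate. The helper name `rasr_write_through`, the full-access permission, and the execute-never setting are assumptions for illustration. With the STM32 HAL the same attributes would typically be set through `HAL_MPU_ConfigRegion` (cacheable, not bufferable, TEX level 0) rather than by writing the register directly.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions of the ARMv7-M MPU_RASR register. */
#define RASR_ENABLE    (1UL << 0)
#define RASR_SIZE_POS  1U            /* region size = 2^(SIZE + 1) bytes */
#define RASR_B         (1UL << 16)   /* bufferable */
#define RASR_C         (1UL << 17)   /* cacheable */
#define RASR_S         (1UL << 18)   /* shareable */
#define RASR_TEX_POS   19U
#define RASR_AP_POS    24U
#define RASR_XN        (1UL << 28)   /* execute never */

#define AP_FULL_ACCESS 3UL

/* Hypothetical helper: RASR value for a write-through region.
 * size_log2 is log2 of the region size in bytes (5..32). */
static inline uint32_t rasr_write_through(uint32_t size_log2)
{
    assert(size_log2 >= 5U && size_log2 <= 32U);
    return (uint32_t)(RASR_XN
        | (AP_FULL_ACCESS << RASR_AP_POS)
        | (0UL << RASR_TEX_POS)      /* TEX = 0 */
        | RASR_C                     /* cacheable */
        /* RASR_B stays clear: write-through instead of write-back */
        /* RASR_S stays clear: not shareable */
        | ((unsigned long)(size_log2 - 1U) << RASR_SIZE_POS)
        | RASR_ENABLE);
}
```

The RX pool of 12*(1536 + 24) = 18720 bytes does not fill a power-of-two region, so a 32 kB region (`rasr_write_through(15)`) would be the smallest that covers it, assuming the pool is placed so that the region base is 32 kB aligned.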

Another solution might be:

Also, I think there is no MPU region configured for the RX_POOL on STM32H74x/H75x devices, so this is intentional in the examples, although not 100% correct.

I will try to address this issue when I update these examples.

Regarding the RX_POOL not fitting into D2 SRAM: I wanted to use at least 32kB. I wasn't able to achieve good results with a lower value in the iperf benchmark, but it is possible that the issue was somewhere else.