stm32duino / Arduino_Core_STM32

STM32 core support for Arduino
https://github.com/stm32duino/Arduino_Core_STM32/wiki

STM32F7 process is too slow. #245

Closed Vicky-S closed 6 years ago

Vicky-S commented 6 years ago

I ran the Whetstone benchmark and the result is below:

Loops: 1000 Iterations: 1 Duration: 9781 millisec. C Converted Double Precision Whetstones: 10.22 MIPS

The STM32F7 is a 216 MHz processor, but it does not appear to run that fast with the official core.

Vicky-S commented 6 years ago

This issue was already addressed and sorted out in STM32GENERIC. I have shared the link below for your reference.

https://github.com/danieleff/STM32GENERIC/issues/14

Vicky-S commented 6 years ago

This is not a complaint. I just want to let you know that this type of issue existed in STM32GENERIC and was sorted out there, so the same can be done in the official core for faster processing.

fpistm commented 6 years ago

The FPU is already enabled for F7, so this is not linked to the issue raised in STM32GENERIC.

Here are my test results:

STM32GENERIC (using -Os) Loops: 1000 Iterations: 1 Duration: 2947 millisec. C Converted Double Precision Whetstones: 33.93 MIPS

STM Core (using -Os) Loops: 1000 Iterations: 1 Duration: 10084 millisec. C Converted Double Precision Whetstones: 9.92 MIPS

I already know the cause of the difference, and I already have this enhancement for Cortex-M7 in one of my local branches. I've just pushed it to the repo: https://github.com/stm32duino/Arduino_Core_STM32/commit/a71ed8de18ba2054d21e2112c57cd1c245c103df

#if (__CORTEX_M == 0x07U) && !defined(UNUSED_ID_CACHE)
// Defined in CMSIS core_cm7.h
  SCB_EnableICache();
  SCB_EnableDCache();
#endif

STM Core (using -Os) + I/D-cache enable Loops: 1000 Iterations: 1 Duration: 2988 millisec. C Converted Double Precision Whetstones: 33.47 MIPS

So the results are now almost the same. However, I don't know how @ChrisMicro got a result of 202.84 MIPS.
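
The MIPS figures quoted above can be reproduced from the reported durations. The following is a minimal sketch of that arithmetic, assuming the usual formula from the C port of Whetstone (KIPS = 100 × loops × iterations / seconds, MIPS = KIPS / 1000), which matches every result in this thread. The helper names `whetstone_mips` and `speedup` are hypothetical and are not part of either core or of the benchmark sketch.

```c
#include <stdint.h>

/* Whetstone rating in MIPS for a run of `loops` x `iterations` that took
 * `duration_ms` milliseconds.
 * Assumption: KIPS = 100 * loops * iterations / seconds, as in the
 * commonly used C port of the benchmark. */
double whetstone_mips(uint32_t loops, uint32_t iterations, uint32_t duration_ms)
{
    if (duration_ms == 0U) {
        return 0.0; /* avoid dividing by zero on an invalid measurement */
    }
    double seconds = (double)duration_ms / 1000.0;
    double kips = 100.0 * (double)loops * (double)iterations / seconds;
    return kips / 1000.0;
}

/* Ratio of two run times: how many times faster the second run is. */
double speedup(uint32_t before_ms, uint32_t after_ms)
{
    if (after_ms == 0U) {
        return 0.0; /* invalid measurement */
    }
    return (double)before_ms / (double)after_ms;
}
```

With the durations above, `whetstone_mips(1000, 1, 10084)` gives about 9.92 and `whetstone_mips(1000, 1, 2988)` gives about 33.47, so enabling the I/D caches corresponds to a speedup of roughly 3.4x (`speedup(10084, 2988)`).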

ChrisMicro commented 6 years ago

I stored the benchmark software here, but that was a year ago, so I don't remember exactly: https://github.com/ChrisMicro/ArduinoBenchmarkAllPlatforms/tree/master/WhetStone Simply run it on both frameworks, GENERIC and STM, and compare the results.

fpistm commented 6 years ago

I used this one: https://github.com/stm32duino/STM32Examples/tree/master/examples/Benchmarking/Whetstone and ran it on both cores to compare.

fpistm commented 6 years ago

OK, the difference is that @ChrisMicro's Whetstone version works on float while mine works on double (single vs. double precision). Using the float version, this is the result for STM Core (using -Os) + I/D-cache enable: Loops: 1000, Iterations: 1, Duration: 409 ms. C Converted Single Precision Whetstones: 244.50 MIPS

So, all is understood and answered. Thanks all

ChrisMicro commented 6 years ago

Yes, it is probably quite inefficient to run double precision benchmarks on a single precision FPU.
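
A small illustration of how double precision can creep into otherwise single precision code. This is a sketch in plain C, not taken from either benchmark, and all function names are hypothetical. An unsuffixed constant such as `0.5` has type `double`, so the `float` operand is promoted and the multiplication is carried out in double precision. On a core whose FPU only handles single precision, that work is typically done by software routines rather than by the FPU.

```c
#include <stddef.h>

/* Stays in single precision: the 'f' suffix makes the constant a float. */
float scale_single(float x)
{
    return x * 0.5f;
}

/* The unsuffixed constant is a double, so x is promoted to double, the
 * multiply is done in double precision, and the result is converted back. */
float scale_promoted(float x)
{
    return x * 0.5;
}

/* sizeof reports the type of each expression without evaluating it. */
size_t single_expr_size(void)
{
    float x = 1.0f;
    return sizeof(x * 0.5f); /* type of the expression is float */
}

size_t promoted_expr_size(void)
{
    float x = 1.0f;
    return sizeof(x * 0.5); /* type of the expression is double */
}
```

Both scaling functions return the same value for simple inputs; only the precision used for the intermediate arithmetic differs, which is what the single and double precision Whetstone variants end up measuring.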

fpistm commented 6 years ago

Right, I will provide the Single precision version.

fpistm commented 6 years ago

I've pushed the Single Precision example: https://github.com/stm32duino/STM32Examples/tree/master/examples/Benchmarking/Whetstone

Results with this version for both cores:

STM Core (using -Os) + I/D-cache enable: Loops: 1000 Iterations: 1 Duration: 399 millisec. C Converted Single Precision Whetstones: 250.63 MIPS

STM32GENERIC (using -Os) Loops: 1000 Iterations: 1 Duration: 406 millisec. C Converted Single Precision Whetstones: 246.31 MIPS

Vicky-S commented 6 years ago

STM Core (using -Os) + I/D-cache enable: Loops: 1000 Iterations: 1 Duration: 399 millisec. C Converted Single Precision Whetstones: 250.63 MIPS

Do I need to update the core to get this result?

fpistm commented 6 years ago

This will be in the next release. Currently, it is available in the git repo

Vicky-S commented 6 years ago

If you don't mind, can you help me install the official STM32 core from the git repo? Can you share the steps?

fpistm commented 6 years ago

https://github.com/stm32duino/wiki/wiki/Using-git-repository