yesitsme007 / platform-gd32v-mac-unofficial

GD32V: development platform for PlatformIO

Binaries are ~100x slower #1

Open flashspys opened 4 years ago

flashspys commented 4 years ago

Hey @yesitsme007!

First of all, thanks for taking the trouble to get the toolchain running on the Mac.

I noticed that programs compiled with this toolchain run about 100 times slower than programs compiled on Linux/Windows. Can you confirm this?

With both the Arduino and the normal SDK, the delay(10) function takes about 1 second to return instead of 10ms as expected.

Here is what I noticed: the compiler provided by your toolchain is "riscv64-unknown-elf-gcc (SiFive GCC 8.3.0-2019.08.0) 8.3.0". The Linux and Windows versions use "riscv-nuclei-elf-gcc" version 9.2. I think you have noticed that Nuclei does not offer version 9.2 of the compiler for download. I would be interested to know where the binaries on Bintray come from. My guess was that the compiler is responsible for the speed difference, so I installed "riscv64-unknown-elf-gcc (GCC) 9.2.0" on my Mac. Unfortunately, a binary compiled on the Mac with GCC 9.2 was not faster either. Replacing the optimization flags with -O2 had no visible effect either.

So I started to compare the fast firmware produced on Linux with the slow one produced by the 9.2 RISC-V compiler on my Mac. The disassembly shows that the binaries are visibly different, but the difference in size is only about 10%, so I don't really think that's the reason. Nevertheless, I still think that the Nuclei compiler does something different.

After some days of research, I have put everything I know on the table. Do you have any clues why my binaries are so slow?

yesitsme007 commented 4 years ago

Hi Frank,

Thanks for testing. Honestly, I did not do very much testing. I noticed the version difference in the gcc compiler as well, but unfortunately I was not able to find a more recent version for macOS. I tried to build one from sources, but that did not work at all (likely because of wrong build flags).

There is another compiler available. Please have a look at (https://github.com/sipeed/platform-gd32v/issues/6#issuecomment-561154150) and see the last comments from ivankrets. Could you try this one?

Are you sure that you use one and the same SDK version for your tests on Mac and Linux? There was a bug in the firmware causing the delay() function to work incorrectly. See here. This was fixed recently and may not yet be part of the released packages. Please ensure that you use the latest version from master. This could explain your findings. It might also be worth repeating your tests with something independent of delay().

If you want to build gcc yourself, try the instructions here. The documentation is quite incomplete. The --with-arch= flag, for example, is critical. I am not sure how it must be set (my problem might have been that I built a 64-bit backend).
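A test that does not depend on delay() could look roughly like the sketch below. It times a fixed amount of integer work against a free-running counter, so the result does not rely on the SDK's notion of a millisecond. The names read_ticks, integer_workload and time_workload are hypothetical and not part of any SDK. Reading the mcycle CSR assumes the code runs in machine mode; on a non-RISC-V host the sketch falls back to clock() so that it still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a free-running tick counter.
 * Assumption: on a RISC-V target the code runs in machine mode, so the
 * mcycle CSR is readable. On any other host, fall back to clock(). */
uint32_t read_ticks(void)
{
#if defined(__riscv)
    unsigned long t;
    __asm__ volatile ("csrr %0, mcycle" : "=r"(t));
    return (uint32_t)t;   /* low 32 bits are enough for short runs */
#else
    return (uint32_t)clock();
#endif
}

/* Fixed integer workload: repeated linear congruential generator steps.
 * The result depends on every iteration, so the loop cannot be dropped. */
uint32_t integer_workload(uint32_t iterations)
{
    uint32_t x = 1u;
    for (uint32_t i = 0; i < iterations; ++i) {
        x = x * 1664525u + 1013904223u;
    }
    return x;
}

/* Stores the workload result so the compiler has to compute it. */
volatile uint32_t sink;

/* Returns the number of ticks the workload took. Unsigned subtraction
 * stays correct across a single counter wrap-around. */
uint32_t time_workload(uint32_t iterations)
{
    assert(iterations > 0);   /* zero iterations would only time overhead */
    uint32_t start = read_ticks();
    sink = integer_workload(iterations);
    return read_ticks() - start;
}
```

Building the same source with both toolchains and comparing the tick counts for the same iteration count would show whether the generated code itself is slower, or only the time base used by delay().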

GPSBabelDeveloper commented 4 years ago

I don't have any code in this race, but just for drive-by troubleshooting:

Is one perhaps configured for software floating point and another for hardware FP? Use gcc -v and watch the link stage to see if one is getting softfp and one hardfp.

Is one configured with a whacked out timer value? Is your delay(1) sleeping for the wrong amount of wall time? If so, get to the bottom of that first, as it indicates a library problem. If the library has the wrong concept of wall time, relying on that in your benchmark will be a problem.

Can you use both versions to compile and run Dhrystone? That gives a number that's a gross estimate of integer computing efficiency. It's unlikely you'll see 100x here.
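As a complement to reading the gcc -v output, the compiler's predefined macros can report which float ABI and register width a build actually targets. The sketch below relies on the __riscv_float_abi_* and __riscv_xlen macros that RISC-V GCC predefines; the function names float_abi_name and xlen_bits are hypothetical. On a non-RISC-V host it only reports that no RISC-V target is selected.

```c
#include <assert.h>
#include <stddef.h>

/* Name of the floating-point ABI selected for this build, taken from
 * the macros the RISC-V compiler predefines. */
const char *float_abi_name(void)
{
#if !defined(__riscv)
    return "not a RISC-V target";
#elif defined(__riscv_float_abi_soft)
    return "soft";
#elif defined(__riscv_float_abi_single)
    return "single";
#elif defined(__riscv_float_abi_double)
    return "double";
#else
    return "unknown";
#endif
}

/* Integer register width in bits: 32 for rv32, 64 for rv64,
 * 0 when the target is not RISC-V. */
int xlen_bits(void)
{
#if defined(__riscv_xlen)
    assert(__riscv_xlen == 32 || __riscv_xlen == 64);
    return __riscv_xlen;
#else
    return 0;
#endif
}
```

Printing both values from firmware built with each toolchain would show whether one of them silently targets a different ABI or a 64-bit configuration.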
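To illustrate how a wrong timer value alone could produce such a factor, here is a small arithmetic sketch. It assumes a delay routine that converts milliseconds to ticks using the tick rate it believes the timer has; the function names and the frequencies in the usage note are hypothetical and are not taken from the GD32V SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks a delay routine would wait for, given the tick rate it assumes. */
uint64_t delay_ticks(uint32_t ms, uint32_t assumed_tick_hz)
{
    return (uint64_t)ms * assumed_tick_hz / 1000u;
}

/* Wall-clock milliseconds that really elapse when the timer actually
 * counts at actual_tick_hz instead of the assumed rate. */
uint64_t actual_delay_ms(uint32_t ms, uint32_t assumed_tick_hz,
                         uint32_t actual_tick_hz)
{
    assert(actual_tick_hz > 0);
    return delay_ticks(ms, assumed_tick_hz) * 1000u / actual_tick_hz;
}
```

With a made-up assumed rate of 1000000 Hz and a real rate of 10000 Hz, actual_delay_ms(10, 1000000, 10000) yields 1000, i.e. a delay(10) that takes one second. A mismatch of this kind would make every delay 100 times longer without the compiled code running any slower.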

Your report is really too vague for action by a compiler group. A self-contained (small) test case would be really helpful to analyze.