raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License

Poor RISC-V `pico_riscv_gcc_zcb_zcmp` performance #1937

Open matsobdev opened 1 week ago

matsobdev commented 1 week ago

I have a performance issue. Toolchains like CORE-V are huge, so I decided to trim them down a bit. One way is to keep only the default library and delete all the extra multilib variants inside:

~/corev-openhw-gcc-ubuntu2004-20240530/riscv32-corev-elf/lib
~/corev-openhw-gcc-ubuntu2004-20240530/lib/gcc/riscv32-corev-elf/14.1.0

That did the trick. BTW, the same works for ARM.
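Before deleting anything, it can help to record which multilib variants a toolchain actually ships. A minimal sketch, assuming it is placed after the `project()` call of a Pico SDK project so that `CMAKE_C_COMPILER` already points at the RISC-V GCC; `-print-multi-lib` is a standard GCC driver option, and the variable name `TOOLCHAIN_MULTILIBS` is just illustrative.

```cmake
# List every multilib the active GCC was built with.
# Each output line has the form "<dir>;@<flag>@<flag>"; "." is the default library.
execute_process(
  COMMAND ${CMAKE_C_COMPILER} -print-multi-lib
  OUTPUT_VARIABLE TOOLCHAIN_MULTILIBS
  OUTPUT_STRIP_TRAILING_WHITESPACE)
message(STATUS "Multilibs shipped with ${CMAKE_C_COMPILER}:\n${TOOLCHAIN_MULTILIBS}")
```

Keeping this list around makes it easier to compare a trimmed toolchain against the original one later.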

Today I got my Pico 2, but I had noticed something before that: with only the default library, binaries had a different size compared to the vanilla toolchain. I have now tested both, and with pico_riscv_gcc everything is fine.

But with vanilla CORE-V and pico_riscv_gcc_zcb_zcmp, it compiles fine, yet performance is poor. I have an air mouse built on an accelerometer sampled at 6664 Hz. RISC-V on the Pico 2 is normally a bit slower than the M0+ on the Pico, but it was still able to keep up with the data stream, the moving average and the rest. A makeshift pico_riscv_gcc_zcb (that is, pico_riscv_gcc_zcb_zcmp without _zcmp) behaved similarly.

Removing the extra libraries from CORE-V (same with xPack) resolved the issue. Now both pico_riscv_gcc_zcb_zcmp and pico_riscv_gcc_zcb perform better than pico_riscv_gcc. This is the latest CORE-V GCC 14.1.0, whose default library is rv32imac/ilp32. Alternatively, without tinkering with the toolchain, changing:

set(PICO_COMMON_LANG_FLAGS " -march=rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb_zcmp -mabi=ilp32")

to

set(PICO_COMMON_LANG_FLAGS " -march=rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb_zcmp -mabi=ilp32")

resolves the poor performance of pico_riscv_gcc_zcb_zcmp. The same works for xPack.
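The difference between the two `-march` strings can be checked directly by asking GCC which multilib directory it would select for each. A minimal sketch under the same assumption (placed after `project()`); `-print-multi-directory` is a standard GCC driver option that prints `.` when the default library is selected. The variable names are illustrative, and a toolchain without Zcmp support will reject these flags, so drop `_zcmp` there.

```cmake
# Compare which multilib directory GCC selects for each -march string.
# "." means the toolchain's default library directory.
set(MARCH_CANDIDATES
  rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb_zcmp
  rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb_zcmp)
foreach(march IN LISTS MARCH_CANDIDATES)
  execute_process(
    COMMAND ${CMAKE_C_COMPILER} -march=${march} -mabi=ilp32 -print-multi-directory
    OUTPUT_VARIABLE selected_dir
    ERROR_VARIABLE probe_error
    OUTPUT_STRIP_TRAILING_WHITESPACE)
  # An empty result plus an error message usually means the flags were rejected.
  message(STATUS "-march=${march} -> '${selected_dir}' ${probe_error}")
endforeach()
```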

PS. I propose creating pico_riscv_gcc_zcb. With CORE-V GCC 14.1.0 it performed better than pico_riscv_gcc_zcb_zcmp: the former had a 2.7% performance boost and the latter 0.19%, compared with pico_riscv_gcc.

PS2. I noticed an extra toolchain release in pico-sdk-tools; I'll give it a try.

ilg-ul commented 6 days ago

> Removing extra libraries from CORE-V (same with xPack) resolved an issue.

This is intriguing.

Could you do a verbose link with the xPack toolchain (the official release, with all libraries), check which library was actually used, then repeat with your trimmed xPack and compare the results?

matsobdev commented 6 days ago

I'll definitely take care of it later on Sunday, but here is what I have from earlier: on Windows, CORE-V (with the original Pico SDK) required keeping rv32ia for pico_riscv_gcc_zcb_zcmp and rv32iac for pico_riscv_gcc. The RISC-V toolchain required keeping rv32iac for pico_riscv_gcc, and since it doesn't support zca, pico_riscv_gcc_zcb wasn't working there. I determined this by trial and error over the ilp32 variants; without them, compilation terminated with an error.

Just for the record: my previous comment was about Ubuntu 20.04. One last point for now: it seems the toolchain can pick from multiple matching libraries, whichever is available.

matsobdev commented 6 days ago

Back to the vanilla Pico SDK. I don't really know how to check which library is picked up, but I removed all libraries, including the default one, in those two directories. xPack has four ilp32 variants:

rv32i
rv32ia
rv32iac
rv32im

I then put them back one at a time as the default, so that only a single library was present. Compilation succeeded every time and the program ran on the Pico 2 every time, but only with rv32im present did pico_riscv_gcc and pico_riscv_gcc_zcb (vanilla Pico SDK apart from pico_riscv_gcc_zcb itself, since xPack has no zcmp support so far) perform at the expected level, similar to the Pico. (The workload is summing values for a moving average, atan2f(), and division, in a 4000/2/2 proportion.) Obviously, the original default rv32imac works and performs well, as stated before. I guess it has all the letters to pick from :D But the question is: does it need a specific flavour of library, or can it selectively use ima from imac?
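One way to see which library flavour gets picked, without building anything, is to ask the GCC driver for the exact archive paths. A minimal sketch, again assuming placement after `project()`; it uses the standard `-print-file-name=libc.a` and `-print-libgcc-file-name` options with the pico_riscv_gcc_zcb `-march` string used in this thread (`probe_flags`, `libc_path` and `libgcc_path` are illustrative names). If GCC cannot find a library, it simply echoes back the bare file name.

```cmake
# Flags matching the pico_riscv_gcc_zcb configuration discussed in this thread.
set(probe_flags -march=rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb -mabi=ilp32)

# Resolve the libc.a and libgcc.a the driver would link for these flags;
# the multilib directory appears in the returned paths.
execute_process(
  COMMAND ${CMAKE_C_COMPILER} ${probe_flags} -print-file-name=libc.a
  OUTPUT_VARIABLE libc_path
  OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(
  COMMAND ${CMAKE_C_COMPILER} ${probe_flags} -print-libgcc-file-name
  OUTPUT_VARIABLE libgcc_path
  OUTPUT_STRIP_TRAILING_WHITESPACE)
message(STATUS "libc.a   -> ${libc_path}")
message(STATUS "libgcc.a -> ${libgcc_path}")
```

Swapping the `-march` string for the rv32imac variant shows whether the selection changes between the two spellings.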

PS. Replacing the xPack libraries with the ones from pico-sdk-tools produced the fastest binaries (though still a tiny bit slower than the Pico at the same clock): pico_riscv_gcc_zcb is approximately 0.8% faster than pico_riscv_gcc. I guess I need to update something to run riscv-toolchain-14-x86_64-lin on Ubuntu 20.04, because:

/home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc)
/home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc)
/home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /home/mateush/riscv-toolchain-14-x86_64-lin/bin/riscv32-unknown-elf-gcc)

One conclusion might be to add extra libraries, as listed in Raspberry Pi Pico-series C/C++ SDK, page 30, plus _zcmp variants once that is ready. BTW, riscv-toolchain-14-x86_64-lin is GCC 14.2.1, but xPack works with its libraries anyway.

ilg-ul commented 6 days ago

> I don't really know, how to check, which library is picked up

Try adding -Wl,-v,--verbose to the linker command.

matsobdev commented 5 days ago

I tried add_compile_options(-Wl,-v,--verbose) inside CMakeLists.txt and nothing happened, but according to the *.elf.map files:

(Files renamed due to GitHub requirements. Original Pico SDK (apart from the extra pico_riscv_gcc_zcb) and xPack.)

PS. Changing:

set(PICO_COMMON_LANG_FLAGS " -march=rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb -mabi=ilp32")

to

set(PICO_COMMON_LANG_FLAGS " -march=rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb -mabi=ilp32")

inside pico_riscv_gcc_zcb makes it pick the default rv32imac library as well, according to the *.elf.map.

PS2. I added two extra directories with libraries from riscv-toolchain-14-x86_64-lin:

and it still picks the same as above. I wonder how the proper riscv-toolchain-14-x86_64-lin behaves (whether it picks rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb for pico_riscv_gcc) and whether just copying the library directories is sufficient to 'install' it. Originally, rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb is the default for riscv-toolchain-14-x86_64-lin; when I copy it as is, replacing all libraries, the default one is chosen in both cases.

ilg-ul commented 5 days ago

> Tried add_compile_options(-Wl,-v,--verbose) inside CMakeLists.txt nothing happened

Sure it does nothing, since you have to add it to target_link_options().

Otherwise, if you use rv32ima, the resulting code will not use the compressed instructions, the projects will be larger and probably slower.
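A minimal sketch of the linker-side setup described above. `air_mouse` and `main.c` are hypothetical names standing in for the project's own target and sources; `target_link_options()` requires CMake 3.13 or newer.

```cmake
# "air_mouse" and main.c are hypothetical; use your own add_executable() target.
add_executable(air_mouse main.c)
target_link_libraries(air_mouse pico_stdlib)

# Compile options are not passed to the link step, which is why
# add_compile_options(-Wl,...) showed nothing. Link options are:
# -Wl,-v prints the linker version; -Wl,--verbose prints the linker script
# and each "attempt to open ..." path, revealing which libc.a/libgcc.a was used.
target_link_options(air_mouse PRIVATE -Wl,-v,--verbose)
```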

matsobdev commented 5 days ago

Thanks for the CMake tip; I was struggling with that one. The more I dig into it, the less I know. Leaving only rv32ima_zicsr_zifencei_zba_zbb_zbs_zbkb_zca_zcb as the default inside xPack, the binary size is 46.9 kB for pico_riscv_gcc_zcb and 47.4 kB for pico_riscv_gcc. Leaving only rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb gives 47.0 kB and 47.5 kB. Why are there two of them if both work and perform well? :D Too much for today...

matsobdev commented 5 days ago

The quickest run so far was CORE-V GCC 14.1.0 with the rv32imac_zicsr_zifencei_zba_zbb_zbs_zbkb libraries and pico_riscv_gcc_zcb_zcmp.
