robimarko / openwrt

Linux distribution for embedded devices
https://openwrt.org
Other

performance issue #92

Closed. Seromantic closed this issue 1 year ago.

Seromantic commented 1 year ago

The Xiaomi_ax9000 firmware seems to use only a single core. When running coremark and checking CPU usage with the top command, usage is capped at 25%. With the ipq8072a running coremark at full load, the score should be around 30,000. Is this intentional, or is it not yet supported? Is it a compromise because the PWM fan cannot be driven? The "openssl speed -evp aes-128-gcm" test also shows a bottleneck. Builds from the latest "ipq807x-5.15-pr" branch have the same problem. My English isn't very good, so I used Google Translate; please forgive any grammatical errors.

robimarko commented 1 year ago

Coremark is single-threaded by default; enabling more threads is a compile-time option, which is why you are seeing only a single thread. Here is a run with 4 threads:

root@OpenWrt:~# coremark
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 14243
Total time (secs): 14.243000
Iterations/Sec   : 30892.368181
Iterations       : 440000
Compiler version : GCC11.3.0
Compiler flags   : -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -fmacro-prefix-map=/home/robimarko/Building/AX3600/ipq807x-5.15/build_dir/target-aarch64_cortex-a53_musl/coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4=coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -flto -O3 -DMULTITHREAD=4 -DUSE_PTHREAD  -lrt
Parallel PThreads : 4
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x33ff
[1]crcfinal      : 0x33ff
[2]crcfinal      : 0x33ff
[3]crcfinal      : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 30892.368181 / GCC11.3.0 -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -fmacro-prefix-map=/home/robimarko/Building/AX3600/ipq807x-5.15/build_dir/target-aarch64_cortex-a53_musl/coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4=coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -flto -O3 -DMULTITHREAD=4 -DUSE_PTHREAD  -lrt / Heap / 4:PThreads
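
The summary figures above fit together with simple arithmetic: Iterations/Sec is the iteration count divided by the elapsed time. The sketch below recomputes it and splits it per thread, assuming (our assumption, not stated in the output) that the reported 440000 iterations are the total across all four pthreads. It also shows why a single-threaded run tops out at 25% in top, assuming top reports usage as a share of all four cores combined.

```python
# Figures copied from the 4-thread coremark run above.
TOTAL_ITERATIONS = 440_000
TOTAL_SECONDS = 14.243
THREADS = 4
CORES = 4  # the ipq8072a has four Cortex-A53 cores

# Iterations/Sec: total iterations divided by elapsed wall-clock time.
iterations_per_sec = TOTAL_ITERATIONS / TOTAL_SECONDS

# Assumption: the total is summed over all threads, each doing an equal share.
per_thread_iterations = TOTAL_ITERATIONS // THREADS
per_thread_rate = iterations_per_sec / THREADS

# Assumption: top shows CPU usage relative to all cores combined,
# so one fully busy core on a 4-core SoC shows up as 100 / 4 percent.
single_thread_cap_percent = 100 / CORES

if __name__ == "__main__":
    print(f"Iterations/Sec       : {iterations_per_sec:.6f}")
    print(f"Per-thread iterations: {per_thread_iterations}")
    print(f"Per-thread rate      : {per_thread_rate:.2f}")
    print(f"Single-thread cap    : {single_thread_cap_percent:.0f}%")
```

In other words, the default single-threaded build keeps one core busy, which top presents as the 25% ceiling described in the original report.
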
Seromantic commented 1 year ago

Thanks for the answer about coremark. What about OpenSSL? The command "openssl speed -evp aes-128-gcm" also gives low scores. Are there any parameters that need to be set?

root@AX9000:~# openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 11195395 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 3698973 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 1022651 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 264794 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 33431 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 16384 size blocks: 16718 aes-128-gcm's in 3.00s
OpenSSL 1.1.1s 1 Nov 2022
built on: Sat Dec 31 02:19:58 2022 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm      59708.77k    78911.42k    87266.22k    90383.02k    91594.23k    91302.57k
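
The summary row of `openssl speed` is derived from the "Doing ..." lines: block size times operations completed, divided by elapsed seconds, reported in thousands of bytes per second. A minimal sketch that recomputes the AX9000 table from the raw counts, as a sanity check that the two parts of the output agree (the function name `kbytes_per_sec` is ours, for illustration only):

```python
# (block size in bytes, operations completed, elapsed seconds) from the AX9000 run above.
RUNS = [
    (16, 11_195_395, 3.00),
    (64, 3_698_973, 3.00),
    (256, 1_022_651, 3.00),
    (1024, 264_794, 3.00),
    (8192, 33_431, 2.99),
    (16384, 16_718, 3.00),
]


def kbytes_per_sec(block_size: int, count: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed uses."""
    return block_size * count / seconds / 1000


results = {size: round(kbytes_per_sec(size, n, t), 2) for size, n, t in RUNS}

if __name__ == "__main__":
    for size, rate in results.items():
        print(f"{size:>6} bytes: {rate:.2f}k")
```

Throughput levels off at roughly 90 MB/s from 1024-byte blocks upward, so the larger block sizes are the most useful point of comparison with other builds.
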

robimarko commented 1 year ago

What kind of results are you expecting from OpenSSL?

Seromantic commented 1 year ago

Below are the results from the Xiaomi OEM firmware. CPU usage is also 25% during the run, but the scores are much higher. You can reproduce this by running the same command on the OEM 108 firmware, which is the one used to obtain SSH access.

root@XiaoQiang:~# openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 19946904 aes-128-gcm's in 2.90s
Doing aes-128-gcm for 3s on 64 size blocks: 7941854 aes-128-gcm's in 2.90s
Doing aes-128-gcm for 3s on 256 size blocks: 2386071 aes-128-gcm's in 2.91s
Doing aes-128-gcm for 3s on 1024 size blocks: 635055 aes-128-gcm's in 2.91s
Doing aes-128-gcm for 3s on 8192 size blocks: 80637 aes-128-gcm's in 2.90s
OpenSSL 1.0.2q 20 Nov 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/usr/include -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//usr/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//include -specs=/home/jenkins/romdaily_new_openwrt/system/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DMIWIFI_FEATURE -DHAVE_CRYPTODEV -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv8-a -mcpu=cortex-a53+crypto -fno-caller-saves -Wformat -fpic -fstack-protector -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -fpic -I/home/jenkins/romdaily_new_openwrt/system/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     110051.88k   175268.50k   209908.65k   223469.53k   227785.62k
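
To put "much higher" into numbers, the sketch below divides the OEM summary row by the OpenWrt one for each block size present in both runs (the OEM run has no 16384-byte row). It only quantifies the gap. The two builds differ in OpenSSL version (1.0.2q vs 1.1.1s) and compiler flags (for example, the OEM build uses `-mcpu=cortex-a53+crypto` and defines `HAVE_CRYPTODEV`), so the ratio by itself does not identify the cause.

```python
# aes-128-gcm summary rows (1000s of bytes per second) copied from the two runs above.
OPENWRT = {16: 59708.77, 64: 78911.42, 256: 87266.22, 1024: 90383.02, 8192: 91594.23}
OEM = {16: 110051.88, 64: 175268.50, 256: 209908.65, 1024: 223469.53, 8192: 227785.62}

# Compare only the block sizes reported by both runs.
ratios = {size: OEM[size] / OPENWRT[size] for size in sorted(OEM.keys() & OPENWRT.keys())}

if __name__ == "__main__":
    for size, ratio in ratios.items():
        print(f"{size:>5} bytes: OEM figure is {ratio:.2f}x the OpenWrt figure")
```

The gap grows with block size, reaching roughly 2.5x at 8192 bytes.
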

robimarko commented 1 year ago

Those figures are not really comparable as there is no offloading to the NSS crypto FW, only CPU crypto extensions.
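
Whether the CPU crypto extensions are present can be checked from userspace: on arm64 Linux, the `Features` line of `/proc/cpuinfo` lists `aes` and `pmull` when the hardware supports the ARMv8 AES and polynomial-multiply instructions that AES-GCM can use. A minimal sketch, assuming a Python interpreter is available (a stock OpenWrt image usually lacks one, so the function can also be run on a copy of the file). It only inspects CPU features and says nothing about NSS crypto offload; the name `crypto_features` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# ARMv8 crypto-extension flags relevant to AES-GCM:
# "aes" (AES round instructions) and "pmull" (polynomial multiply, used for GHASH).
CRYPTO_FLAGS = ("aes", "pmull")


def crypto_features(cpuinfo_text: str) -> dict[str, bool]:
    """Report which crypto flags appear on the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags = set(value.split())
            return {flag: flag in flags for flag in CRYPTO_FLAGS}
    # No Features line found: report every flag as absent.
    return {flag: False for flag in CRYPTO_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(crypto_features(cpuinfo.read_text()))
```
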