Closed by Seromantic 1 year ago
CoreMark is single-threaded by default, and enabling more threads is a compile-time option, which is why you are seeing only a single thread. Here is a run built with 4 threads:
root@OpenWrt:~# coremark
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 14243
Total time (secs): 14.243000
Iterations/Sec : 30892.368181
Iterations : 440000
Compiler version : GCC11.3.0
Compiler flags : -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -fmacro-prefix-map=/home/robimarko/Building/AX3600/ipq807x-5.15/build_dir/target-aarch64_cortex-a53_musl/coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4=coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -flto -O3 -DMULTITHREAD=4 -DUSE_PTHREAD -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x33ff
[1]crcfinal : 0x33ff
[2]crcfinal : 0x33ff
[3]crcfinal : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 30892.368181 / GCC11.3.0 -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -fmacro-prefix-map=/home/robimarko/Building/AX3600/ipq807x-5.15/build_dir/target-aarch64_cortex-a53_musl/coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4=coremark-eefc986ebd3452d6adde22eafaff3e5c859f29e4 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -flto -O3 -DMULTITHREAD=4 -DUSE_PTHREAD -lrt / Heap / 4:PThreads
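As a sanity check on the log above, the reported score is the total iteration count divided by the elapsed time. Here is a minimal Python sketch using the figures from that run. The variable and function names are my own, not part of CoreMark, and the 4-core count in the last helper is an assumption based on the 25% ceiling reported in this thread:

```python
# Figures copied from the 4-thread CoreMark log above.
total_iterations = 440000   # "Iterations"
total_time_secs = 14.243    # "Total time (secs)"
threads = 4                 # "Parallel PThreads"


def iterations_per_sec(iterations, seconds):
    """CoreMark score: iterations completed per second of wall-clock time."""
    return iterations / seconds


def top_cpu_percent(busy_threads, cores):
    """Overall CPU usage if each busy thread saturates one core.

    Assumption: usage is reported as a share of all cores combined.
    """
    return 100.0 * min(busy_threads, cores) / cores


score = iterations_per_sec(total_iterations, total_time_secs)
per_thread = score / threads

print(f"score: {score:.2f} iterations/sec")
print(f"per thread: {per_thread:.2f} iterations/sec")
# Hypothetical 4-core system: one busy thread vs. four busy threads.
print(f"1 thread: {top_cpu_percent(1, 4):.0f}% CPU, 4 threads: {top_cpu_percent(4, 4):.0f}% CPU")
```

Under that assumption, a default single-threaded build tops out at a quarter of total CPU, which matches the 25% seen in top.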
Thank you for the answer about coremark. What about openssl? The score from "openssl speed -evp aes-128-gcm" is also on the low side. Do any parameters need to be set?
root@AX9000:~# openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 11195395 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 3698973 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 1022651 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 264794 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 33431 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 16384 size blocks: 16718 aes-128-gcm's in 3.00s
OpenSSL 1.1.1s 1 Nov 2022
built on: Sat Dec 31 02:19:58 2022 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm     59708.77k    78911.42k    87266.22k    90383.02k    91594.23k    91302.57k
What kind of results are you expecting from OpenSSL?
Below is the result from the Xiaomi OEM firmware. CPU usage is also 25% during the run, but the score is much higher. You can try running the same command on the OEM 108 firmware, the one used to obtain SSH access.
root@XiaoQiang:~# openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 19946904 aes-128-gcm's in 2.90s
Doing aes-128-gcm for 3s on 64 size blocks: 7941854 aes-128-gcm's in 2.90s
Doing aes-128-gcm for 3s on 256 size blocks: 2386071 aes-128-gcm's in 2.91s
Doing aes-128-gcm for 3s on 1024 size blocks: 635055 aes-128-gcm's in 2.91s
Doing aes-128-gcm for 3s on 8192 size blocks: 80637 aes-128-gcm's in 2.90s
OpenSSL 1.0.2q 20 Nov 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/usr/include -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//usr/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//include -specs=/home/jenkins/romdaily_new_openwrt/system/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DMIWIFI_FEATURE -DHAVE_CRYPTODEV -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv8-a -mcpu=cortex-a53+crypto -fno-caller-saves -Wformat -fpic -fstack-protector -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -fpic -I/home/jenkins/romdaily_new_openwrt/system/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm    110051.88k   175268.50k   209908.65k   223469.53k   227785.62k
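To put a number on the gap between the two runs, the summary rows can be recomputed from the raw block counts: blocks times block size, divided by elapsed seconds, in units of 1000 bytes. This is a rough Python sketch. The helper name and dictionary layout are mine, and only the 16-byte and 8192-byte rows from the two logs above are included:

```python
def throughput_k(blocks, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit used in the summary table."""
    return blocks * block_size / seconds / 1000.0


# block size -> (block count, elapsed seconds), copied from the two logs above
openwrt = {16: (11195395, 3.00), 8192: (33431, 2.99)}
oem = {16: (19946904, 2.90), 8192: (80637, 2.90)}

for size in (16, 8192):
    a = throughput_k(openwrt[size][0], size, openwrt[size][1])
    b = throughput_k(oem[size][0], size, oem[size][1])
    print(f"{size:>5} bytes: OpenWrt {a:.2f}k, OEM {b:.2f}k, ratio {b / a:.2f}x")
```

The recomputed values agree with the summary rows in both logs. By this measure the OEM build is roughly 1.8x faster at 16-byte blocks and roughly 2.5x faster at 8192-byte blocks. The two builds differ in OpenSSL version and compiler flags, so the ratio describes only these two runs.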
Those figures are not really comparable as there is no offloading to the NSS crypto FW, only CPU crypto extensions.
The Xiaomi_ax9000 firmware seems to be able to use only a single core. While coremark is running, the CPU usage shown by top is capped at 25%. An ipq8072a running coremark at full load should score around 30,000. Is this intentional, or is it not yet supported? Is it a compromise made because the PWM fan cannot be driven? The "openssl speed -evp aes-128-gcm" test also shows a bottleneck. Building from the latest "ipq807x-5.15-pr" branch has the same problem. My English is not good, so I am using Google Translate; please forgive any grammatical errors.