Closed: kuopinghsu closed this issue 3 years ago
Hi,
I used the same GCC version and compiler flags as SiFive did to get their CoreMark scores (so I had a fair comparison). I remember that it was somewhat compiler version specific at the time. I’ll try and dig out the details for you.
I also used the ‘default’ configuration for biRISC-V: https://github.com/ultraembedded/biriscv/blob/master/docs/configuration.md
I think the compiler flags (that I got from some SiFive repos) were:
CFLAGS+=-O2 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow
Thanks for your quick reply. This is the updated result of running CoreMark in TCM with GCC 10.2.0. I got CoreMark/MHz: 3.666170.
SystemC 2.3.3-Accellera --- May 18 2021 20:21:23
Copyright (c) 1996-2018 by all Contributors,
ALL RIGHTS RESERVED
Running: ../../../sw/coremark/coremark.elf
Memory: 0x0 - 0x1d26f (Size=116KB) [.text]
Memory: 0x1d270 - 0x1d2ab (Size=0KB) [.eh_frame]
Memory: 0x1d2b0 - 0x1f193 (Size=7KB) [.data]
Memory: 0x1f194 - 0x231e7 (Size=16KB) [.bss]
Info: (I702) default timescale unit used for tracing: 1 ns (sysc_wave.vcd)
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 1091057
Total time (secs): 0.010911
Iterations/Sec : 366.616960
Iterations : 4
Compiler version : GCC10.2.0
Compiler flags : -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/tcm.ld
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x9f95
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 366.616960 / GCC10.2.0 -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/tcm.ld / STACK
CoreMark/MHz: 3.666170
TB: Aborted at 13475360 ns
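As a cross-check, the CoreMark/MHz figure in the log above can be recomputed from the "Iterations" and "Total ticks" fields. This is a minimal sketch that assumes one tick corresponds to one core clock cycle; the log is consistent with that (1091057 ticks in 0.010911 s implies a 100 MHz clock), but the thread does not state it explicitly. The function name `coremark_per_mhz` is hypothetical.

```python
def coremark_per_mhz(iterations: int, total_ticks: int) -> float:
    """Return CoreMark/MHz, assuming one tick equals one core clock cycle."""
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    # Iterations per cycle, scaled to iterations per million cycles (per MHz).
    return iterations / total_ticks * 1_000_000


# Values copied from the GCC 10.2.0 TCM log above.
score = coremark_per_mhz(iterations=4, total_ticks=1_091_057)
print(f"CoreMark/MHz: {score:.6f}")
```

Under that assumption the clock frequency cancels out, so the score depends only on the cycle count per iteration.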
I think it is likely that I used gcc version 7.2.0 (as that is what I have installed). It could be that I used a different version at the time.
I checked with gcc version 7.2.0. Running from memory, I get 4.059887 CoreMark/MHz; running from TCM, I get 4.142030 CoreMark/MHz. This matches your results. Thanks a lot.
SystemC 2.3.3-Accellera --- May 18 2021 20:21:23
Copyright (c) 1996-2018 by all Contributors,
ALL RIGHTS RESERVED
Running: ../../../sw/coremark/coremark.elf
Memory: 0x80000000 - 0x8001d55b (Size=117KB) [.text]
Memory: 0x8001d55c - 0x8001d597 (Size=0KB) [.eh_frame]
Memory: 0x8001d598 - 0x8001f47b (Size=7KB) [.data]
Memory: 0x8001f47c - 0x800234cf (Size=16KB) [.bss]
Info: (I702) default timescale unit used for tracing: 1 ns (sysc_wave.vcd)
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 985249
Total time (secs): 0.009852
Iterations/Sec : 405.988740
Iterations : 4
Compiler version : GCC7.2.0
Compiler flags : -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/default.ld
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x9f95
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 405.988740 / GCC7.2.0 -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/default.ld / STACK
CoreMark/MHz: 4.059887
TB: Aborted at 13058020 ns
SystemC 2.3.3-Accellera --- May 18 2021 20:21:23
Copyright (c) 1996-2018 by all Contributors,
ALL RIGHTS RESERVED
Running: ../../../sw/coremark/coremark.elf
Memory: 0x0 - 0x1d55b (Size=117KB) [.text]
Memory: 0x1d55c - 0x1d597 (Size=0KB) [.eh_frame]
Memory: 0x1d598 - 0x1f473 (Size=7KB) [.data]
Memory: 0x1f474 - 0x234c7 (Size=16KB) [.bss]
Info: (I702) default timescale unit used for tracing: 1 ns (sysc_wave.vcd)
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 965710
Total time (secs): 0.009657
Iterations/Sec : 414.203022
Iterations : 4
Compiler version : GCC7.2.0
Compiler flags : -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/tcm.ld
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x9f95
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 414.203022 / GCC7.2.0 -O2 -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -DPERFORMANCE_RUN=1 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fno-strict-overflow -lc -lm -lgcc -lsys -T ../common/tcm.ld / STACK
CoreMark/MHz: 4.142030
TB: Aborted at 12255460 ns
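To put the three logged runs side by side, the sketch below computes the relative difference between them. The scores are copied from the logs in this thread; the dictionary keys and the helper `percent_gain` are hypothetical names used only for illustration.

```python
def percent_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# CoreMark/MHz values as reported in the logs above.
scores = {
    "gcc10.2.0_tcm": 3.666170,
    "gcc7.2.0_memory": 4.059887,
    "gcc7.2.0_tcm": 4.142030,
}

compiler_gain = percent_gain(scores["gcc7.2.0_tcm"], scores["gcc10.2.0_tcm"])
tcm_gain = percent_gain(scores["gcc7.2.0_tcm"], scores["gcc7.2.0_memory"])

print(f"GCC 7.2.0 vs GCC 10.2.0 (both in TCM): {compiler_gain:+.2f}%")
print(f"TCM vs memory (both GCC 7.2.0): {tcm_gain:+.2f}%")
```

With these numbers, the compiler version accounts for a much larger share of the difference than the choice between TCM and memory.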
Ok, good! (Closing the issue now).
Dhrystone and CoreMark need more than 64 KB of TCM to run. I modified the code to use a 128 KB TCM in the following fork and added an environment to run them.
https://github.com/kuopinghsu/biriscv
I got the following benchmark scores:
CoreMark: In memory: CoreMark/MHz: 3.333047, In TCM: CoreMark/MHz: 3.345606
Dhrystone: In memory: DMIPS_Per_MHz: 2.243, In TCM: DMIPS_Per_MHz: 2.415
I can't reproduce the 4.1 CoreMark/MHz score quoted for the biRISC-V core. Could you share how to get it?