owlxiao closed this 4 months ago
Thanks for adding arcilator to your benchmarking!
The main problem here is likely that you used different compilers for the following things when benchmarking with arc-tests:
- g++ in the Makefiles in generates
- opt and llc in the PATH variable
This should be handled better in the arc-tests repo such that the Makefile automatically uses the same compiler for all these things. You can take a look at the essent-benchmarking branch in the arc-tests repo, which already contains an improved Makefile. There you can do something like make OPT=opt-18 LLC=llc-18 CXX=clang++-18 run to use consistent compiler versions. Please make sure to use clang instead of gcc, and make sure to use clang, opt, and llc from the same LLVM release.
Please let me know if this still doesn't lead to consistent numbers.
It's really cool that you're using arcilator! 🥳 A thought on your benchmark:
Toggling just the clock on your hardware can also skew the results: some simulators may aggressively skip evaluating hardware that hasn't seen any changes, while other simulators don't. If you aren't toggling the input interfaces, chances are that the design drops into a state where a simulator can skip >90% of the work by just realizing that none of the inputs changed.
Pretty sure Verilator does this really well, whereas Arcilator for example doesn't really have such an optimization yet. (It can be added though! That would be really cool 😃!) That's why we try to run an actual benchmark binary in the benchmarks in arc-tests, ensuring that a single work-skipping optimization can't easily skew results.
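To make the skew concrete, here is a minimal toy sketch of the effect. It is not how Verilator or Arcilator are implemented; ToyModel, skip_unchanged, and run_cycles are hypothetical names invented for this illustration, and the multiply stands in for whatever logic a real design would evaluate. The model counts how many clock edges actually did work versus how many were skipped because the inputs had not changed.

```cpp
#include <cstdint>

// Hypothetical toy model for illustration only; not a real simulator API.
struct ToyModel {
  bool clk = false;
  std::uint32_t in = 0;    // stand-in for all non-clock inputs
  std::uint32_t out = 0;   // registered result of the "expensive" logic
  bool skip_unchanged = false;      // emulate a work-skipping simulator
  std::uint64_t work_done = 0;      // clock edges that evaluated the logic
  std::uint64_t work_skipped = 0;   // clock edges that were skipped

  void eval() {
    const bool rising = clk && !prev_clk_;
    prev_clk_ = clk;
    if (!rising) return;
    // Skipping is valid here because `out` depends only on `in`.
    if (skip_unchanged && has_prev_ && in == prev_in_) {
      ++work_skipped;
      return;
    }
    out = in * 2654435761u + 1u;  // stand-in for expensive logic
    prev_in_ = in;
    has_prev_ = true;
    ++work_done;
  }

 private:
  bool prev_clk_ = false;
  bool has_prev_ = false;
  std::uint32_t prev_in_ = 0;
};

// Runs `cycles` full clock periods, optionally driving a new input each cycle.
inline ToyModel run_cycles(bool skip_unchanged, bool toggle_inputs,
                           std::uint64_t cycles) {
  ToyModel m;
  m.skip_unchanged = skip_unchanged;
  for (std::uint64_t i = 0; i < cycles; ++i) {
    if (toggle_inputs) m.in = static_cast<std::uint32_t>(i);
    m.clk = true;
    m.eval();
    m.clk = false;
    m.eval();
  }
  return m;
}
```

With only the clock toggling, run_cycles(true, false, N) does real work on the first edge and skips the rest, while run_cycles(false, false, N) evaluates every edge and ends with the same out value. Once the inputs change every cycle, the skipping variant has nothing to skip, which is closer to what a real workload exercises.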
Hi everyone! I’m glad that the CIRCT community has contributed arcilator. I’ve been trying to integrate arcilator into my project recently, but found that it’s not as fast as advertised.
I've created a project called rtl-sim-benchmark, which is designed to compare the speed of existing RTL simulators across different benchmarks. Currently, I have integrated arcilator, verilator-1 (single thread), and verilator-2 (multi-thread). In all three test sets (nutshell, riscv-mini, rocketchip), arcilator isn't as fast as verilator-1. Here is the automatically generated report from rtl-sim-benchmark.
For the tests, I used this simple approach:
I just count the cycles needed, measure the time from start to end, and collect some performance data. There’s no interaction with any interfaces; it's just a lot of eval() calls.
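A harness of the kind described might look roughly like the sketch below. It is an illustration under assumptions, not the actual rtl-sim-benchmark code: BenchResult, run_benchmark, and CounterModel are hypothetical names, and the model is assumed to expose a clock field and an eval() method, whereas a real generated model will name its clock port after the design. Only the clock is toggled, and the wall-clock time of the whole loop is measured with std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical result record: simulated cycles and elapsed wall-clock time.
struct BenchResult {
  std::uint64_t cycles;
  double seconds;
  double cycles_per_second() const {
    return seconds > 0.0 ? static_cast<double>(cycles) / seconds : 0.0;
  }
};

// Toggles only the clock and calls eval() twice per cycle.
// Assumes Model has a writable `clock` member and an `eval()` method.
template <typename Model>
BenchResult run_benchmark(Model& model, std::uint64_t cycles) {
  using clock_type = std::chrono::steady_clock;
  const auto start = clock_type::now();
  for (std::uint64_t i = 0; i < cycles; ++i) {
    model.clock = true;   // rising edge
    model.eval();
    model.clock = false;  // falling edge
    model.eval();
  }
  const auto stop = clock_type::now();
  const std::chrono::duration<double> elapsed = stop - start;
  return {cycles, elapsed.count()};
}

// Hypothetical stand-in design: a counter that increments on each rising edge.
struct CounterModel {
  bool clock = false;
  std::uint64_t count = 0;
  void eval() {
    if (clock && !prev_clock) ++count;
    prev_clock = clock;
  }
  bool prev_clock = false;
};
```

Calling run_benchmark on a CounterModel for N cycles leaves count at N and returns the elapsed time, from which cycles_per_second() gives a throughput figure. Since no input other than the clock ever changes, this loop measures how fast each simulator handles an idle design rather than a running workload.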
However, when I ran the arc-tests, the results were unexpectedly different. Arcilator was actually faster than Verilator. What happened here? Could there be something wrong with my tests?