llvm / circt

Circuit IR Compilers and Tools
https://circt.org

[Arcilator] Performance Issue #7230

Closed: owlxiao closed this issue 4 months ago

owlxiao commented 5 months ago

Hi everyone! I'm glad the CIRCT community has contributed arcilator. I've recently been trying to integrate arcilator into my project, but found that it's not as fast as advertised.

I've created a project called rtl-sim-benchmark, which is designed to measure the simulation speed of existing RTL simulators across different benchmarks. So far I have integrated arcilator, verilator-1 (single-threaded), and verilator-2 (multi-threaded). In all three test sets (nutshell, riscv-mini, rocketchip), arcilator is slower than verilator-1. Here is the automatically generated report from rtl-sim-benchmark.

For the tests, I used this simple approach:

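```cpp
#include <chrono>

// `dut` is the simulator-specific model wrapper; `Cycles` is the cycle count.
auto Start = std::chrono::system_clock::now();

for (int i = 0; i < Cycles; ++i) {
    dut.set_clock(0);  // drive the clock low and settle the design
    dut.eval();

    dut.set_clock(1);  // rising edge: registers update
    dut.eval();
}

auto End = std::chrono::system_clock::now();
```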

I just run the required number of cycles, measure the time from start to end, and collect some performance data. There is no interaction with any of the design's interfaces; it's just a lot of eval() calls.
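A minimal sketch of the same loop as a reusable helper, assuming a model type that exposes `set_clock` and `eval` like the snippet above. `ToyModel` and `measure_hz` are hypothetical names for illustration only; the one deliberate change is `std::chrono::steady_clock`, which is monotonic and therefore better suited to interval timing than `system_clock`, which can jump if the wall clock is adjusted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a generated simulation model. A real harness would
// wrap the Verilator or arcilator model here; this toy only counts rising edges.
struct ToyModel {
    bool clock = false;
    bool prev_clock = false;
    std::uint64_t counter = 0;

    void set_clock(bool value) { clock = value; }
    void eval() {
        if (clock && !prev_clock) ++counter;  // rising edge
        prev_clock = clock;
    }
};

// Runs `cycles` full clock periods and returns the simulated clock rate in Hz.
// steady_clock is monotonic, so the measured interval cannot go backwards.
template <typename Model>
double measure_hz(Model& dut, std::uint64_t cycles) {
    assert(cycles > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < cycles; ++i) {
        dut.set_clock(false);
        dut.eval();
        dut.set_clock(true);
        dut.eval();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = end - start;  // in seconds
    return static_cast<double>(cycles) / elapsed.count();
}
```

Usage: `ToyModel dut; double hz = measure_hz(dut, 1000);` leaves `dut.counter == 1000`, since each period contains exactly one rising edge.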

However, when I ran arc-tests, the results were the opposite: arcilator was actually faster than Verilator. What happened here? Could there be something wrong with my tests?

```
$ make run
build/small-v1.6/rocket-main ../benchmarks/dhrystone.riscv
loading segment at 60000000 (virtual address 60000000)
loading segment at 80000000 (virtual address 80000000)
entry 80000000
loaded 20888 program bytes
Microseconds for one run through Dhrystone: 799
Dhrystones per Second:                      1250
mcycle = 399986
minstret = 192528
Benchmark run successful!
----------------------------------------
412458 cycles total
vtor: 54371 Hz
arcs: 527592 Hz
```

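The last three lines of that output are enough to compare the two models directly. A small sketch of the arithmetic, with the figures copied from the output (`seconds_for` and `arcilator_speedup` are illustrative helper names):

```cpp
#include <cassert>
#include <cstdint>

// Wall-clock seconds needed to simulate `cycles` at a rate of `hz`.
double seconds_for(std::uint64_t cycles, double hz) {
    assert(hz > 0.0);
    return static_cast<double>(cycles) / hz;
}

// Figures taken from the arc-tests output.
constexpr std::uint64_t kCycles = 412458;
constexpr double kVerilatorHz = 54371.0;   // "vtor"
constexpr double kArcilatorHz = 527592.0;  // "arcs"

// Ratio of arcilator's rate to Verilator's rate in this particular run.
double arcilator_speedup() { return kArcilatorHz / kVerilatorHz; }
```

In this run that works out to roughly 9.7x: about 7.6 s of simulation for Verilator versus about 0.78 s for arcilator over the same 412,458 cycles.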
maerhart commented 5 months ago

Thanks for adding arcilator to your benchmarking!

The main problem here is likely that, when benchmarking with arc-tests, different compilers and LLVM tools were used for the various build steps (the C++ compiler, opt, and llc).

This should be handled better in the arc-tests repo, so that the Makefile automatically uses the same compiler for all of these steps. Take a look at the essent-benchmarking branch of the arc-tests repo, which already contains an improved Makefile. There you can run something like `make OPT=opt-18 LLC=llc-18 CXX=clang++-18 run` to use consistent compiler versions. Please make sure to use clang instead of gcc, and to use clang, opt, and llc from the same LLVM release.
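One cheap sanity check on the C++ side is to have the testbench report which compiler actually built it, using the standard predefined compiler macros. A minimal sketch (`compiler_id` is a hypothetical helper); it does not cover `opt` and `llc`, whose versions can be checked separately with `opt --version` and `llc --version`.

```cpp
#include <cassert>
#include <string>

// Reports which C++ compiler built this translation unit. Clang also defines
// __GNUC__ for compatibility, so __clang__ has to be checked first.
std::string compiler_id() {
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__);
#else
    return "unknown";
#endif
}
```

Printing `compiler_id()` at startup makes it obvious if, for example, the testbench was built with gcc while the arcilator output went through LLVM 18's `llc`.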

Please let me know if this still doesn't lead to consistent numbers.

fabianschuiki commented 5 months ago

It's really cool that you're using arcilator! 🥳 A thought on your benchmark:

Toggling just the clock on your hardware can also skew the results: some simulators may aggressively skip evaluating hardware that hasn't seen any changes, while other simulators don't. If you aren't toggling the input interfaces, chances are that the design drops into a state where a simulator can skip >90% of the work by just realizing that none of the inputs changed.
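To make the effect concrete, here is a deliberately tiny sketch of change-driven evaluation: a hypothetical `SkippingModel` recomputes its logic only when its input differs from the previous evaluation. It is not how Verilator or arcilator work internally; it only shows why a clock-only loop with static inputs can end up timing mostly skipped work, while pseudo-random stimulus (here `std::mt19937` with an arbitrary seed) forces real evaluation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical toy model with input change detection (illustration only).
struct SkippingModel {
    std::uint32_t in = 0;
    std::uint32_t last_in = 0;
    bool has_evaluated = false;
    std::uint32_t out = 0;         // output of the "expensive" logic
    std::uint64_t full_evals = 0;  // evaluations that actually did work

    void set_in(std::uint32_t value) { in = value; }

    // `out` depends only on `in`, so recomputing it is unnecessary whenever
    // `in` is unchanged since the previous evaluation.
    void eval() {
        if (has_evaluated && in == last_in) return;  // skip: nothing changed
        std::uint32_t x = in;
        for (int i = 0; i < 64; ++i) {  // stand-in for a large logic cone
            x ^= x << 13;
            x ^= x >> 17;
            x ^= x << 5;
        }
        out = x;
        last_in = in;
        has_evaluated = true;
        ++full_evals;
    }
};

// Simulates `cycles` cycles with either a constant input or fresh
// pseudo-random input each cycle; returns how many evaluations did real work.
std::uint64_t full_evals_for(bool random_inputs, std::uint64_t cycles) {
    SkippingModel dut;
    std::mt19937 rng(42);  // fixed seed so the stimulus is repeatable
    for (std::uint64_t i = 0; i < cycles; ++i) {
        dut.set_in(random_inputs ? static_cast<std::uint32_t>(rng()) : 7u);
        dut.eval();
    }
    return dut.full_evals;
}
```

With 1000 cycles, the constant-input run performs a single full evaluation, while the random-input run does real work on essentially every cycle.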

Pretty sure Verilator does this really well, whereas Arcilator, for example, doesn't have such an optimization yet. (It can be added though! That would be really cool 😃!) That's why the benchmarks in arc-tests run an actual benchmark binary, which ensures that a single work-skipping optimization can't easily skew the results.