uwsampl / lakeroad-evaluation

MIT License

Debug the fact that Verilator is taking a while #117

Closed gussmith23 closed 8 months ago

gussmith23 commented 9 months ago

I used flamegraph, attaching the result here.

[Image: flamegraph]

It seems like it has to do with the use of threads by Verilator. I bet we could just disable threads and it would help a lot.

gussmith23 commented 9 months ago

I'm really confused why there's any threading activity happening at all! It doesn't really make sense to me.

gussmith23 commented 9 months ago

https://stackoverflow.com/questions/67335512/multithreaded-simulation-orders-of-magnitude-slower-than-single-threaded

This makes me think that it is possible to disable threading entirely, somehow. Am I setting VL_THREADED somewhere? I feel like I am.

gussmith23 commented 9 months ago

Asking a question here: https://github.com/verilator/verilator/issues/4526

gussmith23 commented 9 months ago

Wilson's suggestions:

I also considered doing everything inside of a Verilog testbench, i.e. not using a C++ testbench. This now runs into the issue of needing to use the --timing flag, which then requires a certain C++ compiler, and now I can't figure out how to make Verilator use Clang...what a mess.

I feel like the easiest solution would just be to figure out how to make the C++ run faster, ideally not using numactl.

gussmith23 commented 9 months ago

The other short-term and very non-ideal solution is just to run fewer simulations.

We could also downgrade Verilator, because I'm pretty sure this was not a problem in previous versions.

gussmith23 commented 9 months ago

Okay, so I put everything in a Verilog testbench and it's much faster. I had to force it to use a more recent compiler with:

make -B simulate_new CXX=g++-10 

Makefile lines:

simulate_new: /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_inputs.txt
    /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new < /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_inputs.txt

/home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new: testbench.sv /home/gus/lakeroad-evaluation/robustness-testing-verilog-files/generated/mult_0_stage_signed_8_bit.sv ../lakeroad_result.sv
    $(VERILATOR) --cc --build --exe --timing --main \
        -I/home/gus/lakeroad-evaluation/lakeroad-private/DSP48E2 \
        -DXIL_XECLIB -Wno-UNOPTFLAT -Wno-LATCH -Wno-WIDTH -Wno-STMTDLY -Wno-CASEX -Wno-TIMESCALEMOD -Wno-PINMISSING \
        -CFLAGS -std=c++2a \
        $^
    cp obj_dir/Vtestbench $@

gussmith23 commented 9 months ago

I'm thinking the problem is likely due to the fact that we're remaking the context/module in a loop. I think that's probably slow.

If we go the Verilog route, then we'll use the same module the whole time, which I'm not sure how I feel about. It's probably fine if we're assuming intermediate outputs shouldn't matter/existing state shouldn't matter.
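
A minimal sketch of the two testbench shapes being compared here, written against a hypothetical stand-in class rather than a real Verilator-generated model. `FakeMult8`, `g_constructions`, `run_fresh_model_per_input`, and `run_shared_model` are all made-up names for illustration, and the stand-in's constructor is trivially cheap, so this only shows the structure and the state-sharing question, not the actual cost.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Counts how many times the stand-in model is constructed.
int g_constructions = 0;

// Hypothetical stand-in for a generated model of a 0-stage signed 8-bit
// multiplier. This is NOT the Verilator API; it only mimics the shape
// "set inputs, call eval(), read output".
struct FakeMult8 {
    int8_t a = 0;
    int8_t b = 0;
    int16_t out = 0;
    FakeMult8() { ++g_constructions; }
    void eval() { out = static_cast<int16_t>(int(a) * int(b)); }
};

using Inputs = std::vector<std::pair<int8_t, int8_t>>;

// Shape 1: rebuild the model for every input pair (the suspected slow path).
std::vector<int16_t> run_fresh_model_per_input(const Inputs& inputs) {
    std::vector<int16_t> results;
    for (const auto& in : inputs) {
        FakeMult8 model;  // constructed once per iteration
        model.a = in.first;
        model.b = in.second;
        model.eval();
        results.push_back(model.out);
    }
    assert(results.size() == inputs.size());
    return results;
}

// Shape 2: build the model once and reuse it, as a single Verilog testbench
// would. Any state left over from earlier inputs stays in the model.
std::vector<int16_t> run_shared_model(const Inputs& inputs) {
    FakeMult8 model;  // constructed once for the whole run
    std::vector<int16_t> results;
    for (const auto& in : inputs) {
        model.a = in.first;
        model.b = in.second;
        model.eval();
        results.push_back(model.out);
    }
    assert(results.size() == inputs.size());
    return results;
}
```

For a purely combinational stand-in like this one, both functions return the same outputs; the only difference is how many times the model is constructed. With a pipelined design, the shared-model shape would carry register state from one input to the next, which is the assumption being questioned above.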

gussmith23 commented 9 months ago

Note that this issue led to another that I'm working on first: https://github.com/uwsampl/lakeroad/issues/372

gussmith23 commented 8 months ago

Done! The eval still runs slowly (see #124) but Verilator is much faster.