Closed gussmith23 closed 1 year ago
I'm really confused why there's any threading activity happening at all! It doesn't really make sense to me.
This makes me think that it is possible to entirely disable threading, somehow. Am I setting VL_THREADED
somewhere? I feel like I am.
Asking a question here: https://github.com/verilator/verilator/issues/4526
Wilson's suggestions:
I also considered doing everything inside of a Verilog testbench, i.e. not using a C++ testbench. This now runs into the issue of needing to use the --timing
flag, which then requires a certain C++ compiler, and now I can't figure out how to make Verilator use Clang...what a mess.
I feel like the easiest solution would just be to figure out how to make the C++ run faster, ideally not using numactl.
The other short-term and very non-ideal solution is just to run fewer simulations.
We could also downgrade Verilator, because I'm pretty sure this was not a problem in previous versions.
Okay, so I put everything in a Verilog testbench and it's much faster. Had to force it to use a more recent compiler with
make -B simulate_new CXX=g++-10
Makefile lines:
simulate_new: /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_inputs.txt
	/home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new < /home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_inputs.txt

/home/gus/lakeroad-evaluation/out/robustness_experiments/mult_0_stage_signed_8_bit/lakeroad_xilinx_ultrascale_plus/verilator/testbench_new: testbench.sv /home/gus/lakeroad-evaluation/robustness-testing-verilog-files/generated/mult_0_stage_signed_8_bit.sv ../lakeroad_result.sv
	$(VERILATOR) --cc --build --exe --timing --main \
		-I/home/gus/lakeroad-evaluation/lakeroad-private/DSP48E2 \
		-DXIL_XECLIB -Wno-UNOPTFLAT -Wno-LATCH -Wno-WIDTH -Wno-STMTDLY -Wno-CASEX -Wno-TIMESCALEMOD -Wno-PINMISSING \
		-CFLAGS -std=c++2a \
		$^
	cp obj_dir/Vtestbench $@
I'm thinking the problem is likely due to the fact that we're remaking the context/module in a loop. I think that's probably slow.
If we go the Verilog route, then we'll use the same module the whole time, which I'm not sure how I feel about. It's probably fine if we're assuming intermediate outputs shouldn't matter/existing state shouldn't matter.
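To make the tradeoff concrete, here is a rough sketch of "rebuild the model every iteration" versus "build it once and reuse it". Note that FakeMultiplier is a made-up stand-in, not the real Verilated class or any Verilator API, and the function names and the construction counter are hypothetical, only there to make the difference visible. It assumes a purely combinational design (like the 0-stage multiplier), where leftover state from earlier inputs can't affect later outputs.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Counts how many times the stand-in model is constructed (illustration only).
int g_constructions = 0;

// Hypothetical stand-in for a Verilated model. NOT the real Verilator API.
// It models a combinational signed 8-bit multiplier with no internal state.
struct FakeMultiplier {
    int8_t a = 0;
    int8_t b = 0;
    int16_t out = 0;
    FakeMultiplier() { ++g_constructions; }
    // Recompute the output from the current inputs.
    void eval() { out = static_cast<int16_t>(a * b); }
};

typedef std::vector<std::pair<int8_t, int8_t> > Inputs;

// What I suspect the C++ testbench is doing: a new model per input.
std::vector<int16_t> run_fresh_each_time(const Inputs& inputs) {
    std::vector<int16_t> results;
    for (const auto& in : inputs) {
        FakeMultiplier dut;  // constructed (and destroyed) on every iteration
        dut.a = in.first;
        dut.b = in.second;
        dut.eval();
        results.push_back(dut.out);
    }
    return results;
}

// The alternative: one model for the whole run, as in the Verilog testbench route.
std::vector<int16_t> run_reused(const Inputs& inputs) {
    std::vector<int16_t> results;
    FakeMultiplier dut;  // constructed once, outside the loop
    for (const auto& in : inputs) {
        dut.a = in.first;
        dut.b = in.second;
        dut.eval();
        results.push_back(dut.out);
    }
    return results;
}
```

For a stateless design, both functions return the same results, and the reused version only constructs the model once. For a design with registers, the reused version would carry state between inputs, which is exactly the part I'm unsure about.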
Note that this issue led to another that I'm working on first: https://github.com/uwsampl/lakeroad/issues/372
Done! The eval still runs slowly (see #124) but Verilator is much faster.
I used flamegraph, attaching the result here.
It seems like it has to do with the use of threads by Verilator. I bet we could just disable threads and it would help a lot.