openhwgroup / cvw

CORE-V Wally is a configurable RISC-V Processor associated with RISC-V System-on-Chip Design textbook. Contains a 5-stage pipeline, support for A, B, C, D, F, M and Q extensions, and optional caches, BP, FPU, VM/MMU, AHB, RAMs, and peripherals.
Other
269 stars 186 forks source link

Verilator Performance Issue #650

Closed davidharrishmc closed 7 months ago

davidharrishmc commented 8 months ago

Verilator runs regression tests, but is about 50x slower than Questa right now. We expect comparable speeds, so there is an issue with something in the cvw coding.

From the sim directory, run verilator on rv64i with

verilator -GTEST="\"arch64i\"" --timescale "1ns/1ns" --timing --binary --top-module testbench "-I../config/shared" "-I../config/rv64gc" ../src/cvw.sv ../testbench/testbench.sv ../testbench/common/*.sv ../src/*/*.sv ../src/*/*/*.sv --relative-includes
time ./obj_dir/Vtestbench
TEST is arch64i
rv64i_m/I/src/add-01.S succeeded.  Brilliant!!!
...
rv64i_m/I/src/xori-01.S succeeded.  Brilliant!!!
SUCCESS! All tests ran without failures.
%Error: ../testbench/testbench.sv:360: Verilog $stop
Aborting...
Aborted (core dumped)

real    8m38.408s
user    8m38.230s
sys     0m0.129s

To run verilator with profiling: Change $stop to $finish in Verilog

verilator -GTEST="\"arch64i\"" --prof-cfuncs --timescale "1ns/1ns" --timing --binary --top-module testbench "-I../config/shared" "-I../config/rv64gc" ../src/cvw.sv ../testbench/testbench.sv ../testbench/common/*.sv ../src/*/*.sv ../src/*/*/*.sv --relative-includes
time ./obj_dir/Vtestbench
...
rv64i_m/I/src/xori-01.S succeeded.  Brilliant!!!
SUCCESS! All tests ran without failures.
%Error: ../testbench/testbench.sv:360: Verilog $stop
Aborting...
Aborted (core dumped)

real    22m17.142s
user    22m16.959s
sys     0m0.116s

gprof ./obj_dir/Vtestbench gmon.out > gmon.log
verilator_profcfunc gmon.log > gmon.log2
more gmon.log2

Overall summary by type:
  % time  type
    9.01  C++
   20.76  Common code under Vtestbench
    0.14  VLib
  590.85  Verilog Blocks under Vtestbench
  -520.76  Unaccounted for/rounding error
Overall summary by design:
  % time  design
    9.01  C++
    0.14  VLib
  611.61  Vtestbench
  -520.76  Unaccounted for/rounding error

590% is clearly a profiling error. The most time that could be used is 100%.

Verilator on chips.eng.hmc.edu has been built with clang (./configure CXX=clang++) and gprof has been rebuilt.

May need to change $stop to $finish in the testbench to generate gmon.out. Try removing the #delays or using --no-timing

See also the issue #4858 filed with Verilator: https://github.com/verilator/verilator/issues/4858

There’s some documentation under Code and Execution Profiling at https://verilator.org/guide/latest/simulating.html#execution-profiling

It is also possible but slow to run verilator on all configurations with ./verilate

davidharrishmc commented 8 months ago

Try Verilator with --no-timing. Got problems with the test bench generating clock and reset. How does Verilator recommend doing this?

verilator -GTEST="\"arch64i\"" --no-timing --binary --top-module testbench "-I../config/shared" "-I../config
/rv64gc" ../src/cvw.sv ../testbench/testbench.sv ../testbench/common/*.sv ../src/*/*.sv ../src/*/*/*.sv --relative-includes
%Warning-STMTDLY: ../testbench/testbench.sv:233:5: Ignoring delay on this statement due to --no-timing
                                                 : ... note: In instance 'testbench'
  233 |     # 100;
      |     ^
                  ... For warning description see https://verilator.org/warn/STMTDLY?v=5.021
                  ... Use "/* verilator lint_off STMTDLY */" and lint_on around source to disable this message.
%Warning-STMTDLY: ../testbench/testbench.sv:488:14: Ignoring delay on this statement due to --no-timing
                                                  : ... note: In instance 'testbench'
  488 |     clk = 1; # 5; clk = 0; # 5;
      |              ^
%Warning-STMTDLY: ../testbench/testbench.sv:488:28: Ignoring delay on this statement due to --no-timing
                                                  : ... note: In instance 'testbench'
  488 |     clk = 1; # 5; clk = 0; # 5;
      |                            ^
%Error: Exiting due to 3 warning(s)
Karl-Han commented 8 months ago

I get an idea to fix this problem, but I have problem with the code and get verilator running as I wish.

The idea is that you are checking whether you should load memory or not every posedge clk, which leads to overhead that is unncessary. I think it is a good idea to put this part of code into a intial block.

A not-working patch is attached:

diff --git a/testbench/testbench.sv b/testbench/testbench.sv
index 0f8194e62..f3c04fa7a 100644
--- a/testbench/testbench.sv
+++ b/testbench/testbench.sv
@@ -195,6 +195,90 @@ module testbench;
       $display("TEST %s not supported in this configuration", TEST);
       $finish;
     end
+    else begin
+      test = 1;
+      wait(TestBenchReset == 1'b0);
+      while (test != tests.size()) begin
+        $display("test = %d", test);
+        $display("Waiting for SelectTest");
+        wait(CurrState == STATE_INIT_TEST);
+        $display("SelectTest arrives");
+        @(negedge clk);
+        if (riscofTest) memfilename = {pathname, tests[test], "/ref/ref.elf.memfile"};
+        else if(TEST == "buildroot") begin 
+          memfilename = {RISCV_DIR, "/linux-testvectors/ram.bin"};
+          bootmemfilename = {RISCV_DIR, "/linux-testvectors/bootmem.bin"};
+        end
+        else            memfilename = {pathname, tests[test], ".elf.memfile"};
+        if (riscofTest) begin
+          ProgramAddrMapFile = {pathname, tests[test], "/ref/ref.elf.objdump.addr"};
+          ProgramLabelMapFile = {pathname, tests[test], "/ref/ref.elf.objdump.lab"};
+        end else if (TEST == "buildroot") begin
+          ProgramAddrMapFile = {RISCV_DIR, "/buildroot/output/images/disassembly/vmlinux.objdump.addr"};
+          ProgramLabelMapFile = {RISCV_DIR, "/buildroot/output/images/disassembly/vmlinux.objdump.lab"};
+        end else begin
+          ProgramAddrMapFile = {pathname, tests[test], ".elf.objdump.addr"};
+          ProgramLabelMapFile = {pathname, tests[test], ".elf.objdump.lab"};
+        end
+        // declare memory labels that interest us, the updateProgramAddrLabelArray task will find 
+        // the addr of each label and fill the array. To expand, add more elements to this array 
+        // and initialize them to zero (also initilaize them to zero at the start of the next test)
+        updateProgramAddrLabelArray(ProgramAddrMapFile, ProgramLabelMapFile, ProgramAddrLabelArray);
+
+        ////////////////////////////////////////////////////////////////////////////////
+        // Verify the test ran correctly by checking the memory against a known signature.
+        ////////////////////////////////////////////////////////////////////////////////
+        // if(TestBenchReset) test = 1;
+        if (TEST == "coremark")
+          if (dut.core.priv.priv.EcallFaultM) begin
+            $display("Benchmark: coremark is done.");
+            $stop;
+          end
+        if(Validate) begin
+          if (TEST == "embench") begin
+            // Writes contents of begin_signature to .sim.output file
+            // this contains instret and cycles for start and end of test run, used by embench 
+            // python speed script to calculate embench speed score. 
+            // also, begin_signature contains the results of the self checking mechanism, 
+            // which will be read by the python script for error checking
+            $display("Embench Benchmark: %s is done.", tests[test]);
+            if (riscofTest) outputfile = {pathname, tests[test], "/ref/ref.sim.output"};
+            else outputfile = {pathname, tests[test], ".sim.output"};
+            outputFilePointer = $fopen(outputfile, "w");
+            i = 0;
+            testadr = ($unsigned(begin_signature_addr))/(P.XLEN/8);
+            while ($unsigned(i) < $unsigned(5'd5)) begin
+              $fdisplayh(outputFilePointer, DCacheFlushFSM.ShadowRAM[testadr+i]);
+              i = i + 1;
+            end
+            $fclose(outputFilePointer);
+            $display("Embench Benchmark: created output file: %s", outputfile);
+          end else if (TEST == "coverage64gc") begin
+            $display("Coverage tests don't get checked");
+          end else begin 
+            // for tests with no self checking mechanism, read .signature.output file and compare to check for errors
+            // clear signature to prevent contamination from previous tests
+            if (!begin_signature_addr)
+              $display("begin_signature addr not found in %s", ProgramLabelMapFile);
+            else if (TEST != "embench") begin   // *** quick hack for embench.  need a better long term solution
+              CheckSignature(pathname, tests[test], riscofTest, begin_signature_addr, errors);
+              if(errors > 0) totalerrors = totalerrors + 1;
+            end
+          end
+          test = test + 1; // *** this probably needs to be moved.
+          @(posedge clk);
+        end
+      end
+    end
+    if (test == tests.size()) begin
+        if (totalerrors == 0) $display("SUCCESS! All tests ran without failures.");
+        else $display("FAIL: %d test programs had errors", totalerrors);
+`ifdef VERILATOR
+        $finish;
+`else
+        $stop; // if this is changed to $finish, wally-batch.do does not go to the next step to run coverage
+`endif
+      end
   end // initial begin

   // Model the testbench as an fsm.
@@ -235,8 +319,8 @@ module testbench;
   end

   always_ff @(posedge clk)
-    if (TestBenchReset) CurrState <= #1 STATE_TESTBENCH_RESET;
-    else CurrState <= #1 NextState;  
+    if (TestBenchReset) CurrState <= STATE_TESTBENCH_RESET;
+    else CurrState <= NextState;  

   // fsm next state logic
   always_comb begin
@@ -289,79 +373,6 @@ module testbench;
   assign begin_signature_addr = ProgramAddrLabelArray["begin_signature"];
   assign end_signature_addr = ProgramAddrLabelArray["sig_end_canary"];
   assign signature_size = end_signature_addr - begin_signature_addr;
-  always @(posedge clk) begin
-    if(SelectTest) begin
-      if (riscofTest) memfilename = {pathname, tests[test], "/ref/ref.elf.memfile"};
-      else if(TEST == "buildroot") begin 
-        memfilename = {RISCV_DIR, "/linux-testvectors/ram.bin"};
-        bootmemfilename = {RISCV_DIR, "/linux-testvectors/bootmem.bin"};
-      end
-      else            memfilename = {pathname, tests[test], ".elf.memfile"};
-      if (riscofTest) begin
-        ProgramAddrMapFile = {pathname, tests[test], "/ref/ref.elf.objdump.addr"};
-        ProgramLabelMapFile = {pathname, tests[test], "/ref/ref.elf.objdump.lab"};
-      end else if (TEST == "buildroot") begin
-        ProgramAddrMapFile = {RISCV_DIR, "/buildroot/output/images/disassembly/vmlinux.objdump.addr"};
-        ProgramLabelMapFile = {RISCV_DIR, "/buildroot/output/images/disassembly/vmlinux.objdump.lab"};
-      end else begin
-        ProgramAddrMapFile = {pathname, tests[test], ".elf.objdump.addr"};
-        ProgramLabelMapFile = {pathname, tests[test], ".elf.objdump.lab"};
-      end
-      // declare memory labels that interest us, the updateProgramAddrLabelArray task will find 
-      // the addr of each label and fill the array. To expand, add more elements to this array 
-      // and initialize them to zero (also initilaize them to zero at the start of the next test)
-      updateProgramAddrLabelArray(ProgramAddrMapFile, ProgramLabelMapFile, ProgramAddrLabelArray);
-    end
-    
-  ////////////////////////////////////////////////////////////////////////////////
-  // Verify the test ran correctly by checking the memory against a known signature.
-  ////////////////////////////////////////////////////////////////////////////////
-    if(TestBenchReset) test = 1;
-    if (TEST == "coremark")
-      if (dut.core.priv.priv.EcallFaultM) begin
-        $display("Benchmark: coremark is done.");
-        $stop;
-      end
-    if(Validate) begin
-      if (TEST == "embench") begin
-        // Writes contents of begin_signature to .sim.output file
-        // this contains instret and cycles for start and end of test run, used by embench 
-        // python speed script to calculate embench speed score. 
-        // also, begin_signature contains the results of the self checking mechanism, 
-        // which will be read by the python script for error checking
-        $display("Embench Benchmark: %s is done.", tests[test]);
-        if (riscofTest) outputfile = {pathname, tests[test], "/ref/ref.sim.output"};
-        else outputfile = {pathname, tests[test], ".sim.output"};
-        outputFilePointer = $fopen(outputfile, "w");
-        i = 0;
-        testadr = ($unsigned(begin_signature_addr))/(P.XLEN/8);
-        while ($unsigned(i) < $unsigned(5'd5)) begin
-          $fdisplayh(outputFilePointer, DCacheFlushFSM.ShadowRAM[testadr+i]);
-          i = i + 1;
-        end
-        $fclose(outputFilePointer);
-        $display("Embench Benchmark: created output file: %s", outputfile);
-      end else if (TEST == "coverage64gc") begin
-        $display("Coverage tests don't get checked");
-      end else begin 
-        // for tests with no self checking mechanism, read .signature.output file and compare to check for errors
-        // clear signature to prevent contamination from previous tests
-        if (!begin_signature_addr)
-          $display("begin_signature addr not found in %s", ProgramLabelMapFile);
-        else if (TEST != "embench") begin   // *** quick hack for embench.  need a better long term solution
-          CheckSignature(pathname, tests[test], riscofTest, begin_signature_addr, errors);
-          if(errors > 0) totalerrors = totalerrors + 1;
-        end
-      end
-      test = test + 1; // *** this probably needs to be moved.
-      if (test == tests.size()) begin
-        if (totalerrors == 0) $display("SUCCESS! All tests ran without failures.");
-        else $display("FAIL: %d test programs had errors", totalerrors);
-        $stop; // if this is changed to $finish, wally-batch.do does not go to the next step to run coverage
-      end
-    end
-  end
-

   ////////////////////////////////////////////////////////////////////////////////
   // load memories with program image
davidharrishmc commented 8 months ago

It would be fine to run just the riscv-arch-tests on Verilator. Mike Thompson said Verilator is not yet ready to handle UVM functional coverage. So if riscv-arch-test could run with --no-timing, that would still be useful.

David

On Mar 8, 2024, at 7:11 AM, keepmeanon @.***> wrote:

Try Verilator with --no-timing. Got problems with the test bench generating clock and reset. How does Verilator recommend doing this?

Actually, there can be no UVM with --no-timing.

Verilator added --timing support so that they can support UVM.

— Reply to this email directly, view it on GitHub https://github.com/openhwgroup/cvw/issues/650#issuecomment-1985867004, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR4AA357WXRNNM3A57XTIFTYXHILXAVCNFSM6AAAAABEJMTPICVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBVHA3DOMBQGQ. You are receiving this because you authored the thread.

Karl-Han commented 8 months ago

I think I have got a tentative solution to this problem, but I need sometime to look into the code for optimization. Refer to https://github.com/Karl-Han/cvw/tree/verilator . And it has passed the regression.

The runtime of verilator is comparable to questasim now.

davidharrishmc commented 7 months ago

Solved with PR704

davidharrishmc commented 7 months ago

Verilator runtime on arch64i dropped from 440 seconds to 4.0 seconds. Nice work @Karl-Han