openhwgroup / cv32e40p

CV32E40P is an in-order 4-stage RISC-V RV32IMFCXpulp CPU based on RI5CY from PULP-Platform
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest
Other
900 stars 399 forks source link

Large program crashes simulator - probable comb. loop within processor? #11

Closed ghost closed 7 years ago

ghost commented 7 years ago

This program:

void test(void);

int main(void) {
    test();

    return 0;
}

#include "256Ki_nops" // 262144 32-bit nops

void test(void) {
}

Crashes the simulator due to the consecutive execution of these two instructions;

   103b8:       00100097                auipc   ra,0x100
   103bc:       020080e7                jalr    32(ra) # 1103d8 <test>

which call the test function.

The compiler generates this sequence of instructions in order to perform a long branch (~> 1MiB).

gautschimi commented 7 years ago

what is the error message? are you talking about the rtl-simulator or the instruction set simulator?

On 05/17/2017 02:37 AM, origintfj wrote:

This program: ` void test(void);

int main(void) { test();

return 0;

}

include "256Ki_nops" // 262144 32-bit nops

void test(void) { } |Crashes the simulator due to the consecutive execution of these two instructions;| 103b8: 00100097 auipc ra,0x100 103bc: 020080e7 jalr 32(ra) # 1103d8 ` which call the test function.

The compiler generates this sequence of instructions in order to perform a long branch (~> 1MiB).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pulp-platform/riscv/issues/11, or mute the thread https://github.com/notifications/unsubscribe-auth/ARsRncRkmNmJ221RbZtlEu-EXVrenbmFks5r6kFOgaJpZM4NdPaD.

--

Michael Gautschi phone: +41 44 632 99 58 Integrated Systems Laboratory fax: +41 44 632 11 94 ETZ J69.2 e-mail: gautschi@iis.ee.ethz.ch Gloriastr. 35 ETH Zentrum CH-8092 Zurich

ghost commented 7 years ago

Sorry, I should have been clearer. This program causes the RTL simulator to crash. Unfortunately, it crashes in such a way that makes it very difficult to root cause the issue. What I do know is that when it does, time stops advancing (classic comb. loop symptom), and I can't break out to the simulator prompt or retrieve any profiling data.

ghost commented 7 years ago

Try running it yourself and see what happens.

gautschimi commented 7 years ago

How big is your program? Because if you add so many nops, there is no way it will fit in the sram. it might be that you have an overflow somewhere.

On 05/17/2017 06:35 PM, origintfj wrote:

Try running it yourself and see what happens.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pulp-platform/riscv/issues/11#issuecomment-302147396, or mute the thread https://github.com/notifications/unsubscribe-auth/ARsRnXzOonhkhEwCOy2TSocJLgQWj6pBks5r6yHdgaJpZM4NdPaD.

--

Michael Gautschi phone: +41 44 632 99 58 Integrated Systems Laboratory fax: +41 44 632 11 94 ETZ J69.2 e-mail: gautschi@iis.ee.ethz.ch Gloriastr. 35 ETH Zentrum CH-8092 Zurich

ghost commented 7 years ago

No that's not the problem. You can boil it down to those two instructions. For some reason, executing the AUIPC followed immediately by the JALR exposes this issue. If you insert a NOP between the two the problem goes away.

Initially I was just testing the core's ability to consecutively execute those two instructions. When I discovered the failure I then wrote a C program (the one in shown above) to demonstrate that the failing instruction sequence could be produced by the compiler e.g. in the case of a large program.

My suspicion, although purely speculative thus far, is that the operand forwarding path in the case where the PC is concerned, opens up a combinatorial loop in this situation.

sorear commented 7 years ago

You don't need the nops, -Wl,--no-relax will force auipc+jalr for all function calls

ghost commented 7 years ago

Ok that's good to know. Have you been able to reproduce the failure?

atraber commented 7 years ago

Which RTL simulator are you using? As far as I know, ModelSim never truly hangs and you can always take a look at the waveforms, where you would see the wavesforms in the core that cause this behavior (in the delta cycles).

I suspect you are using ncsim? If yes, you may want to take a look at https://github.com/pulp-platform/pulpino/issues/30

ghost commented 7 years ago

@atraber I am using the Cadence simulator, thanks for pointing out the existence of that flag. Adding it did indeed solve the issue as @edcarstens reported.

If you're interested to know what the root cause of the issue is, I've tracked it down to the stall control always_comb block in controller.sv.

Splitting this always_comb:

  always_comb
  begin
    load_stall_o   = 1'b0;
    jr_stall_o     = 1'b0;
    deassert_we_o  = 1'b0;

    // deassert WE when the core is not decoding instructions
    if (~is_decoding_o)
      deassert_we_o = 1'b1;

    // deassert WE in case of illegal instruction
    if (illegal_insn_i)
      deassert_we_o = 1'b1;

    // Stall because of load operation
    if ((data_req_ex_i == 1'b1) && (regfile_we_ex_i == 1'b1) &&
        ((reg_d_ex_is_reg_a_i == 1'b1) || (reg_d_ex_is_reg_b_i == 1'b1) || (reg_d_ex_is_reg_c_i == 1'b1)) )
    begin
      deassert_we_o   = 1'b1;
      load_stall_o    = 1'b1;
    end

    // Stall because of jr path
    // - always stall if a result is to be forwarded to the PC
    // we don't care about in which state the ctrl_fsm is as we deassert_we
    // anyway when we are not in DECODE
    if ((jump_in_dec_i == BRANCH_JALR) &&
        (((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
         ((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
         ((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
    begin
      jr_stall_o      = 1'b1;
      deassert_we_o     = 1'b1;
    end
  end

Into these two:

  always_comb
  begin
    load_stall_o   = 1'b0;
    deassert_we_o  = 1'b0;

    // deassert WE when the core is not decoding instructions
    if (~is_decoding_o)
      deassert_we_o = 1'b1;

    // deassert WE in case of illegal instruction
    if (illegal_insn_i)
      deassert_we_o = 1'b1;

    // Stall because of load operation
    if ((data_req_ex_i == 1'b1) && (regfile_we_ex_i == 1'b1) &&
        ((reg_d_ex_is_reg_a_i == 1'b1) || (reg_d_ex_is_reg_b_i == 1'b1) || (reg_d_ex_is_reg_c_i == 1'b1)) )
    begin
      deassert_we_o   = 1'b1;
      load_stall_o    = 1'b1;
    end

    // Stall because of jr path
    // - always stall if a result is to be forwarded to the PC
    // we don't care about in which state the ctrl_fsm is as we deassert_we
    // anyway when we are not in DECODE
    if ((jump_in_dec_i == BRANCH_JALR) &&
        (((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
         ((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
         ((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
    begin
      deassert_we_o     = 1'b1;
    end
  end

  always_comb
  begin
    jr_stall_o     = 1'b0;
    // Stall because of jr path
    // - always stall if a result is to be forwarded to the PC
    // we don't care about in which state the ctrl_fsm is as we deassert_we
    // anyway when we are not in DECODE
    if ((jump_in_dec_i == BRANCH_JALR) &&
        (((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
         ((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
         ((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
    begin
      jr_stall_o      = 1'b1;
    end
  end

allows simulation to run without hanging without having to apply the '-delay_trigger' option. The reason for this is as follows:

The SV module shown below causes the Cadence simulator to spin in an infinite loop, even though there isn't actually a combinational loop in the logic.

//`define DONT_FAIL

module test
    (
    );

    // proc 1
    logic a;
    logic c;
    // proc 2
    logic b;

    // proc 1
    //
    always_comb
    begin
        // unrelated to the below
        a = 1'b0;
        a = 1'b1;
`ifdef DONT_FAIL
    end
    // split into two seperate processes
    always_comb
    begin
`endif
        // unrelated to the above
        c = b;
    end

    // proc 2
    //
    always_comb
    begin
        b = 1'b0;
        b = a;
    end
endmodule

The reason for the spin is as follows:

  1. proc1 is evaluated by the simulator and 'a' is assigned, adding it to the event queue.
  2. proc2 is evaluated and 'b' is assigned, adding it to the event queue.
  3. proc1 is evaluated as a result of the event on 'b' and 'a' is assigned, adding it to the event queue.
  4. proc2 is evaluated as a result of the event on 'a' and 'b' is assigned, adding it to the event queue.
  5. GOTO (3) // now we're sitting in an infinite loop.

The two assignments to each of 'a' and 'b' are crucial in exposing this behaviour. If at any time during the execution of the always block the value of an assigned variable changes, the variable will be added to the event queue, even if the execution of the always block as a whole does not result in the value of the variable changing.

Assigning default values to signals at the top of an always_comb process and reassigning them later on is standard practice for me when implementing next-state-logic, so the only other way I can see to avoid this through code changes would be to split up the proc1 always_comb block into two, one for the 'a' assignments, and one for the 'b' assignments, and while this may seem an obvious choice in this case, it's unclear to me that it will always be obvious.

The '-delay_trigger' tells the simulator to wait until it's evaluated the entire always block before deciding on whether or not there has been an event on any particular variable, and thus whether or not it should be added to the event queue. Setting this flag prevents the simulator from hanging in this case because variables which don't change after a complete pass of the always block are not added to the event queue.

gautschimi commented 7 years ago

thanks a lot for this analysis! very interesting indeed.