Closed ghost closed 7 years ago
what is the error message? are you talking about the rtl-simulator or the instruction set simulator?
On 05/17/2017 02:37 AM, origintfj wrote:
This program: ` void test(void);
int main(void) { test();
return 0; }
include "256Ki_nops" // 262144 32-bit nops
void test(void) { } |Crashes the simulator due to the consecutive execution of these two instructions;| 103b8: 00100097 auipc ra,0x100 103bc: 020080e7 jalr 32(ra) # 1103d8 ` which call the test function.
The compiler generates this sequence of instructions in order to perform a long branch (~> 1MiB).
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pulp-platform/riscv/issues/11, or mute the thread https://github.com/notifications/unsubscribe-auth/ARsRncRkmNmJ221RbZtlEu-EXVrenbmFks5r6kFOgaJpZM4NdPaD.
Sorry, I should have been clearer. This program causes the RTL simulator to crash. Unfortunately, it crashes in such a way that makes it very difficult to root cause the issue. What I do know is that when it does, time stops advancing (classic comb. loop symptom), and I can't break out to the simulator prompt or retrieve any profiling data.
Try running it yourself and see what happens.
How big is your program? Because if you add so many nops, there is no way it will fit in the sram. it might be that you have an overflow somewhere.
On 05/17/2017 06:35 PM, origintfj wrote:
Try running it yourself and see what happens.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pulp-platform/riscv/issues/11#issuecomment-302147396, or mute the thread https://github.com/notifications/unsubscribe-auth/ARsRnXzOonhkhEwCOy2TSocJLgQWj6pBks5r6yHdgaJpZM4NdPaD.
No that's not the problem. You can boil it down to those two instructions. For some reason, executing the AUIPC followed immediately by the JALR exposes this issue. If you insert a NOP between the two the problem goes away.
Initially I was just testing the core's ability to consecutively execute those two instructions. When I discovered the failure I then wrote a C program (the one in shown above) to demonstrate that the failing instruction sequence could be produced by the compiler e.g. in the case of a large program.
My suspicion, although purely speculative thus far, is that the operand forwarding path in the case where the PC is concerned, opens up a combinatorial loop in this situation.
You don't need the nops, -Wl,--no-relax
will force auipc+jalr for all function calls
Ok that's good to know. Have you been able to reproduce the failure?
Which RTL simulator are you using? As far as I know, ModelSim never truly hangs and you can always take a look at the waveforms, where you would see the wavesforms in the core that cause this behavior (in the delta cycles).
I suspect you are using ncsim? If yes, you may want to take a look at https://github.com/pulp-platform/pulpino/issues/30
@atraber I am using the Cadence simulator, thanks for pointing out the existence of that flag. Adding it did indeed solve the issue as @edcarstens reported.
If you're interested to know what the root cause of the issue is, I've tracked it down to the stall control always_comb block in controller.sv.
Splitting this always_comb:
always_comb
begin
load_stall_o = 1'b0;
jr_stall_o = 1'b0;
deassert_we_o = 1'b0;
// deassert WE when the core is not decoding instructions
if (~is_decoding_o)
deassert_we_o = 1'b1;
// deassert WE in case of illegal instruction
if (illegal_insn_i)
deassert_we_o = 1'b1;
// Stall because of load operation
if ((data_req_ex_i == 1'b1) && (regfile_we_ex_i == 1'b1) &&
((reg_d_ex_is_reg_a_i == 1'b1) || (reg_d_ex_is_reg_b_i == 1'b1) || (reg_d_ex_is_reg_c_i == 1'b1)) )
begin
deassert_we_o = 1'b1;
load_stall_o = 1'b1;
end
// Stall because of jr path
// - always stall if a result is to be forwarded to the PC
// we don't care about in which state the ctrl_fsm is as we deassert_we
// anyway when we are not in DECODE
if ((jump_in_dec_i == BRANCH_JALR) &&
(((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
begin
jr_stall_o = 1'b1;
deassert_we_o = 1'b1;
end
end
Into these two:
always_comb
begin
load_stall_o = 1'b0;
deassert_we_o = 1'b0;
// deassert WE when the core is not decoding instructions
if (~is_decoding_o)
deassert_we_o = 1'b1;
// deassert WE in case of illegal instruction
if (illegal_insn_i)
deassert_we_o = 1'b1;
// Stall because of load operation
if ((data_req_ex_i == 1'b1) && (regfile_we_ex_i == 1'b1) &&
((reg_d_ex_is_reg_a_i == 1'b1) || (reg_d_ex_is_reg_b_i == 1'b1) || (reg_d_ex_is_reg_c_i == 1'b1)) )
begin
deassert_we_o = 1'b1;
load_stall_o = 1'b1;
end
// Stall because of jr path
// - always stall if a result is to be forwarded to the PC
// we don't care about in which state the ctrl_fsm is as we deassert_we
// anyway when we are not in DECODE
if ((jump_in_dec_i == BRANCH_JALR) &&
(((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
begin
deassert_we_o = 1'b1;
end
end
always_comb
begin
jr_stall_o = 1'b0;
// Stall because of jr path
// - always stall if a result is to be forwarded to the PC
// we don't care about in which state the ctrl_fsm is as we deassert_we
// anyway when we are not in DECODE
if ((jump_in_dec_i == BRANCH_JALR) &&
(((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
begin
jr_stall_o = 1'b1;
end
end
allows simulation to run without hanging without having to apply the '-delay_trigger' option. The reason for this is as follows:
The SV module shown below causes the Cadence simulator to spin in an infinite loop, even though there isn't actually a combinational loop in the logic.
//`define DONT_FAIL
module test
(
);
// proc 1
logic a;
logic c;
// proc 2
logic b;
// proc 1
//
always_comb
begin
// unrelated to the below
a = 1'b0;
a = 1'b1;
`ifdef DONT_FAIL
end
// split into two seperate processes
always_comb
begin
`endif
// unrelated to the above
c = b;
end
// proc 2
//
always_comb
begin
b = 1'b0;
b = a;
end
endmodule
The reason for the spin is as follows:
The two assignments to each of 'a' and 'b' are crucial in exposing this behaviour. If at any time during the execution of the always block the value of an assigned variable changes, the variable will be added to the event queue, even if the execution of the always block as a whole does not result in the value of the variable changing.
Assigning default values to signals at the top of an always_comb process and reassigning them later on is standard practice for me when implementing next-state-logic, so the only other way I can see to avoid this through code changes would be to split up the proc1 always_comb block into two, one for the 'a' assignments, and one for the 'b' assignments, and while this may seem an obvious choice in this case, it's unclear to me that it will always be obvious.
The '-delay_trigger' tells the simulator to wait until it's evaluated the entire always block before deciding on whether or not there has been an event on any particular variable, and thus whether or not it should be added to the event queue. Setting this flag prevents the simulator from hanging in this case because variables which don't change after a complete pass of the always block are not added to the event queue.
thanks a lot for this analysis! very interesting indeed.
This program:
Crashes the simulator due to the consecutive execution of these two instructions;
which call the test function.
The compiler generates this sequence of instructions in order to perform a long branch (~> 1MiB).