This reads inputs, such as __debug_req_valid, that may depend on its own outputs, such as __debug_resp_valid. This makes the circuit behavior undefined.
There are existing code lines that attempt to solve this by introducing delays, for example:
assign #0.1 debug_req_valid = __debug_req_valid;
but the delay specifier ("#0.1") has no effect under Verilator.
One way to make this race condition visible is to add a diagnostic printout to debug.scala:
diff --git a/src/common/debug.scala b/src/common/debug.scala
index ead55bc..b21d391 100644
--- a/src/common/debug.scala
+++ b/src/common/debug.scala
@@ -169,6 +169,32 @@ class DebugModule(implicit val conf: SodorConfiguration) extends Module {
dmstatus.allhalted := dmcontrol.haltreq
dmstatus.allrunning := dmcontrol.resumereq
io.dcpath.halt := dmstatus.allhalted && !dmstatus.allrunning
+
+ // Stronger condition.
+ val outercond =
+ ( (io.dmi.req.bits.op === DMConsts.dmi_OP_WRITE) &&
+ (decoded_addr(DMI_RegAddrs.DMI_COMMAND)) &&
+ io.dmi.req.valid );
+
+ // Weaker condition. I call this "inner" because its two elements are
+ // syntactically contained inside the "outer" condition.
+ val innercond =
+ ( (decoded_addr(DMI_RegAddrs.DMI_COMMAND)) &&
+ io.dmi.req.valid );
+
+ // This creates an extra use of 'innercond' that changes the way
+ // Verilator generates code. It is needed to expose the bug. The
+ // value also has to be used somewhere, which is why it appears in
+ // the "bug happened" printout below.
+ val anothercond = innercond && dmcontrol.haltreq;
+
+ // Test for the impossible combination.
+ when (outercond) {
+ when (!innercond) {
+ printf("bug happened! anothercond=%d\n", anothercond);
+ }
+ }
+
when (io.dmi.req.bits.op === DMConsts.dmi_OP_WRITE){
when((decoded_addr(DMI_RegAddrs.DMI_ABSTRACTCS)) && io.dmi.req.valid) {
val tempabstractcs = wdata.asTypeOf(new ABSTRACTCSFields())
With the above change, we see printouts indicating the impossible condition happens:
The problem now is that the program does not get loaded correctly. The first instruction should be 0x04c0006f, not 0x34202f73, which is the second instruction. I tracked this down to riscv-isa-sim/fesvr/dtm.cc, write_chunk, which has this line:
write(DMI_SBADDRESS0, taddr-4);
The problem is the mysterious -4. There is nothing in the RISCV debug specification to suggest that should be needed (on the contrary, it should not), and there isn't anything similar in read_chunk nor in the non-Sodor implementation of write_chunk. I speculate the -4 was added because it compensated for an effect of the race condition I'm describing. If I remove it, and also fix the race as described above, then programs work again.
Unfortunately, even then not all of the tests pass. The first to fail is rv32_3stage, rv32ui-p-xori:
emulator/rv32_3stage$ ./emulator +max-cycles=30000 ../../install/riscv-tests/rv32ui-p-xori
[...]
Cyc= 280 [1] pc=[80000048] W[r 0=8000004c][1] Op1=[r31][0000000b] Op2=[r25][00000000] inst=[ff9ff06f] DASM(ff9ff06f)
terminate called after throwing an instance of 'std::runtime_error'
what(): bad syscall #342274048501219439
Exit 255
I have not attempted to debug that failure yet. I figured I'd file what I have and get some feedback before going further.
vsrc/SimDTM.v
contains this call todebug_tick
:This reads inputs, such as
__debug_req_valid
, that may depend on its own outputs, such as__debug_resp_valid
. This makes the circuit behavior undefined.There are existing code lines that attempt to solve this by introducing delays, for example:
but the delay specifier ("#0.1") has no effect under Verilator.
One way to make this race condition visible is to add a diagnostic printout to
debug.scala
:With the above change, we see printouts indicating the impossible condition happens:
At first I thought this was a bug in Verilator and filed https://github.com/verilator/verilator/issues/2177 for it, but was informed about the race condition in
SimDTM.v
, so I'm reporting it here.To fix the bug, I propose (following the advice in the linked issue):
That change eliminates the observed misbehavior in
debug.scala
. However, tests then fail. For example:The problem now is that the program does not get loaded correctly. The first instruction should be
0x04c0006f
, not0x34202f73
, which is the second instruction. I tracked this down toriscv-isa-sim/fesvr/dtm.cc
,write_chunk
, which has this line:The problem is the mysterious
-4
. There is nothing in the RISCV debug specification to suggest that should be needed (on the contrary, it should not), and there isn't anything similar inread_chunk
nor in the non-Sodor implementation ofwrite_chunk
. I speculate the-4
was added because it compensated for an effect of the race condition I'm describing. If I remove it, and also fix the race as described above, then programs work again.Unfortunately, even then not all of the tests pass. The first to fail is
rv32_3stage
,rv32ui-p-xori
:I have not attempted to debug that failure yet. I figured I'd file what I have and get some feedback before going further.