martin-velay closed this issue 1 month ago.
otbn_rf_base_intg_err - back to 100%
otbn_ctrl_redun - 91.7%
I suspect that these are different issues.
Firstly, let's consider the otbn_rf_base_intg_err message. The test tries to inject a one- or two-bit error into a register read. To do this, it waits until a cycle when an instruction is trying to read from the given side of the register file, and then uses the force statement to bodge the value.
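As a rough illustration of what injecting a one- or two-bit error into a register read means, the Python sketch below models only the corruption step. The real vseq does this in simulation with a SystemVerilog `force`. The function name, the 39-bit width and the register value are hypothetical choices for the example, not taken from the OTBN sources.

```python
import random


def inject_bit_errors(value: int, width: int, num_bits: int, rng: random.Random) -> int:
    """Return `value` with `num_bits` distinct bits flipped (hypothetical model).

    `width` is the number of bits the corruption may touch, e.g. data plus
    integrity bits. Only one- and two-bit errors are modelled.
    """
    if num_bits not in (1, 2):
        raise ValueError("only one- or two-bit errors are modelled")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in width bits")
    mask = 0
    for pos in rng.sample(range(width), num_bits):  # distinct bit positions
        mask |= 1 << pos
    return value ^ mask  # XOR flips exactly the chosen bits


rng = random.Random(1)
word = 0x1234_5678  # hypothetical register value
bad = inject_bit_errors(word, width=39, num_bits=rng.choice((1, 2)), rng=rng)
```

Choosing distinct positions with `rng.sample` guarantees that a "two-bit" error really differs from the original in two bits, rather than flipping the same bit twice and cancelling out.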
The message is saying that we've waited ages and haven't actually seen any instructions coming through that read from the register file. Here, "ages" is defined as a time in clock cycles (currently 20,000). My guess is that we're actually blocked waiting for a seed from the EDN, so we aren't running any instructions at all during that period.
There's a reasonably obvious solution: tweak the vseq so that it stops waiting when the operation finishes, rather than after a fixed number of cycles. If the device hangs completely, the test will still (eventually) fail with a phase timeout. I've just pushed #23582, which should implement this.
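The change of waiting strategy can be sketched in Python. This is a model under stated assumptions, not the code in #23582: `reads_rf` and `op_done` are hypothetical per-cycle predicates standing in for the signals the vseq watches, the 20,000-cycle bound comes from the message above, and the phase timeout is modelled as a plain outer cycle bound.

```python
from typing import Callable, Optional

# Hypothetical per-cycle predicates standing in for the monitored signals.
CyclePred = Callable[[int], bool]


def wait_fixed_bound(reads_rf: CyclePred, max_cycles: int = 20_000) -> int:
    """Old behaviour: give up after a fixed number of cycles."""
    for cycle in range(max_cycles):
        if reads_rf(cycle):
            return cycle
    raise TimeoutError("Timeout while waiting for register file A to be used")


def wait_until_op_done(
    reads_rf: CyclePred, op_done: CyclePred, phase_timeout_cycles: int
) -> Optional[int]:
    """New behaviour: stop waiting when the operation finishes.

    Returns the cycle of the first register file read, or None if the
    operation finished without one. Only a genuine hang reaches the outer
    bound, which models the phase timeout.
    """
    for cycle in range(phase_timeout_cycles):
        if reads_rf(cycle):
            return cycle
        if op_done(cycle):
            return None
    raise TimeoutError("phase timeout")


# Hypothetical scenario: stalled on an EDN seed, first read at cycle 25,010.
def stalled_reads(c: int) -> bool:
    return c >= 25_010


def stalled_done(c: int) -> bool:
    return c >= 26_000
```

With the stalled scenario, `wait_fixed_bound(stalled_reads)` raises the same kind of timeout as the original failure, whereas `wait_until_op_done(stalled_reads, stalled_done, 1_000_000)` returns 25010. Returning `None` when the operation ends without a read means a run that simply never exercises the register file is not treated as a fatal error.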
For the second test, it looks like we sometimes fail to find a "good time" to inject an error, and a re-run can perturb things so that we do. Doing this in a local run reduces the failure rate from roughly 4/50 to 1/50 (and the remaining failure has a different behaviour), so I think this triage issue should be solved by that change: #23583.
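One way to picture the retry is a bounded loop over attempts, each of which re-runs the operation with a different random perturbation. This is a hedged Python sketch that assumes #23583 behaves roughly like this; `find_injection_time`, `flaky_attempt` and its 70% success probability are made up for illustration.

```python
import random
from typing import Callable, Optional

# Hypothetical: one attempt re-runs the operation and returns a cycle at which
# an error could be injected, or None if no suitable cycle was seen.
Attempt = Callable[[random.Random], Optional[int]]


def find_injection_time(try_once: Attempt, max_attempts: int, rng: random.Random) -> int:
    """Retry until some attempt finds a good time to inject, or give up."""
    for _ in range(max_attempts):
        when = try_once(rng)  # the shared RNG perturbs each re-run differently
        if when is not None:
            return when
    raise RuntimeError("Never found a time to inject an error.")


def flaky_attempt(rng: random.Random) -> Optional[int]:
    """Toy attempt that finds a good time about 70% of the time."""
    return rng.randrange(1_000) if rng.random() < 0.7 else None


print(find_injection_time(flaky_attempt, max_attempts=5, rng=random.Random(7)))
```

For reference, the local failure rates quoted above work out to 8% (4/50) before the change and 2% (1/50) after it.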
Removing the M4 milestone association. I think this will be fixed by the two PRs mentioned above, and I'm also certain that the issue is unrelated to the M4 exit criteria.
Thanks Rupert!
The first vseq passed at 100% over the last 9 nights. The second test passes at a reasonable rate, and the sporadic failures I see when running locally aren't the same sort of timeout as the one described above.
Closing this issue because I think the problem that it describes is fixed.
Hierarchy of regression failure
Block level
Failure Description
otbn_rf_base_intg_err with seed: 105816672124054756254243263758896087906517619155141568323899745862435846495702
UVM_FATAL @ 931274490 ps: (otbn_rf_base_intg_err_vseq.sv:32) uvm_test_top.env.virtual_sequencer [uvm_test_top.env.virtual_sequencer.otbn_rf_base_intg_err_vseq] Timeout while waiting for register file A to be used
otbn_ctrl_redun with seed: 42265330512524754647320219056712357229922074576056242959858639090007540897643
UVM_FATAL @ 69456697 ps: (otbn_ctrl_redun_vseq.sv:31) uvm_test_top.env.virtual_sequencer [uvm_test_top.env.virtual_sequencer.otbn_ctrl_redun_vseq] Never found a time to inject an error.
Steps to Reproduce
util/dvsim/dvsim.py hw/ip/otbn/dv/uvm/otbn_sim_cfg.hjson -i otbn_ctrl_redun -t xcelium --fixed-seed 42265330512524754647320219056712357229922074576056242959858639090007540897643