Open elliotb-lowrisc opened 1 month ago
I did some triage on this failure:
This test has had a pass rate of approximately 40% since the block moved to V2 in https://github.com/lowRISC/opentitan/pull/24011. Even with this failure, we remain at the requisite levels for V2.
There are a number of different failure modes:
Job i2c-sim-vcs_run_default killed due to: Exit reason: User job exceeded runlimit: User job timed out
UVM_ERROR @ 27145771832 ps: (i2c_scoreboard.sv:716) [uvm_test_top.env.scoreboard] controller_mode_rd_obs_fifo item uncompared:
UVM_ERROR @ 117789037105 ps: (i2c_scoreboard.sv:717) [uvm_test_top.env.scoreboard] controller_mode_wr_obs_fifo item uncompared:
UVM_ERROR @ 16036870305 ps: (cip_base_vseq.sv:839) [uvm_test_top.env.virtual_sequencer.i2c_common_vseq] Check failed (!has_outstanding_access()) Waited 10000 cycles to issue a reset with no outstanding accesses.
UVM_FATAL @ 600000000000 ps: (uvm_phase.svh:1512) [PH_TIMEOUT] Explicit timeout of 600000000000 ps hit, indicating a probable testbench issue
UVM_ERROR @ 21766660728 ps: (i2c_host_fifo_watermark_vseq.sv:60) [uvm_test_top.env.virtual_sequencer.i2c_host_fifo_watermark_vseq] Check failed cnt_fmt_threshold <= 3 (4 [0x4] vs 3 [0x3])
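To see which of the failure modes above dominates across seeds, the first error line of each failing run can be bucketed by severity and source location. The following is only a minimal sketch: the `bucket_failures` helper and its regular expression are hypothetical, are not part of dvsim, and assume the message layout seen in the lines quoted above.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in this issue:
#   UVM_ERROR @ <time> ps: (<file>:<line>) [<id>] <message>
UVM_LINE = re.compile(
    r"^(?P<severity>UVM_ERROR|UVM_FATAL) @\s*(?P<time_ps>\d+) ps: "
    r"\((?P<file>[^:()]+):(?P<line>\d+)\) \[(?P<id>[^\]]+)\] (?P<msg>.*)$"
)


def bucket_failures(lines):
    """Count failures per (severity, file:line); hypothetical helper.

    Lines that do not match the UVM layout (for example the job
    runlimit message) are counted under ("OTHER", "unparsed").
    """
    counts = Counter()
    for raw in lines:
        match = UVM_LINE.match(raw.strip())
        if match:
            key = (match["severity"], f'{match["file"]}:{match["line"]}')
        else:
            key = ("OTHER", "unparsed")
        counts[key] += 1
    return counts
```

Keying on `file:line` only groups runs correctly within a single commit, since line numbers shift as the testbench changes.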
The architecture of stress_all tests tends to exacerbate weaknesses in DV infrastructure, such as stimulus vseqs leaving the DUT or checking infrastructure in a state that does not match the starting-state assumptions of the following vseq.
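As a rough illustration of that effect, the toy model below runs randomly chosen sequences back to back against shared state. It is not the I2C testbench: `ToyEnv`, `vseq_clean`, `vseq_leaky` and `stress_all` are hypothetical names, and the "FIFO" is only a stand-in for any DUT or scoreboard state that a sequence might leave behind.

```python
import random


class ToyEnv:
    """Stand-in for shared DUT/scoreboard state (hypothetical)."""

    def __init__(self):
        self.pending = []  # items still waiting to be compared


def vseq_clean(env):
    # Assumes an empty starting state and restores it before returning.
    assert not env.pending, "starting-state assumption violated"
    env.pending.append("item")
    env.pending.pop()


def vseq_leaky(env):
    # Same starting assumption, but leaves one item behind on exit.
    assert not env.pending, "starting-state assumption violated"
    env.pending.append("item")


def stress_all(vseqs, seed, iterations):
    """Run randomly chosen vseqs on one shared env.

    Returns the index of the first failing iteration, or None if
    every iteration passed.
    """
    rng = random.Random(seed)
    env = ToyEnv()
    for i in range(iterations):
        try:
            rng.choice(vseqs)(env)
        except AssertionError:
            return i
    return None
```

Each sequence passes when run alone on a fresh `ToyEnv`; the failure only appears when something runs after `vseq_leaky`, which is why this class of problem tends to surface in stress_all rather than in the individual tests.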
None of these failure modes are immediately concerning to me, and I would suggest that fixing these issues is not an immediate priority. There is still outstanding I2C DV work for extension protocol features / multi-controller, and this will add more tests into the stress_all rotation, as well as likely change wider parts of the DV infrastructure. Hence I'm going to put this item into M7 for now.
Thanks @hcallahan-lowrisc for the feedback. This sounds good to me! Based on your assessment I've given this a P3 priority.
This issue was raised again during the regression on-call meeting. I would like to mention that the test I2c_host_mode_toggle is also affected by the same issue.
Hierarchy of regression failure
Block level
Failure Description
Multiple failure sources in the latest regression, but this was the most common:
Steps to Reproduce
./util/dvsim/dvsim.py hw/ip/i2c/dv/i2c_sim_cfg.hjson -i i2c_host_stress_all --build-seed 46057207235241274571178436692064798722168129065126426307050395083305588858879 --fixed-seed 111351861029607319840208042477550323195458624262667788092371830570225893891502
Tests with similar or related failures
54060242010632769836666320237261266906268616954605287882421647600191088640353