I've done a triage pass of failures seen in the OTBN regression when CDC instrumentation is enabled. To do this I compared the failures with the instrumentation on against the failures from the latest nightly (which has instrumentation off) and looked at those signatures that were only seen in tests with the instrumentation on. I also looked at tests that had significantly higher failure rates with instrumentation on but were failing with signatures seen in those tests with instrumentation off.
A full nightly was run against commit d941d7e8b7d25a30b41c714c4c448c87a2853551 modified to enable CDC instrumentation for OTBN by modifying:
Other than #19044 the failures here can be addressed in V3.
Full details of failures triaged and conclusions are below:
New signatures from CDC enablement:
UVM_FATAL (otbn_model_if.sv:77) [otbn_model_if] Check failed (u_model.otbn_model_step_crc(handle, item, crc_state) == *) Failed to update CRC has 2 failures:
Test otbn_escalate has 2 failures.
32.otbn_escalate.4215586503
Line 113, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/32.otbn_escalate/latest/run.log
UVM_FATAL @ 2373129 ps: (otbn_model_if.sv:77) [otbn_model_if] Check failed (u_model.otbn_model_step_crc(handle, item, crc_state) == 0) Failed to update CRC
UVM_INFO @ 2373129 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
36.otbn_escalate.2699923888
Line 113, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/36.otbn_escalate/latest/run.log
UVM_FATAL @ 2060914 ps: (otbn_model_if.sv:77) [otbn_model_if] Check failed (u_model.otbn_model_step_crc(handle, item, crc_state) == 0) Failed to update CRC
UVM_INFO @ 2060914 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Model is predicting a 0 value for CRC. Uncertain why but RTL is behaving correctly (CRC updating as expected, no 0 values)
UVM_FATAL (otbn_base_vseq.sv:582) [otbn_sw_errs_fatal_chk_vseq] Check failed (!(cfg.loop_vif.loop_stack_push || cfg.loop_vif.loop_stack_pop) && !cfg.trace_vif.locking_o) has 1 failures:
Test otbn_sw_errs_fatal_chk has 1 failures.
0.otbn_sw_errs_fatal_chk.1801271138
Line 123, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/0.otbn_sw_errs_fatal_chk/latest/run.log
UVM_FATAL @ 88113688 ps: (otbn_base_vseq.sv:582) [uvm_test_top.env.virtual_sequencer.otbn_sw_errs_fatal_chk_vseq] Check failed (!(cfg.loop_vif.loop_stack_push || cfg.loop_vif.loop_stack_pop) && !cfg.trace_vif.locking_o)
UVM_INFO @ 88113688 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Loop warping malfunction (attemping a warp on a cycle it can't), DV environment issue not an RTL bug
UVM_FATAL (otbn_base_vseq.sv:582) [otbn_sec_wipe_err_vseq] Check failed (!(cfg.loop_vif.loop_stack_push || cfg.loop_vif.loop_stack_pop) && !cfg.trace_vif.locking_o) has 1 failures:
Test otbn_sec_wipe_err has 1 failures.
1.otbn_sec_wipe_err.1382000636
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/1.otbn_sec_wipe_err/latest/run.log
UVM_FATAL @ 25973747 ps: (otbn_base_vseq.sv:582) [uvm_test_top.env.virtual_sequencer.otbn_sec_wipe_err_vseq] Check failed (!(cfg.loop_vif.loop_stack_push || cfg.loop_vif.loop_stack_pop) && !cfg.trace_vif.locking_o)
UVM_INFO @ 25973747 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Loop warping malfunction (attemping a warp on a cycle it can't), DV environment issue not an RTL bug
UVM_FATAL (otbn_base_vseq.sv:369) [otbn_urnd_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution has 1 failures:
Test otbn_urnd_err has 1 failures.
1.otbn_urnd_err.2713703608
Line 100, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/1.otbn_urnd_err/latest/run.log
UVM_FATAL @ 17274033 ps: (otbn_base_vseq.sv:369) [uvm_test_top.env.virtual_sequencer.otbn_urnd_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution
UVM_INFO @ 17274033 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Timeout due to delay in receiving entropy from EDN. Entropy has been observed being requested and correctly received but it needs 8 words and only has 4 at the time of the timeout. The 4 words have successfully crossed the clock domain boundary so no CDC issues seen.
UVM_FATAL (otbn_stack_addr_integ_chk_vseq.sv:72) [otbn_stack_addr_integ_chk_vseq] timeout occurred! has 1 failures:
Test otbn_stack_addr_integ_chk has 1 failures.
1.otbn_stack_addr_integ_chk.1064076725
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/1.otbn_stack_addr_integ_chk/latest/run.log
UVM_FATAL @ 10008294145 ps: (otbn_stack_addr_integ_chk_vseq.sv:72) [uvm_test_top.env.virtual_sequencer.otbn_stack_addr_integ_chk_vseq] timeout occurred!
UVM_INFO @ 10008294145 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Waiting for a call stack pop to inject a fault that never happens, OTBN completes execution without triggering any model errors so just a DV environment issue.
UVM_FATAL (otbn_base_vseq.sv:369) [otbn_alu_bignum_mod_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution has 1 failures:
Test otbn_alu_bignum_mod_err has 1 failures.
2.otbn_alu_bignum_mod_err.2866351614
Line 100, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/2.otbn_alu_bignum_mod_err/latest/run.log
UVM_FATAL @ 9988012 ps: (otbn_base_vseq.sv:369) [uvm_test_top.env.virtual_sequencer.otbn_alu_bignum_mod_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution
UVM_INFO @ 9988012 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Timeout waiting for secure wipe, timeout value too short (potentially increased due to CDC enablement making EDN requests take longer). Secure wipe running correctly at timeout
UVM_FATAL (otbn_stack_addr_integ_chk_vseq.sv:58) [otbn_stack_addr_integ_chk_vseq] timeout occurred! has 1 failures:
Test otbn_stack_addr_integ_chk has 1 failures.
3.otbn_stack_addr_integ_chk.182425094
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/3.otbn_stack_addr_integ_chk/latest/run.log
UVM_FATAL @ 10014806449 ps: (otbn_stack_addr_integ_chk_vseq.sv:58) [uvm_test_top.env.virtual_sequencer.otbn_stack_addr_integ_chk_vseq] timeout occurred!
UVM_INFO @ 10014806449 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Waiting for a valid item on top of the call stack to inject a fault that never happens, OTBN completes execution without triggering any model errors so just a DV environment issue.
UVM_FATAL (otbn_base_vseq.sv:348) [otbn_imem_err_vseq] Check failed (!cfg.under_reset) has 1 failures:
Test otbn_stress_all_with_rand_reset has 1 failures.
5.otbn_stress_all_with_rand_reset.1133880645
Line 135, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/5.otbn_stress_all_with_rand_reset/latest/run.log
UVM_FATAL @ 23098619 ps: (otbn_base_vseq.sv:348) [uvm_test_top.env.virtual_sequencer.otbn_imem_err_vseq] Check failed (!cfg.under_reset)
UVM_INFO @ 23098619 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
V3 test so no examined in detail
UVM_FATAL (otbn_base_vseq.sv:369) [otbn_rf_bignum_intg_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution has 1 failures:
Test otbn_rf_bignum_intg_err has 1 failures.
7.otbn_rf_bignum_intg_err.577101146
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/7.otbn_rf_bignum_intg_err/latest/run.log
UVM_FATAL @ 8710818 ps: (otbn_base_vseq.sv:369) [uvm_test_top.env.virtual_sequencer.otbn_rf_bignum_intg_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution
UVM_INFO @ 8710818 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Timeout waiting for secure wipe, timeout value too short (potentially increased due to CDC enablement making EDN requests take longer). Secure wipe running correctly at timeout
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_sva_*/otbn_idle_checker.sv,87): Assertion IdleIfStart_A has failed has 1 failures:
Test otbn_escalate has 1 failures.
52.otbn_escalate.2705017903
Line 94, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/52.otbn_escalate/latest/run.log
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_sva_0.1/otbn_idle_checker.sv,87): (time 13176420 PS) Assertion tb.dut.idle_checker.IdleIfStart_A has failed
UVM_ERROR @ 13176420 ps: (otbn_idle_checker.sv:87) [ASSERT FAILED] IdleIfStart_A
UVM_INFO @ 13176420 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
Issue with tracking logic used for assert (Command to start a IMem secure wipe appears the same cycle OTBN hits the idle state).
Signatures seen in nightly but more frequent with CDC on:
UVM_FATAL (otbn_base_vseq.sv:369) [otbn_dmem_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution has 5 failures:
Test otbn_dmem_err has 4 failures.
0.otbn_dmem_err.2760602419
Line 103, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/0.otbn_dmem_err/latest/run.log
UVM_FATAL @ 12994571 ps: (otbn_base_vseq.sv:369) [uvm_test_top.env.virtual_sequencer.otbn_dmem_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution
UVM_INFO @ 12994571 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
2.otbn_dmem_err.2590234280
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/2.otbn_dmem_err/latest/run.log
UVM_FATAL @ 7469835 ps: (otbn_base_vseq.sv:369) [uvm_test_top.env.virtual_sequencer.otbn_dmem_err_vseq] Check failed (cfg.model_agent_cfg.vif.status == otbn_pkg::StatusIdle) Timed out waiting for OTBN to be idle before execution
UVM_INFO @ 7469835 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
... and 2 more failures.
Timeout waiting for secure wipe, timeout value too short (potentially increased due to CDC enablement making EDN requests take longer). Secure wipe running correctly at timeout
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_sim_*/tb.sv,270): Assertion MatchingStatus_A has failed has 20 failures:
Test otbn_escalate has 18 failures.
1.otbn_escalate.4171015369
Line 113, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/1.otbn_escalate/latest/run.log
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_sim_0.1/tb.sv,270): (time 5045702 PS) Assertion tb.MatchingStatus_A has failed
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_model_agent_0.1/otbn_model_if.sv,137): (time 5045702 PS) Assertion tb.model_if.NoModelErrs has failed
UVM_ERROR @ 5045702 ps: (tb.sv:270) [ASSERT FAILED] MatchingStatus_A
UVM_INFO @ 5045702 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
3.otbn_escalate.1417624640
Line 93, in log /home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/3.otbn_escalate/latest/run.log
xmsim: *E,ASRTST (/home/greg/work/opentitan_area/opentitan/scratch/master/otbn-sim-xcelium/default/src/lowrisc_dv_otbn_sim_0.1/tb.sv,270): (time 2205650 PS) Assertion tb.MatchingStatus_A has failed
UVM_ERROR @ 2205650 ps: (tb.sv:270) [ASSERT FAILED] MatchingStatus_A
UVM_INFO @ 2205650 ps: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
--- UVM Report catcher Summary ---
... and 16 more failures.
prim_lc_sync causes a fault in the lc_rma_req MUBI signal, the incoming signal switches from off to on, and the outgoing signal (to OTBN) has an invalid MUBI value for a single cycle before switching to on. This triggers a fatal alert. The invalid value is due to the CDC instrumentation causing a delay in one of the bits but not the other. The life cycle controller explicitly states this synchronisation is sufficient (https://opentitan.org/book/hw/ip/lc_ctrl/doc/theory_of_operation.html#control-signal-propagation), though concern it isn't and this is a genuine CDC issue (https://github.com/lowRISC/opentitan/issues/19044)
I've done a triage pass of failures seen in the OTBN regression when CDC instrumentation is enabled. To do this I compared the failures with the instrumentation on against the failures from the latest nightly (which has instrumentation off) and looked at those signatures that were only seen in tests with the instrumentation on. I also looked at tests that had significantly higher failure rates with instrumentation on but were failing with signatures seen in those tests with instrumentation off.
A full nightly was run against commit d941d7e8b7d25a30b41c714c4c448c87a2853551 modified to enable CDC instrumentation for OTBN by modifying:
https://github.com/lowRISC/opentitan/blob/3db2d45d97475e28e63ecedbe824a0eee108f523/hw/ip/otbn/dv/uvm/otbn_sim_cfg.hjson#L74
All bar one failure are verification environment issues that aren't a concern for M2.5.2. One failure could a be serious CDC issue (https://github.com/lowRISC/opentitan/issues/19044).
Other than #19044 the failures here can be addressed in V3.
Full details of failures triaged and conclusions are below:
New signatures from CDC enablement:
Signatures seen in nightly but more frequent with CDC on: