tjaychen closed this issue 2 years ago
assign to @tjaychen for initial triage.
After some investigation, it appears the test flow is roughly as follows:
This works most of the time. In this particular seed, 1 of the 4 incorrectly de-scrambled data words happens to pass the integrity check (this is expected eventually and has been observed by @GregAC elsewhere). As a result, the integrity exception count is 3 instead of the expected 4, and the test fails.
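To put a rough number on "expected eventually", here is a minimal sketch. It assumes that each incorrectly de-scrambled word passes the integrity check independently with probability 2^-7 (an assumption based on 7 integrity bits per 32-bit word, not a figure taken from this thread) and that 4 words are checked, as in this seed. The function name and constants are hypothetical.

```python
def prob_count_short(num_words: int, p_false_pass: float) -> float:
    """Probability that at least one of num_words wrongly de-scrambled words
    passes the integrity check by chance, so the exception count falls short."""
    return 1.0 - (1.0 - p_false_pass) ** num_words


# Assumed values for illustration only, not taken from the test source.
NUM_WORDS = 4
P_FALSE_PASS = 2.0 ** -7

if __name__ == "__main__":
    p = prob_count_short(NUM_WORDS, P_FALSE_PASS)
    print(f"per-run probability of a short count: {p:.4f}")
```

Under these assumptions the per-run probability is roughly 3%, which would be consistent with a test that passes most nightlies but fails on an occasional seed.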
Relying on integrity failures like this across randomized keys is probably not going to work 100% of the time. It might be a better idea, for the purposes of this test, to silence the integrity errors via testbench forcing, and just rely on the data itself being incorrect (since the de-scrambled results will essentially never match). That might make the test more robust.
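The difference between the two pass criteria can be illustrated with a toy Monte Carlo model. This is only a sketch under stated assumptions: wrong-key de-scrambling is modelled as uniformly random data and check bits, the 7-bit XOR-fold below stands in for the real integrity code (it is not the actual ECC), and all names are hypothetical.

```python
import random

INTEG_BITS = 7  # assumed number of integrity bits per 32-bit word


def toy_integrity(data: int) -> int:
    """Toy 7-bit check value: XOR-fold of the data (NOT the real ECC)."""
    check = 0
    while data:
        check ^= data & ((1 << INTEG_BITS) - 1)
        data >>= INTEG_BITS
    return check


def run_trial(rng: random.Random, num_words: int):
    """Return (integrity_failures, data_mismatches) for one wrong-key readback."""
    integ_failures = 0
    data_mismatches = 0
    for _ in range(num_words):
        expected = rng.getrandbits(32)
        # Wrong-key de-scrambling modelled as uniformly random data + check bits.
        got_data = rng.getrandbits(32)
        got_check = rng.getrandbits(INTEG_BITS)
        if toy_integrity(got_data) != got_check:
            integ_failures += 1
        if got_data != expected:
            data_mismatches += 1
    return integ_failures, data_mismatches


def simulate(trials: int = 20000, num_words: int = 4, seed: int = 1):
    """Fraction of runs where each criterion sees fewer than num_words events."""
    rng = random.Random(seed)
    short_integ = 0
    short_data = 0
    for _ in range(trials):
        integ, data = run_trial(rng, num_words)
        short_integ += integ != num_words
        short_data += data != num_words
    return short_integ / trials, short_data / trials


if __name__ == "__main__":
    integ_rate, data_rate = simulate()
    print(f"runs with too few integrity failures: {integ_rate:.4f}")
    print(f"runs with too few data mismatches:    {data_rate:.4f}")
```

In this model the integrity-count criterion falls short in a few percent of runs, while the data-mismatch criterion only falls short if a random 32-bit word happens to equal the expected one.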
I'll leave it to the next on-call to decide the next best steps.
unassign @tjaychen for next on-call.
Assigning to @engdoreis to look into this. This may be a task for @andreaskurth at some point but it might be best to see if @abdullahvarici and @engdoreis can perform a first pass on this. If it will take more than a couple of hours of work, best to track as an issue; if that is the case ping me and we will create a separate issue for tracking this effort.
I tested again with the HEAD of master and the test passed with the following command:
./util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_sram_ctrl_main_scrambled_access_jitter_en -s 2199609583 -r 1 --build-seed 3859254937
I didn't dig deep to discover which commit fixed it, but I believe it was one of Greg's commits. @johngt should we close this issue?
Thanks @engdoreis - yes, it looks like this was resolved after the last failed result (https://reports.opentitan.org/hw/top_earlgrey/dv/2022.08.15_16.33.18/results.html); since then it has been passing on the nightlies. Closing out as this error no longer applies.
wait, i don't think we should just okay this because it was passing in nightly. Please see my comment above, this will fail again if we don't think about how to address it. It's more or less just a matter of probability.
Sorry, I missed that. I am proposing two changes in PR #14733 to make the test more robust.
chip_sw_otbn_mem_scramble test.
thanks @engdoreis i'll have a look.
Unassigning from lowRISC / myself as our on-call has ended. @tjaychen may want to revisit @engdoreis's proposed PR.
Hierarchy of regression failure
Chip Level
Failure Description
UVM_ERROR @ 4510.502520 us: (sw_logger_if.sv:521) [sram_ctrl_main_scrambled_access_test_prog_sim_dv(sw/device/tests/sim_dv/sram_ctrl_main_scrambled_access_test.c:251)] CHECK-fail: mmio_region_read32(mmio_region_from_addr( TOP_EARLGREY_SRAM_CTRL_RET_AON_RAM_BASE_ADDR), INTEGRITY_EXCEPTION_COUNT_OFFSET) == SRAM_CTRL_TESTUTILS_DATA_NUM_WORDS
UVM_INFO @ 4510.502520 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
Steps to Reproduce
Tests with similar or related failures