jwnrt opened this issue 4 months ago (status: Open)
Thx for reporting this issue @jwnrt. Adding to M4 to ensure we resolve this in time.
> This test (which runs in `rma`, `dev`, and `test_unlocked1`) has been failing on FPGAs since commit b2239fc.
Do you know if the parent commit (https://github.com/lowRISC/opentitan/commit/d77a3a32f975da034b29c9b56391b670eec46af8) is known good, i.e., does the test pass in all LC states for that parent commit?
> Do you know if the parent commit (d77a3a3) is known good, i.e., the test passes in all LC states for that parent commit?
It does.
The commit where failures first appear seems to have merely triggered a latent bug, likely either in Vivado's synthesis / layout tools or in the timing of the JTAG enablement pathways.
Ok, I'll take a closer look at the JTAG enablement pathways.
The test has started passing again with some recent RTL changes today, but it doesn't look like it was intentionally fixed. This could mean the issue still exists but is masked by a different routing on FPGAs?
> The test has started passing again with some recent RTL changes today, but it doesn't look like it was intentionally fixed. This could mean the issue still exists but is masked by a different routing on FPGAs?
Yes, that's right. We don't know if it is just a tool bug or some timing problem, though.
FYI: the following tests need to be re-activated in CI once this is addressed: https://github.com/lowRISC/opentitan/pull/22744/commits/bd3e4ed7969274eb9009fa413d3dbaa01b79b10c
@andreaskurth has been able to reproduce this but couldn't root-cause it. It could be a problem on ASIC as well, but so far there is no indication of that; DV is fine. Prioritizing other P0s and P1s.
@a-will: if it is timing related, it could be that things are handled better on ASIC because the SDCs are not the same.
@moidx: we do have test coverage in GLS.
Discussed to leave priority as is but we prioritize other P0s and P1s first.
@moidx: it would be nice to capture the findings such that someone else can pick up the work if someone becomes available. @andreaskurth, would you be able to document the steps taken, please?
This may or may not be relevant:
`--build-seed 104714960319679935410420483500971829136303708457300037460974663680452494898918`
GitHub Revision: b29ffbb03c
VCS
Test `chip_sw_rv_dm_access_after_wakeup` has 3 failures, all with the signature:

```
UVM_FATAL @ * us: (chip_sw_rv_dm_access_after_wakeup_vseq.sv:56) [chip_sw_rv_dm_access_after_wakeup_vseq] Timed out waiting for device to enter normal sleep.
```
```
0.chip_sw_rv_dm_access_after_wakeup.77787982882959533724642802343103680401343926437350432772420472162649361881555
  Line 802, in log /container/opentitan-public/scratch/os_regression/chip_earlgrey_asic-sim-vcs/0.chip_sw_rv_dm_access_after_wakeup/latest/run.log
    UVM_FATAL @ 4575.453826 us: (chip_sw_rv_dm_access_after_wakeup_vseq.sv:56) [uvm_test_top.env.virtual_sequencer.chip_sw_rv_dm_access_after_wakeup_vseq] Timed out waiting for device to enter normal sleep.
    UVM_INFO @ 4575.453826 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
    --- UVM Report catcher Summary ---

1.chip_sw_rv_dm_access_after_wakeup.42045925832267773038863112318651299469133308811198817911363044455600557074244
  Line 780, in log /container/opentitan-public/scratch/os_regression/chip_earlgrey_asic-sim-vcs/1.chip_sw_rv_dm_access_after_wakeup/latest/run.log
    UVM_FATAL @ 3673.546356 us: (chip_sw_rv_dm_access_after_wakeup_vseq.sv:56) [uvm_test_top.env.virtual_sequencer.chip_sw_rv_dm_access_after_wakeup_vseq] Timed out waiting for device to enter normal sleep.
    UVM_INFO @ 3673.546356 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
```
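Both runs carry the same `UVM_FATAL` signature. As a small triage aid (not part of the repository; the log line format is inferred from the excerpts above), the fatal messages can be pulled out of a `run.log` with a few lines of Python:

```python
import re

# Matches UVM_FATAL lines of the form seen in the excerpts above, e.g.
# "UVM_FATAL @ 4575.453826 us: (file.sv:56) [scope] message"
UVM_FATAL_RE = re.compile(
    r"UVM_FATAL @ (?P<time>[\d.]+) us: "
    r"\((?P<src>[^)]+)\) \[(?P<scope>[^\]]+)\] (?P<msg>.*)"
)

def find_fatals(log_text):
    """Return (time_us, source_location, message) for every UVM_FATAL line."""
    return [
        (float(m.group("time")), m.group("src"), m.group("msg"))
        for m in UVM_FATAL_RE.finditer(log_text)
    ]

# Sample line taken from the first failing run above.
sample = (
    "UVM_FATAL @ 4575.453826 us: (chip_sw_rv_dm_access_after_wakeup_vseq.sv:56) "
    "[uvm_test_top.env.virtual_sequencer.chip_sw_rv_dm_access_after_wakeup_vseq] "
    "Timed out waiting for device to enter normal sleep."
)
print(find_fatals(sample))
```

Pointing `find_fatals` at the full `run.log` contents of each seed quickly confirms whether every failure buckets into the same timeout.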
Moving to P2 as the most critical use cases for rv_dm don't involve power transitions.
They do involve software initiated resets and PORs, but no sleep / wake functionality. Can we remove the broken tags here and here then (to get these running in presubmit again)?
Just discussed in triage: keeping this in M5 for now since there are related DV tests that are failing.
The DV runs of `rv_dm_access_after_wakeup` should get fixed by PR https://github.com/lowRISC/opentitan/pull/23924.
Just discussed in triage: If we cannot close this by the end of next week due to resourcing constraints, we'll take it with us to M6.
Just curious: @andreaskurth is this reproducible in DV? or only on FPGA?
Only on FPGA at this point
Moving to M6 to continue analysis through CDC tools
Discussed in the triage meeting to move this to M7 as P1. In the remaining time of M6, we want to focus on the analysis through CDC.
@moidx: We could also test that in GLS, but we may opt not to fix it if it fails, given the timeline.
Description
This test (which runs in `rma`, `dev`, and `test_unlocked1`) has been failing on FPGAs since commit b2239fc38e0725f17b1155d7e48ec6403facf7f6.

That commit is almost certainly not the cause of the RV DM error, but the size change seems to have triggered a change in the FPGA routing and broken something.
The error comes from OpenOCD failing to connect to the debug module after the chip wakes from deep sleep.
Here are the parts of the test where the failure triggers:
Here's what OpenOCD says:
@a-will reports that this issue does not present in FPGA bitstreams built with Vivado 2023, but it does with the 2021 version that our CI uses. The life cycle controller TAP is also not working.