I think @nbdd0121 has investigated the test in more depth and might comment here?
@moidx is currently investigating this. If there are any comments from others, please provide them.
Configuring the alert_handler triggers an NMI interrupt without any alert class association.
The alert NMI is triggered even when there are no pending alerts. Verified this by removing the force alert.
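For reference, a minimal sketch of the kind of check used to confirm that nothing is latched as a cause when the NMI fires, assuming an initialized alert_handler handle and the cause accessors in dif_alert_handler.h (dif_alert_handler_alert_is_cause / dif_alert_handler_local_alert_is_cause); treat the exact names as assumptions:
// Sketch only: confirm neither the regular alert used by the test nor the
// ping-fail local alerts are latched as a cause (assumed DIF accessors).
bool is_cause = false;
CHECK_DIF_OK(dif_alert_handler_alert_is_cause(
    &alert_handler, kTopEarlgreyAlertIdAesRecovCtrlUpdateErr, &is_cause));
CHECK(!is_cause, "Unexpected alert cause");
CHECK_DIF_OK(dif_alert_handler_local_alert_is_cause(
    &alert_handler, kDifAlertHandlerLocalAlertAlertPingFail, &is_cause));
CHECK(!is_cause, "Unexpected alert ping fail local alert");
CHECK_DIF_OK(dif_alert_handler_local_alert_is_cause(
    &alert_handler, kDifAlertHandlerLocalAlertEscalationPingFail, &is_cause));
CHECK(!is_cause, "Unexpected escalation ping fail local alert");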
Disabling the ping timer removes the spurious NMI issue.
The following write in dif_alert_handler.c is what triggers the NMI from alert_handler:
if (enabled == kDifToggleEnabled) {
  mmio_region_write32_shadowed(
      alert_handler->base_addr,
      ALERT_HANDLER_PING_TIMER_EN_SHADOWED_REG_OFFSET, 1);
}
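For context, that write sits in the DIF's ping-timer configuration path. A hedged sketch of how the test reaches it, assuming dif_alert_handler_configure_ping_timer() takes a timeout, an enable toggle, and a lock toggle (treat the exact name and argument order as assumptions):
// Assumed DIF entry point; enabling the ping timer here is what performs the
// shadowed PING_TIMER_EN write shown above.
CHECK_DIF_OK(dif_alert_handler_configure_ping_timer(
    &alert_handler, /*ping_timeout=*/0x100,
    /*enabled=*/kDifToggleEnabled, /*locked=*/kDifToggleDisabled));
// Workaround observed so far: pass kDifToggleDisabled instead, so the ping
// timer never starts and the spurious NMI does not occur.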
I tried switching the test to use a recoverable alert, different escalation sequences, and a valid ping timeout timer configuration, but none of these changes removed the alert handler NMI.
I will submit these test updates separately, but I think at this point I am going to flag this for CDC analysis.
CC: @a-will @matutem @nbdd0121 who also took a look at this issue.
I am going to measure the delay between enabling the ping mechanism and the NMI to try to determine if this is an issue with the reverse ping mechanism supported by the receivers.
This was recommended by @msfschaffner.
Running the test with the alerts and ping mechanism configured, but without triggering the alert, results in the first alert NMI firing between 500 and 4000 microseconds.
This was measured with rv_timer using an equivalent 1 us tick.
The interrupt fires consistently within 8-9 us. This seems to indicate that the issue is due to a ping timeout.
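A rough sketch of how the delay was bracketed, assuming an initialized rv_timer handle programmed for an equivalent 1 us tick, a flag/timestamp written by the NMI handler, and dif_rv_timer_counter_read as the counter accessor (all assumptions):
// Assumes rv_timer is an initialized dif_rv_timer_t with a ~1 us tick and
// that the NMI handler stores a timestamp into nmi_timestamp.
static volatile uint64_t nmi_timestamp = 0;

uint64_t start = 0;
CHECK_DIF_OK(dif_rv_timer_counter_read(&rv_timer, /*hart_id=*/0, &start));
// ... enable the alert handler ping timer here ...
// NMI handler (elsewhere): dif_rv_timer_counter_read(&rv_timer, 0, &nmi_timestamp);

// After the NMI has fired, the elapsed time in us is roughly:
LOG_INFO("alert NMI after ~%u us", (uint32_t)(nmi_timestamp - start));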
The reverse ping timeout calculation is done using the following formula, available in prim_esc_receiver:
4 * N_ESC_SEV * (2 * 2 * 2^PING_CNT_DW)
pwrmgr is the only block consuming the N_ESC_SEV and PING_CNT_DW compile-time parameters:
alert_handler_reg_pkg::N_ESC_SEV = 4
alert_handler_reg_pkg::PING_CNT_DW = 16
The alert escalation responder inside pwrmgr is connected to the io_div4 clock, yielding a target frequency of 24 MHz. The expected timeout based on the above parameters is thus:
reverse_ping_timeout = 0.175 s = (4 * 4 * (2 * 2 * 2^16)) / 24e6
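As a sanity check on the arithmetic, the same calculation in code (values taken directly from the parameters above):
// Expected reverse ping timeout with N_ESC_SEV = 4, PING_CNT_DW = 16 and a
// 24 MHz io_div4 clock.
const uint64_t n_esc_sev = 4;
const uint64_t ping_cnt_dw = 16;
const uint64_t timeout_cycles =
    4 * n_esc_sev * (2 * 2 * (1ULL << ping_cnt_dw));      // 4,194,304 cycles
const double timeout_s = (double)timeout_cycles / 24e6;   // ~0.175 s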
The interrupt trigger measurement does not seem to rule out any potential issues with the reverse ping mechanism.
uint32_t cycles[3] = {0};
CHECK_STATUS_OK(alert_handler_testutils_get_cycles_from_us(
    kEscalationPhase0Micros, &cycles[0]));
CHECK_STATUS_OK(alert_handler_testutils_get_cycles_from_us(
    kEscalationPhase2Micros, &cycles[1]));
CHECK_STATUS_OK(alert_handler_testutils_get_cycles_from_us(kIrqDeadlineMicros,
                                                           &cycles[2]));

dif_alert_handler_escalation_phase_t esc_phases[] = {
    {.phase = kDifAlertHandlerClassStatePhase0,
     .signal = 0,
     .duration_cycles = cycles[0]},
    {.phase = kDifAlertHandlerClassStatePhase1,
     .signal = 3,
     .duration_cycles = cycles[1]}};
dif_alert_handler_class_config_t class_config[] = {{
    .auto_lock_accumulation_counter = kDifToggleDisabled,
    .accumulator_threshold = 0,
    .irq_deadline_cycles = cycles[2],
    .escalation_phases = esc_phases,
    .escalation_phases_len = ARRAYSIZE(esc_phases),
    .crashdump_escalation_phase = kDifAlertHandlerClassStatePhase2,
}};

dif_alert_handler_alert_t alerts[] = {kTopEarlgreyAlertIdAesRecovCtrlUpdateErr};
dif_alert_handler_class_t alert_classes[] = {kDifAlertHandlerClassA};
dif_alert_handler_class_t classes[] = {kDifAlertHandlerClassA};
dif_alert_handler_config_t config = {
    .alerts = alerts,
    .alert_classes = alert_classes,
    .alerts_len = ARRAYSIZE(alerts),
    .classes = classes,
    .class_configs = class_config,
    .classes_len = ARRAYSIZE(class_config),
    .ping_timeout = 0x100,
};
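For completeness, the configuration above is then applied roughly as follows, assuming the alert_handler handle is already initialized and that the alert_handler_testutils_configure_all() helper is available as in other alert handler tests (treat the helper name and its lock argument as assumptions):
// Apply the configuration and leave it unlocked so it can still be tweaked
// while debugging (assumed testutils helper).
CHECK_STATUS_OK(alert_handler_testutils_configure_all(&alert_handler, config,
                                                      kDifToggleDisabled));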
@andreaskurth, this test is reproducible without a ROM_EXT running. @a-will suggested we can try to get a DV test configuration ready to run in GLS in case this is something we want to try.
CC: @sha-ron @OTshimeon
By @moidx: We can run chip_sw_rv_core_ibex_nmi_irq with the test ROM on the netlist. We need to update the test case so that it does not trigger a fake alert and instead just waits; then we shouldn't see any NMIs and can wait for the timeout. We should run this GLS over the next weekend. @moidx will create a test case.
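A minimal sketch of the wait-only variant, assuming busy_spin_micros() from the Ibex runtime library and a volatile flag set by the NMI handler (both assumptions; the actual test case was added separately):
// Wait roughly 2x the expected reverse ping timeout (~175 ms) without forcing
// any alert; any alert_handler NMI in that window is unexpected.
static volatile bool nmi_fired = false;  // set by the NMI handler (assumed)

busy_spin_micros(2 * 175 * 1000);
CHECK(!nmi_fired, "Unexpected alert_handler NMI while idle");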
Created the test case in #23441 and added a test point to the GLS test plan. If we are unable to debug further on Z1, I propose we close this issue and consider dropping the ping mechanism from A1 if we continue to run into problems during bring-up. We can make this decision as part of M5 triage.
@johannheyszl FYI, since we'll have to test alert handler behavior with the pinging mechanism disabled for Z1.
Discussed during triage meeting. Okay to move this to M5.
@moidx thanks for the heads up. IMHO this is OK for the testing we currently do, i.e. not fully invasive and cutting wires.
Moving this to M6 as P1. It should be tested early on the final netlist. CC @sha-ron, we'll discuss this in our next meeting
https://github.com/lowRISC/opentitan/issues/24119 tracks the findings after running the sw_alert_handler_ping_ok test post-synthesis.
@moidx We could add this test to the GLS test plan as a P2. It should be passing now with the latest ECO fixes in.
Confirmed that chip_sw_rv_core_ibex_nmi_irq is on the GLS testplan. Suggest moving this to M7 (still as P1).
Closing this issue as the sw_alert_handler_ping_ok test is now passing in GLS. We can create a new issue if we are able to run rv_core_ibex_nmi_req_irq_test in GLS and it results in a failure.
Description
Enabling the alert_handler ping mechanism results in an alert_handler NMI without any reported local or regular alerts. The only way to recover from this is by resetting the device.