lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[kmac/rtl] Decide if we need a way to recover from an error if the HW application interface does not provide the last data item. #22955

Open vogelpi opened 4 months ago

vogelpi commented 4 months ago

Description

Factored out from #22794: we should decide if KMAC needs a way to recover from an error if a hardware application interface does not provide the last data item.

This may happen e.g. because of an escalation or bug in the hardware block connecting to the hardware interface. Right now, KMAC cannot handle this and will become unusable. One way to avoid KMAC becoming unusable would be to add an interface for software to clear all hardware application interfaces and put them back into a known good state.

For Earl Grey, @andreaskurth and I discussed this and concluded that we most likely don't need it, because the hardware blocks interfacing with KMAC over app interfaces are critical system blocks like ROM_CTRL, LC_CTRL and KEYMGR. If any of these blocks hangs due to an issue, we need to reset the system, and thus KMAC, anyway. But for future versions this might be desirable, as other hardware blocks might get an application interface as well (e.g. OTBN or DMA). A hang condition in any of these blocks would then cause KMAC, and thus ROM_CTRL, LC_CTRL and KEYMGR, to hang, which is most likely not acceptable.

I am thus adding the FutureReleases label.

vsukhoml commented 4 months ago

What is the current behavior? Would an error be reported in KMAC.INTR_STATE.kmac_err or KMAC.STATUS? Is it possible to reset it from software? Like KMAC enable / disable somehow? If other hw block like KEYMGR is using KMAC, and KMAC hangs - would it be possible for software to check KMAC status and reset it? If yes, it can be a workaround at the cost of extra checks.

I suppose that KMAC should report this state in one of the status registers, and software using it should check for it and reset it as needed. Hardware blocks need to propagate error status to their status. Hanging is definitely not a good behavior.

andreaskurth commented 4 months ago

What is the current behavior? Would an error be reported in KMAC.INTR_STATE.kmac_err or KMAC.STATUS?

KMAC's Programmer's Guide should answer this (please let me know if it's unclear):

When the KMAC HW IP encounters an error, it raises the kmac_err IRQ. SW can then read the ERR_CODE CSR to obtain more information about the error. Having handled that IRQ, SW is expected to clear the kmac_err bit in the INTR_STATE CSR before exiting the ISR. When SW has handled the error condition, it is expected to set the err_processed bit in the CMD CSR. The internal SHA3 engine then flushes its FIFOs and state, which may take a few cycles. The KMAC HW IP is ready for operation again as soon as the sha3_idle bit in the STATUS CSR is set; SW must not change the configuration of or send commands to the KMAC HW IP before that. If the error occurred while the KMAC HW IP was being used from SW (i.e., not via an HW application interface), the kmac_done IRQ is raised when the KMAC HW IP is ready again.

If the HW application interface doesn't provide the last data item, the internal SHA3 engine will remain in the flushing state and the sha3_idle bit won't get set.
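
For illustration, the recovery sequence from the Programmer's Guide and the stuck case described above could be sketched in C roughly as follows. This is only a sketch against a small software model, not the real KMAC driver: the bit positions, the `fake_kmac_t` model, the `fake_*` accessors, and the `kmac_try_recover` name are all assumptions made up for this example, and INTR_STATE is assumed to be write-1-to-clear. The point it tries to show is that the wait for sha3_idle should be bounded, so SW can detect the stuck state instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT taken from the real KMAC register map. */
#define INTR_STATE_KMAC_ERR (1u << 0)
#define CMD_ERR_PROCESSED   (1u << 0)
#define STATUS_SHA3_IDLE    (1u << 0)

/* Hypothetical software model of the CSRs named in the quoted guide text. */
typedef struct {
  uint32_t intr_state;
  uint32_t err_code;
  uint32_t status;
  uint32_t flush_polls_left; /* model only: polls until the flush completes */
  bool flushing;             /* model only: SHA3 engine is flushing */
  bool app_intf_stalled;     /* model only: last data item never arrives */
} fake_kmac_t;

/* INTR_STATE is modelled as write-1-to-clear (an assumption of this sketch). */
static void fake_write_intr_state(fake_kmac_t *k, uint32_t val) {
  k->intr_state &= ~val;
}

/* Setting err_processed starts the flush of the SHA3 engine. */
static void fake_write_cmd(fake_kmac_t *k, uint32_t val) {
  if (val & CMD_ERR_PROCESSED) {
    k->flushing = true;
  }
}

/* Each STATUS read advances the model by one step. While the application
 * interface is stalled, the flush never completes and sha3_idle stays 0. */
static uint32_t fake_read_status(fake_kmac_t *k) {
  if (k->flushing && !k->app_intf_stalled) {
    if (k->flush_polls_left > 0) {
      k->flush_polls_left--;
    }
    if (k->flush_polls_left == 0) {
      k->flushing = false;
      k->status |= STATUS_SHA3_IDLE;
    }
  }
  return k->status;
}

typedef enum { kRecoverNoError, kRecoverOk, kRecoverStuck } recover_result_t;

/* Follows the quoted sequence: read ERR_CODE, clear kmac_err, set
 * err_processed, then poll sha3_idle at most max_polls times. */
recover_result_t kmac_try_recover(fake_kmac_t *k, uint32_t max_polls,
                                  uint32_t *err_code_out) {
  assert(k != NULL && err_code_out != NULL);
  if (!(k->intr_state & INTR_STATE_KMAC_ERR)) {
    return kRecoverNoError;
  }
  *err_code_out = k->err_code;
  fake_write_intr_state(k, INTR_STATE_KMAC_ERR);
  fake_write_cmd(k, CMD_ERR_PROCESSED);
  for (uint32_t i = 0; i < max_polls; i++) {
    if (fake_read_status(k) & STATUS_SHA3_IDLE) {
      return kRecoverOk;
    }
  }
  /* sha3_idle never got set: only a broader reset can help from here. */
  return kRecoverStuck;
}
```

In this model, a caller that gets `kRecoverStuck` back would fall through to whatever broader recovery is available, which today means a system reset.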

Is it possible to reset it from software? Like KMAC enable / disable somehow? If other hw block like KEYMGR is using KMAC, and KMAC hangs - would it be possible for software to check KMAC status and reset it?

KMAC and keymgr can be reset together with most modules through a SW-requested system reset. This means the CPU will start executing from ROM again (although ROM behavior might be different as the reset reason is SW-requested rather than power-on reset).

I suppose that KMAC should report this state in one of the status registers, and software using it should check for it and reset it as needed. Hardware blocks need to propagate error status to their status. Hanging is definitely not a good behavior.

KMAC does report this state to SW through the sha3_idle status bit, and SW can recover from it through a system reset. This issue was mainly intended to raise the question whether a more targeted clearing method than a system reset is needed. Such targeted clearing is tricky, though, because FSMs in at least two different modules need to be brought back to the initial state simultaneously.

If it helps, we can add this extra information to KMAC's documentation.

vsukhoml commented 4 months ago

@andreaskurth, thank you for the detailed answer!

question whether a more targeted clearing method than a system reset is needed

Such a clearing method would be very useful. Cryptolib can always check the state of the engine before proceeding. I wonder, though, why only a system reset can help. Why is a KMAC enable/disable not enough? From the documentation you've cited, it seems that setting the err_processed bit in the CMD CSR should be enough. Or is it because the reset may need to be applied to other modules that depend on KMAC? Can the state of those other modules be recovered? I guess KEYMGR would have to be reloaded if it gets stuck, hence the system reset?

If it helps, we can add this extra information to KMAC's documentation.

Yes, this always helps later with maintenance and development, specifically documenting this stuck state and the potential recovery from it for the cascade of dependent modules. I'd imagine a check_hw_state function in Cryptolib that makes sure all HW modules are in the correct state before proceeding.
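
To make the check_hw_state idea a bit more concrete, here is a rough sketch in C. Nothing in it is an existing Cryptolib API: `hw_check_t`, `hw_ready_fn`, `flag_is_set` and the shape of `check_hw_state` are hypothetical, and the per-module readiness checks are stubbed with plain flags rather than real register reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-module readiness probe. A real one would read the
 * module's status register (e.g. the sha3_idle bit for KMAC). */
typedef bool (*hw_ready_fn)(const void *ctx);

typedef struct {
  const char *name;     /* module name, used for reporting */
  hw_ready_fn is_ready; /* returns true if the module is usable */
  const void *ctx;      /* opaque argument passed to is_ready */
} hw_check_t;

/* Stub probe for this sketch: the context is just a bool flag. */
static bool flag_is_set(const void *ctx) {
  return *(const bool *)ctx;
}

/* Returns NULL if every module is ready, otherwise the name of the first
 * module that is not. Checks run in table order, so dependencies (KMAC)
 * should be listed before the modules that depend on them (KEYMGR). */
const char *check_hw_state(const hw_check_t *checks, size_t num_checks) {
  assert(checks != NULL || num_checks == 0);
  for (size_t i = 0; i < num_checks; i++) {
    if (!checks[i].is_ready(checks[i].ctx)) {
      return checks[i].name;
    }
  }
  return NULL;
}
```

A caller could build a table with one entry per module it depends on, call `check_hw_state` before starting an operation, and use the returned name to decide between a targeted recovery (if one ever exists) and a system reset.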