[keymgr/kmac] how kmac should send `kmac_data_i.error`

weicaiyang commented 1 year ago

As brought up in this comment, we may want to find a better way to send the error when KMAC detects a fault during an operation requested by keymgr. We haven't had a conclusion in that discussion, so file this issue to continue on it.

Here are some approaches we brought up.

current implementation (no design change needed): kmac sends error without done, and keymgr gets stuck at the operation. Eventually, the chip will be reset as kmac sends a fatal alert.
make the operation complete normally: change Kmac also send done along with error. Also need to set ready high to accept the remaining data from keymgr.
treat the error as an interrupt to keymgr: keymgr finishes the operation once it sees the error is set. This changes the handshaking protocol, which impacts the current DV. My assumption was that error only asserts when done is set. We need to update the block-level TB if we choose this.

As a DV person, slightly prefer to 1) or 2) as they have less DV impact. I'm open to 3) as well, if it makes more sense to the design.

cc: @ballifatih @tjaychen @eunchan @cindychip

ballifatih commented 1 year ago

Thanks for the spin-off issue @weicaiyang.

I do not completely understand the alert handler, so I think I missed the part about chip reset. If a fatal error at KMAC (in particular glitching FSM) eventually forces a chip reset, then I am also fine with option 1) or 2) (in given order).

If the chip is going to reset, do we really need to care about keymgr being stuck at a state? Is it possible to have a configuration where the chip doesn't reset on KMAC fatal error?

tjaychen commented 1 year ago

i think that's the chip's behavior in an ideal scenario. But it's always good to have some defense in depth here just in case the alert_handler were somehow also attacked. I personally think number 2 would be good, but i'm fine with option 1 too if it churns things a lot.

cdgori commented 1 year ago

I would prefer 1 or 2 as well (mostly 2 unless we can't afford the change), as the interrupt-ish behavior of 3 means thinking about potentially a lot of other weird cases.

But if someone can make an argument for why we need 3, I'd listen.

weicaiyang commented 1 year ago

I do not completely understand the alert handler, so I think I missed the part about chip reset. If a fatal error at KMAC (in particular glitching FSM) eventually forces a chip reset, then I am also fine with option 1) or 2) (in given order).

Once a fatal alert occurs, alert_handler will eventually reset the chip, unless something is wrong.

ballifatih commented 1 year ago

Triaged for both KMAC and keymgr.

This change is not needed for M2.5 because we can deal with this issue on SW and with alert handler, as @weicaiyang suggested. Labeling this for the future milestones.

On the other hand, one thing that is critical for M2.5 is that we would like to ensure that default alert handler configuration from ROM+OTP covers this case and resets the chip when keymgr/KMAC is stuck.

In addition to this, it is best to also check on the Ibex side (when the keymgr operation e.g. advance_state is initiated) whether KMAC produces an error. This second suggestion does not look like a blocker for M2.5, as this code resides in RomExt.

cc: @bilgiday, @jadephilipoom

johannheyszl commented 10 months ago

This seems to be a ROM_ext issue, relabeled accordingly

lowRISC / opentitan

[keymgr/kmac] how kmac should send `kmac_data_i.error` #16792