[kmac] Entropy error code

cindychip commented 2 years ago

When reviewing KMAC entropy related error code, here is one concern we want to flag: There are two KMAC entropy error code: WaitTimerExpired and IncorrectEntropyMode. Currently both of them will flag an KMAC error.

However, the concern here is: the current KMAC error does not block any incoming operations. So with these error, we believe KMAC will continue to process all the incoming operations with the existing entropy. We think this might have some potential issue and at least we should flag an error on keymgr app interface and stop the process there.

@eunchan please feel free to elaborate more. @jadephilipoom @johannheyszl @cdgori please let us know if you have any other thoughts.

Thanks, Cindy

eunchan commented 2 years ago

In Top Earlgrey, we have a few assumptions that may reduce this risk.

App interface (KeyMgr) is initiated by SW always. So, any entropy events SW receives could be assessed and processed prior to the next App request.
Incorrect Entropy Mode case, we can assume it is due to FI attacks. If SW is compromised (kernel SW), then there's no way KMAC avoid manipulations. So, in that case, SW is still able to receive the error event prior to initiate KDFs.
If an userspace SW has been compromised, the error is being received by the kernel SW. so no problem so far.

With assumptions above, still the risk is quite low in my opinion. However, the design as itself has weakness here. So it is better for KMAC to report errors via App interface in case of entropy errors occur. There's no direct error report from kmac_entropy to kmac_app module yet.

jadephilipoom commented 2 years ago

Just to check that I've understood the comments above properly:

If entropy refresh fails, the KMAC block will raise a kmac_error interrupt and write the corresponding error code to the ERR_CODE register, but will not block a new incoming operation, meaning that the next KMAC operation would reuse the existing entropy.
The above is not considered an issue for software-initiated KMAC operations, since software will see the interrupt and stop requesting KMAC operations. However, it could be an issue for the hardware application interfaces, which do not see this error and would not block new operations.
Currently, for top_earlgrey, this is not likely to cause a problem since, of the 3 application interfaces (keymgr, lifecycle control, and ROM control), only keymgr performs a KMAC operation -- the others are cSHAKE -- and all keymgr operations are initiated by software. So if software sees a KMAC error, it will stop initiating keymgr operations.
Although it's not a problem for the top_earlgrey design, it would be nice to report errors through application interfaces so that security doesn't rely on the assumtion that all KMAC operations are software-initiated and -monitored.

@cindychip @eunchan Is that understanding accurate?

eunchan commented 2 years ago

@jadephilipoom Exactly!

johannheyszl commented 1 year ago

Results from the discussion in Sec WG 12/08:

(1) SW needs to check the error in KMAC before using it through key manager; document in SW guidelines.

(2) Mentioned errors are currently not reported on app interface to prevent any unwanted lock-ups; the error reporting on app interface needs to be extended for future versions where no/less SW interaction is available.

cindychip commented 1 year ago

Thank you Johann for summarizing the conclusion. @timothytrippel could you help us update the SW guidelines? @vogelpi could you help update the kmac doc?

vogelpi commented 1 year ago

Hi @cindychip , yes, I can help with the doc but let's wait with documentation content changes until the doc refactor has landed. There will be a large amount of rebasing needed and if we introduce documentation changes in the meantime, there is a non-negligible risk to loose such changes.

cindychip commented 1 year ago

Totally makes sense, thanks Pirmin.

ballifatih commented 1 year ago

estimate 16

johngt commented 1 year ago

@cindychip / @vogelpi - I see this moved up from P3 to P2? Please confirm as it's not easy to see with the labelling system here.

cindychip commented 1 year ago

@johngt @ballifatih for M2.5, this task only needs documentation change. (I think might take 2 hours max?) --> P2 for M2.5 For future release, we can consider adding a error reporting mechanism for interface. ---> future release.

If it is easier for tracking, we can split into two issues?

Reference: https://github.com/lowRISC/opentitan/issues/15088#issuecomment-1346193920

ballifatih commented 1 year ago

Sounds reasonable @cindychip, I forked a new issue to track SW/doc changes which are relevant for M2.5.

We can keep this issue around for the RTL changes for FutureRelease. I will label it accordingly.

msfschaffner commented 9 months ago

@vogelpi can you take a look at this and make an assessment as to whether we should fix this for PROD? Thanks!

vogelpi commented 9 months ago

I had a look at his earlier this week. My view is that the key thing is to document how software should check those error codes. Reporting such errors on the app interfaces is desirable whenever hardware can trigger such operations but it may lead to unwanted side effects including blocking behavior. For Earlgrey, it might be safer to not do it in HW. We should definitely review this in the Integrated context.

johannheyszl commented 9 months ago

Agree w/ @vogelpi (and others before) to not change RTL for Earl Grey. Future Release / Integrated and close as not planned are all fine for me.

SW requirement is documented in #17785

vogelpi commented 9 months ago

Thanks for your feedback @johannheyszl , let's keep this open but in the Integrated context.

lowRISC / opentitan

[kmac] Entropy error code #15088