lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[tlul] Side-channel Hamming weight leakage of `data` on TL-UL #16767

Open ballifatih opened 1 year ago

ballifatih commented 1 year ago

TODOs


Original Description

I would like to get some security opinion about possible side-channel leakage on TL-UL transactions.

In some TL-UL transactions between Ibex and crypto HWIPs, the data part of the transaction is reset to 0. If the side-channel leakage caused by the TL-UL bus correlates well with the Hamming distance, then I suspect the Hamming weights of the secrets passed in TL-UL transactions might be exposed to an attacker. I think the double transition from 0 to data and back (0x0 -> data -> 0x0) amplifies this effect. From a side-channel perspective, it seems to me that keeping the last sent value on the data lines would make it significantly harder to recover the value of each individual word of a secret. I guess resetting data to 0 also has its benefits, but I am not able to see all angles of such a trade-off.
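To make the concern concrete, here is a minimal sketch of a simple toggle-count leakage model. It assumes, as the paragraph above does, that leakage correlates with the Hamming distance between consecutive bus values. The function names and example words are hypothetical and are not taken from the OpenTitan code base.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return hamming_weight(a ^ b)


def toggles_zero_reset(words):
    """Bit toggles per word when the bus returns to 0 between transfers.

    Each word causes 0 -> word -> 0, so it toggles 2 * HW(word) bits,
    independently of its neighbours.
    """
    return [hamming_distance(0, w) + hamming_distance(w, 0) for w in words]


def toggles_hold_last(words, initial=0):
    """Bit toggles per word when the bus keeps the previous value.

    Each transfer toggles HD(previous, word) bits, so the leakage mixes
    two consecutive words instead of isolating one.
    """
    out, prev = [], initial
    for w in words:
        out.append(hamming_distance(prev, w))
        prev = w
    return out


# Hypothetical 32-bit key words, for illustration only.
key_words = [0xDEADBEEF, 0x0000FFFF, 0x12345678]
print(toggles_zero_reset(key_words))
print(toggles_hold_last(key_words))
```

In this model the zero-reset bus exposes twice the Hamming weight of every word on its own, while the hold-last bus only exposes distances between neighbouring words (apart from the first transfer, which starts from the assumed initial value).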

Since we are using peripheral connections to pass secrets among HWIPs, most keys are already immune to this. However, there are still some keys that are passed over TL-UL (not an exhaustive list):

  1. Keymgr generated SW keys,
  2. Keymgr generated identity seed (if they are passed to OTBN through SW),
  3. SW generated/managed symmetric keys/secrets for AES/OTBN/KMAC/HMAC HWIP.

This observation (0x0 -> data -> 0x0) is not consistent on all sides of xbar, and I only looked at two examples.

In the first waveform, Ibex is reading the identity seed (target=SW) from keymgr:

[waveform image: keymgr_reading_simplified]

In the second waveform, Ibex is writing a key to AES:

[waveform image: TLUL-aes-leakage]

cc: @johannheyszl @jadephilipoom @bilgiday @gdessouky

johannheyszl commented 1 year ago

Thanks @ballifatih, cc @moidx

TL;DR:

I'd assume that most values are in shares; @ballifatih

generally, IMO, if we keep any old, potentially sensitive values on the bus, switching might create even more instances of Hamming distance leakage, between those values and either zeros or other values.

johannheyszl commented 1 year ago

@vogelpi for viz

ballifatih commented 1 year ago

@johannheyszl AFAIU all these secret values are sent in two shares over TL-UL (I can see that the related CSRs have two shares). Since each word of the two shares is sent in sequential TL-UL transactions, I think it makes sense to assume that the attacker can read the HW of both shares. I can see two follow-up discussion points:

  1. Is 0 -> data -> 0 really worse than a data_prev -> data_next transition from an SCA perspective?
  2. Are two shares sent in sequence enough to prevent SCA? For two 32-bit words X and Y, if the attacker gets both HW(X) and HW(Y), how much of an advantage does that give in recovering X XOR Y? In particular, one should note that the attacker will get different (HW(X), HW(Y)) values for the same X XOR Y during each observation.

p.s. I am using HW(X) to refer to Hamming weight of the value X.
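On the second point, one fact that holds for any two words is HW(X XOR Y) = HW(X) + HW(Y) - 2 * HW(X AND Y). An attacker who learns both HW(X) and HW(Y) therefore learns at least the parity of HW(X XOR Y), plus a lower and an upper bound on it. The sketch below checks this exhaustively for 8-bit values. The function names are illustrative only, and the sketch says nothing about how precisely HW can be measured on real silicon.

```python
def hw(x: int) -> int:
    """Hamming weight: number of set bits in x."""
    return bin(x).count("1")


def hw_xor_bounds(hw_x: int, hw_y: int, width: int = 32):
    """Range of HW(X ^ Y) consistent with the two observed weights.

    Returns (low, high); HW(X ^ Y) also has the parity of hw_x + hw_y.
    """
    low = abs(hw_x - hw_y)
    high = min(hw_x + hw_y, 2 * width - hw_x - hw_y)
    return low, high


def check_identity(width: int = 8) -> bool:
    """Exhaustively check the identity and the bounds for small words."""
    for x in range(1 << width):
        for y in range(1 << width):
            h = hw(x ^ y)
            if h != hw(x) + hw(y) - 2 * hw(x & y):
                return False
            low, high = hw_xor_bounds(hw(x), hw(y), width)
            if not (low <= h <= high and (h - low) % 2 == 0):
                return False
    return True


print(check_identity())
```

Because fresh masks give a different (HW(X), HW(Y)) pair on each observation, these constraints could in principle be combined across observations, which is the effect point 2 asks about.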

johannheyszl commented 1 year ago

thx.

ballifatih commented 1 year ago
johannheyszl commented 1 year ago

thx! nice, so:

tjaychen commented 1 year ago

hey all, could you shed some more light on how reading the shares in sequence creates an issue? Is the basic idea that the bus is narrower (fewer bits toggling), so it would be easier for an attacker to figure out the Hamming weight? Secondly, assuming the register can be read multiple times (from keymgr), is the idea of averaging to reduce the noise from other parts of the bus so that the HW of the bus values can be surfaced?

Lastly, I am unsure now if this helps or hurts, but the software output registers from keymgr are actually "read clear". Meaning you cannot actually repeatedly read them. But it also means after every read there is a "value" -> "0" transition.

tjaychen commented 1 year ago

the 0 transition on the ibex probably has more to do with how the tlul sockets are constructed.. ie, for a peripheral that is not selected, all of its inputs probably just get blanked.

johannheyszl commented 1 year ago

thx tim. Our gut feeling is that we will likely not have an issue here. We will discuss today in the SCA sub WG. We might put a leakage test on the post-silicon test plan if we think it's necessary.

re shares in sequence: if, in any of the TL-UL registers or elsewhere, shares are loaded through FFs in sequence, the resulting Hamming distance would be equal to the Hamming weight of the unmasked value. But this is only the case if e.g. word 0 from share 0 is followed by word 0 from share 1. If all words from share 0 are read first and then all of share 1, this is IMO not an issue.
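A small sketch of this point, assuming Boolean (XOR) masking with two shares and a register that sees the values one after another. The key words and helper names are hypothetical.

```python
import secrets


def hw(x: int) -> int:
    """Hamming weight: number of set bits in x."""
    return bin(x).count("1")


def mask(secret_words, width=32):
    """Split each secret word into two Boolean shares: s0 ^ s1 == secret."""
    share0 = [secrets.randbits(width) for _ in secret_words]
    share1 = [w ^ m for w, m in zip(secret_words, share0)]
    return share0, share1


def transition_distances(sequence):
    """Hamming distance between each pair of consecutive values."""
    return [hw(a ^ b) for a, b in zip(sequence, sequence[1:])]


secret = [0x01234567, 0x89ABCDEF]  # hypothetical key words
s0, s1 = mask(secret)

# Alternating order: s0[0], s1[0], s0[1], s1[1]
alternating = [s0[0], s1[0], s0[1], s1[1]]
# Share-major order: s0[0], s0[1], s1[0], s1[1]
share_major = s0 + s1

alt_hd = transition_distances(alternating)
# Transitions 0 and 2 pair the two shares of the same word, so they equal
# the Hamming weight of the unmasked word regardless of the mask.
print(alt_hd[0] == hw(secret[0]), alt_hd[2] == hw(secret[1]))
print(transition_distances(share_major))
```

In the share-major order, each individual transition still involves at least one fresh mask word, so no single transition equals the weight of an unmasked word. The sketch does not model any higher-order combination of several transitions.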

re averaging: Repeating the measurement by reading multiple times allows averaging out noise factors such as electrical noise in the measurement chain and the noise signal from uncorrelated logic/functionality on OT. Experience shows that attacks on such wide words only ever succeed if averaging is possible to get 'good samples' for template matching. All correlated noise remains of course. If the sequence of words is randomized, averaging is not possible, which is nice :)
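The averaging argument can be illustrated with a toy simulation. It assumes each measurement is HW(value) plus independent Gaussian noise, in which case the error of the average shrinks roughly with the square root of the number of reads. The noise level and the bus word are made-up numbers, not measured figures.

```python
import random
import statistics


def hw(x: int) -> int:
    """Hamming weight: number of set bits in x."""
    return bin(x).count("1")


def noisy_trace(value: int, sigma: float, rng: random.Random) -> float:
    """One simulated measurement: HW(value) plus Gaussian noise."""
    return hw(value) + rng.gauss(0.0, sigma)


def averaged_estimate(value: int, n: int, sigma: float,
                      rng: random.Random) -> float:
    """Average n simulated measurements of the same value."""
    return statistics.fmean(noisy_trace(value, sigma, rng) for _ in range(n))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
word = 0x3C3C3C3C        # hypothetical bus word, HW = 16
sigma = 8.0              # assumed noise level, not a measured figure

single = noisy_trace(word, sigma, rng)
averaged = averaged_estimate(word, 10_000, sigma, rng)
print(f"true HW={hw(word)} single={single:.2f} averaged={averaged:.2f}")
```

If the word order is randomized per read, the attacker no longer knows which samples belong to the same word, so this kind of per-word averaging does not apply directly.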

tjaychen commented 1 year ago

sounds good, should this become software guidance then? it sounds like two things: always process 1 share fully ahead of the other, and within that share, randomize the sequence. This probably means we can't have any fifo like structures to store the keys (I don't think we do), but it might be something we will have to double check.
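A rough sketch of what such a software guideline could look like: one share is processed fully before the other, with a fresh random word order inside each share. This is only an illustration in Python, not OpenTitan driver code. It assumes the key registers can be written in any order, which is exactly the FIFO concern raised above. `write_schedule` is a hypothetical helper.

```python
import secrets


def write_schedule(num_words: int, num_shares: int = 2):
    """Return (share, word) pairs in the order they would be written.

    Shares are processed one after the other; within each share the word
    order is an independent random permutation.
    """
    rng = secrets.SystemRandom()
    schedule = []
    for share in range(num_shares):
        order = list(range(num_words))
        rng.shuffle(order)
        schedule.extend((share, word) for word in order)
    return schedule


schedule = write_schedule(num_words=8)
for share, word in schedule:
    # A real driver would write key_share[share][word] to the matching
    # register here; the register layout is not modelled in this sketch.
    pass
print(schedule)
```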

ballifatih commented 1 year ago

Summarizing some points from OT-SCA meeting:

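And on the SW guideline side:

* Avoid reading/writing secrets in 8-bit or 16-bit chunks.

* Reading/writing shares in an alternating manner is probably bad. Process one share fully and then move to the other.

* As @johannheyszl suggested, randomizing the loading order of key words might be an additional counter-measure that we can implement on SW side, if needed later.

* As @vogelpi and @bilgiday pointed out, feeding some random values from an LFSR post-transaction is also an idea we can keep on the side for now.

What remains is to check whether TL-UL adapters are behaving as intended. Two unexpected observations:

* Why do we see `data_prev -> data_next` on the `keymgr` TL-UL output?

* What is the value `0x20` that leaks to TL-UL `data` port from Ibex side?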

I will look at these small TL-UL inconsistencies again and create a spin-off issue for those.

vogelpi commented 1 year ago

Thanks @ballifatih for starting this discussion and preparing the ot-sca meeting. It's an interesting and relevant topic I believe. I fully agree with your summary above.

On a side note, inside the entropy complex data_prev -> data_next is preferred over 0 -> data -> 0 because there we don't have spurious write enable protection, and latching in any deterministic value downstream, e.g. through FI, would be very bad. But as you summarized in your comment above, for the TL-UL bus things are different.

andreaskurth commented 1 year ago

Triaged for tlul:

What remains is to check whether TL-UL adapters are behaving as intended. Two unexpected observations:

* Why do we see `data_prev -> data_next` on the `keymgr` TL-UL output?

* What is the value `0x20` that leaks to TL-UL `data` port from Ibex side?

I will look at these small TL-UL inconsistencies again and create a spin-off issue for those.

@ballifatih: Could you please link the issue here? Do your findings there agree with the following:

IIUC the discussion above, we'll resolve this issue with SW guidelines post M2.5 but don't need to take action for M2.5. If so, I'd tag this https://github.com/lowRISC/opentitan/labels/Type%3AIcebox. @vogelpi: Do you agree?

ballifatih commented 1 year ago

Sorry @andreaskurth, I couldn't get back to this issue to spin off the relevant discussion. Here it is #17330, so that we can isolate the TL-UL discussion from the SCA/security discussion.

Feel free to close this issue @andreaskurth and use the new one.

andreaskurth commented 1 year ago

Thanks @ballifatih (and no worries :slightly_smiling_face: )!

From your summary above, I think

And on the SW guideline side:

* Avoid reading/writing secrets in 8-bit or 16-bit chunks.

* Reading/writing shares in an alternating manner is probably bad. Process one share fully and then move to the other.

* As @johannheyszl suggested, randomizing the loading order of key words might be an additional counter-measure that we can implement on SW side, if needed later.

* As @vogelpi and @bilgiday pointed out, feeding some random values from an LFSR post-transaction is also an idea we can keep on the side for now.

is still open and tracked by this issue. So I would keep this issue open to track the completion of the SW guidelines. I'm changing the labels accordingly and will https://github.com/lowRISC/opentitan/labels/Type%3AIcebox it because non-ROM SW can be done post M2.5. @alphan: I think ROM code already adheres to those SW guidelines, right?

Let's continue the TL-UL hardware discussion in #17330.

johannheyszl commented 9 months ago

@jadephilipoom this is an issue with items for SW security guidelines (which I think are already covered). Let's close if redundant. thanks