[lc_ctrl] Lifecycle controller not ready for transitions until after some delay

jwnrt commented 1 year ago

Summary

For https://github.com/lowRISC/opentitan/issues/17742, I'm rewriting the e2e_bootstrap_rma test to use a different API. This test does a lifecycle transition from PROD to RMA through JTAG to the LC TAP.

I'm running into a problem where:

Sending the START command will increment the TRANSITION_CNT correctly and set TRANSITION_REGWEN to 0 as expected.
The transition doesn't happen, and the device stays in PROD.
No error bits are set in STATUS, which stays as 0x03 (INITIALIZED | READY).

Adding a 1 second delay at any point between reset and sending the START command causes the transition to succeed, i.e.:

The transition starts, and the device goes from PROD to POST_TRANSITION.
The STATUS changes to 0x05 (INITIALIZED | TRANSITION_SUCCESSFUL).
After a reset, the device is in RMA.
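
For reference, the two STATUS values quoted above can be decoded with a small helper. This is only a sketch: the bit positions are inferred from the 0x03 and 0x05 values in this report, and the constant and function names are made up for illustration, so check the lc_ctrl register description for your RTL revision.

```rust
/// STATUS flags as (mask, name) pairs. ASSUMPTION: bit positions are inferred
/// from the values quoted in this issue (0x03 and 0x05), not taken from the
/// register reference.
const STATUS_FLAGS: [(u32, &str); 3] = [
    (1 << 0, "INITIALIZED"),
    (1 << 1, "READY"),
    (1 << 2, "TRANSITION_SUCCESSFUL"),
];

/// Returns the names of the known flags that are set in `status`.
fn decode_status(status: u32) -> Vec<&'static str> {
    let mut names = Vec::new();
    for (mask, name) in STATUS_FLAGS {
        if status & mask != 0 {
            names.push(name);
        }
    }
    names
}
```

With this, `decode_status(0x03)` yields INITIALIZED and READY, and `decode_status(0x05)` yields INITIALIZED and TRANSITION_SUCCESSFUL, matching the two observations.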

I'm following the programming guide for the lifecycle controller, which includes checking that it's in the READY state before starting any transitions. Because the delay can be inserted anywhere between reset and sending START, it seems like the device just isn't ready for transitions until a delay after reset, even if READY is set.
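
A rough sketch of the flow described above (check READY, send START, poll STATUS) is shown here to make the failure mode concrete. The `LcAccess` trait, `run_transition`, and `FakeLc` are hypothetical names for illustration and are not the opentitanlib API; the fake device only mimics the STATUS values reported in this issue.

```rust
/// Hypothetical register access to lc_ctrl over the LC TAP (not a real API).
trait LcAccess {
    fn read_status(&mut self) -> u32;
    fn send_start(&mut self);
}

#[derive(Debug, PartialEq)]
enum TransitionError {
    /// READY was not set before START; carries the STATUS value seen.
    NotReady(u32),
    /// TRANSITION_SUCCESSFUL never appeared; carries the last STATUS value.
    Timeout(u32),
}

// ASSUMPTION: bit positions inferred from the 0x03 / 0x05 values in this issue.
const READY: u32 = 1 << 1;
const TRANSITION_SUCCESSFUL: u32 = 1 << 2;

/// Checks READY, sends START, then polls STATUS up to `max_polls` times.
fn run_transition<D: LcAccess>(dev: &mut D, max_polls: u32) -> Result<(), TransitionError> {
    let status = dev.read_status();
    if status & READY == 0 {
        return Err(TransitionError::NotReady(status));
    }
    dev.send_start();
    let mut last = status;
    for _ in 0..max_polls {
        last = dev.read_status();
        if last & TRANSITION_SUCCESSFUL != 0 {
            return Ok(());
        }
    }
    Err(TransitionError::Timeout(last))
}

/// Fake device. With `hangs` set it reproduces the reported behaviour:
/// STATUS stays at 0x03 after START and no error bit is raised.
struct FakeLc {
    hangs: bool,
    started: bool,
}

impl LcAccess for FakeLc {
    fn read_status(&mut self) -> u32 {
        if self.started && !self.hangs { 0x05 } else { 0x03 }
    }
    fn send_start(&mut self) {
        self.started = true;
    }
}
```

The point of the sketch is that the READY check passes in both cases, so the only symptom a host sees without the delay is the poll loop timing out with STATUS still at 0x03.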

The original version of the test works without a delay added, so there could still be a problem with my test code.

Reproducing steps

The code for the test is in this branch: https://github.com/lowRISC/opentitan/pull/17863

You can run it yourself on a CW310 using:

bazel test --test_output=all --define bitstream=gcp_splice //sw/device/silicon_creator/rom/e2e:e2e_bootstrap_rma

This should succeed with the delay included, but fail with std::thread::sleep(Duration::from_secs(1)) commented out.

Note that OpenTitanTool currently doesn't know that the OTP has changed and the bitstream needs reloading, so you'll have to manually clear it between runs:

bazel run //sw/host/opentitantool -- --interface cw310 fpga clear-bitstream

jwnrt commented 1 year ago

@msfschaffner @tjaychen do you know anything that could cause this problem of needing a delay after reset to get a successful transition, or know who might know? Thanks!

msfschaffner commented 1 year ago

This is interesting...

The OTP_CTRL and LC_CTRL should have successfully initialized and should be ready to accept commands if the STATUS.READY bit is set.

I wonder however whether there is any interplay with the rest of the boot sequence on the FPGA. I.e., OTP_CTRL and LC_CTRL are initialized before the ROM check is started and the processor is released. So you could be issuing a transition command while the rest of the system is still booting.

That could be an issue if we're trying to initiate a transition on the real ASIC without the external clock option, since there are some clock calibration values that ROM is supposed to read from OTP and dump into AST.

For the FPGA however I don't think it matters, since the OTP emulation does not require a stable clock frequency (plus, the AST model on the FPGA does not have to be calibrated).

pamaury commented 1 year ago

I have done some experiments related to this and now I am uncertain about the purpose of the RMA bootstrap pins. I did three tests:

Run e2e_bootstrap_rma but keep the RMA bootstrap pins set: since CREATOR_SW_CFG_RMA_SPIN_CYCLES is set to a low value, the ROM constantly spins for RMA and reboots in a loop. In this mode, the JTAG connection is dropped really quickly and the test fails.

I then modified the BUILD file to set

"CREATOR_SW_CFG_RMA_SPIN_CYCLES": "0x2000000",

In my tests, this corresponds to an RMA spin delay in the ROM of about 4 seconds on the FPGA.

Run e2e_bootstrap_rma without the delay: in this mode, we try to perform the entire transition while the ROM is in the RMA spin but does not reboot. This time, the JTAG connection is stable but the problem discovered by @jwnrt happens: the transition times out and never completes.
Run e2e_bootstrap_rma with a delay of 5 seconds: in this mode, the delay is greater than the RMA spin so the ROM will spin, reboot and boot normally. We try to perform the entire transition after the ROM has booted normally. This time, the JTAG connection is stable and the transition works.
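
The timing in the last two experiments can be summarised with a toy model. It assumes the spin lasts about 4 seconds (the measured figure above), that the ROM resets into a normal boot once the spin ends, and it ignores boot time; the type and function names are illustrative only.

```rust
/// Spin value set in the BUILD file for these experiments.
const RMA_SPIN_CYCLES: u32 = 0x2000000;

#[derive(Debug, PartialEq)]
enum RomPhase {
    /// ROM is still busy-waiting in the RMA spin loop.
    RmaSpin,
    /// Spin has ended and the ROM has reset into a normal boot.
    NormalBoot,
}

/// Phase the ROM is in `delay_s` seconds after reset, given a spin that
/// lasts `spin_s` seconds (simplified: boot time is ignored).
fn rom_phase_after(delay_s: f64, spin_s: f64) -> RomPhase {
    if delay_s < spin_s {
        RomPhase::RmaSpin
    } else {
        RomPhase::NormalBoot
    }
}

/// Spin iterations per second implied by a measured spin duration.
fn spin_rate(cycles: u32, measured_s: f64) -> f64 {
    f64::from(cycles) / measured_s
}
```

Under these assumptions, starting the transition with no delay lands in `RmaSpin` (the failing case), a 5 second delay lands in `NormalBoot` (the passing case), and 0x2000000 iterations in roughly 4 seconds works out to about 8.4 million iterations per second.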

Tentative conclusions:

the reset performed by the RMA spin loop breaks the JTAG connection. This could be problematic because it means JTAG is useless if the chip is stuck in this mode. Also, I am unsure if this is really the expected behavior.
it is impossible to perform an LC transition while the ROM is in the RMA spin loop, which seems to be contrary to the whole point of this mode?
the ROM might be doing some initialization in non-RMA boot mode that is necessary to perform an LC state transition and which is not documented anywhere?

johngt commented 1 year ago

@johannheyszl / @gdessouky

johngt commented 1 year ago

If this is a HW bug - will need to determine if we want to make the RTL change or proceed with SW workaround. @johannheyszl - might be worth considering the security implications.

msfschaffner commented 1 year ago

Thanks for the detailed debug notes! Looping in @alphan, since he was part of the RMA spin wait discussions.

@jwnrt noted above that the TRANSITION_CNT is incremented, but only the transition itself does not complete. This makes me think that the failure mode may be related to the flash erase that we trigger upon RMA entry. I.e., iff a transition into RMA is requested, the life cycle controller triggers a secure flash erase and will only proceed with the life cycle transition once that erase op has concluded. The ROM performs some flash initialization if I recall correctly, so that may be preventing the secure erase from working properly.

Can we try to perform the flash initialization before the RMA loop in the ROM code to check whether this is actually the case?

gdessouky commented 1 year ago

@jwnrt mentions that the original version of the test works without the delay added...

@pamaury do you also observe the same issues you describe but with the original version of the test when you play around with CREATOR_SW_CFG_RMA_SPIN_CYCLES and the delay from reset?

gdessouky commented 1 year ago

AFAICT the only security impact here is denial of service: the device keeps failing to transition to RMA while TRANSITION_CNT keeps incrementing, until the limit is reached and the device is locked. And this is already possible anyway without this potential issue.

I understand this doesn't come anywhere near the flash erase that is triggered on RMA entry, right?

@jwnrt @pamaury I wonder why this is happening only with this version of the test and not the original version? How are the mechanics of interaction with the hardware in this test different from the original?
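
The denial-of-service scenario mentioned above can be pictured with a minimal counter model. The `TransitionCounter` type and its `limit` field are hypothetical; the real limit is a property of lc_ctrl and is not stated in this thread.

```rust
/// Toy model of TRANSITION_CNT. `limit` is a placeholder value supplied by
/// the caller, not the real hardware limit.
struct TransitionCounter {
    count: u32,
    limit: u32,
}

impl TransitionCounter {
    /// Records one transition attempt. The count is incremented even if the
    /// transition itself later hangs, as observed in this issue. Returns
    /// false once the limit has been reached and no attempts remain.
    fn attempt(&mut self) -> bool {
        if self.count >= self.limit {
            return false;
        }
        self.count += 1;
        true
    }

    /// True once no further attempts are possible.
    fn locked(&self) -> bool {
        self.count >= self.limit
    }
}
```

Each hung transition still consumes one attempt, so repeating the failing sequence eventually exhausts the counter without ever reaching RMA.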

pamaury commented 1 year ago

The original test fails when we increase the value of CREATOR_SW_CFG_RMA_SPIN_CYCLES. This is consistent with the fact that the original value is too low so the device spins and then immediately reboots and does a normal boot (because the test removes the SW pin strapping after reset). So in reality, the original test is running in normal boot mode, not in RMA bootstrap mode.

I have done some bisecting in the code and my findings are as follows:

If I stop the boot before the entropy setup then the test fails as described by @jwnrt : the LC controller never finishes the transition.
If I stop the boot right after the entropy setup then it works.

This suggests that the LC controller cannot actually perform a transition from PROD to RMA without a minimum level of entropy. As a result, the RMA bootstrap mode does not seem to be working as expected. I am not familiar with the LC hardware but I think there are three issues here:

the documentation should state that a minimum level of entropy is necessary and the programming guide should be updated
maybe the hardware should report an error instead of just hanging when entropy is not set up?
the behaviour of the RMA bootstrap mode should be clarified: since the reset done after spinning breaks the JTAG connection and the spin value is very low, it seems impossible to perform an LC transition in this mode at the moment

Pinging @alphan who has worked on the RMA bootstrap mode

alphan commented 1 year ago

Cc @dmcardle

msfschaffner commented 1 year ago

Ok I think I have located the problem.

Upon RMA entry, the life cycle controller requests a secure wipe of the flash. When we designed the flash controller, we explicitly made sure this can complete without EDN being initialized. Hence, the seed used for the LFSR to write pseudo random data to flash is derived from the RMA token value, which is device-unique.

However, about a year or so ago, we decided to also trigger the OTBN secure wiping mechanism upon RMA entry, to make sure that the OTBN memories do not contain any secret material after entering RMA. This secure wiping mechanism does unfortunately not use the fixed seed described above, but it requests fresh entropy from EDN.

The RMA wipe request signals are connected as follows: lc_ctrl -> flash -> otbn -> lc_ctrl. Hence, the life cycle controller will wait and get stuck if EDN is not initialized before the transition, because OTBN tries to request fresh entropy for the secure wipe.
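
The dependency described above can be modelled as a chain in which the request only returns to lc_ctrl if no stage stalls. This is a behavioural sketch with made-up names (`WipeStage`, `first_stalled_stage`, `chain_before_fix`), not a description of the RTL interfaces.

```rust
/// One block in the RMA wipe request chain.
struct WipeStage {
    name: &'static str,
    /// Whether this stage requests fresh entropy from EDN before wiping.
    needs_edn: bool,
}

/// Chain as described in this comment: flash_ctrl seeds its LFSR from the
/// RMA token, while OTBN requests entropy from EDN.
fn chain_before_fix() -> [WipeStage; 2] {
    [
        WipeStage { name: "flash_ctrl", needs_edn: false },
        WipeStage { name: "otbn", needs_edn: true },
    ]
}

/// Returns the first stage that would wait forever, or None if the request
/// makes it all the way back to lc_ctrl.
fn first_stalled_stage(chain: &[WipeStage], edn_initialized: bool) -> Option<&'static str> {
    chain
        .iter()
        .find(|stage| stage.needs_edn && !edn_initialized)
        .map(|stage| stage.name)
}
```

With EDN uninitialized the model stalls at "otbn", which matches the hang seen before the entropy setup; with EDN initialized, or with `needs_edn` cleared for OTBN, the chain completes.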

Ideally, the RMA entry process should not be dependent on any sort of EDN initialization nor fresh entropy, because this could prevent RMA entry if something with the entropy complex is not right. So, ideally we would also use the rma_seed value that is sent from the lc_ctrl to the flash_ctrl.

Looping in some OTBN folks: @GregAC @vogelpi @andreaskurth WDYT, how difficult would it be to change that?

Alternatively, we could make a software workaround and always initialize EDN before the RMA busy loop - but that is not ideal, since it defeats the purpose of placing the RMA busy loop as early as possible (i.e., before most init code) in the ROM.

msfschaffner commented 1 year ago

@pamaury @jwnrt @moidx @alphan another side note: I think the intent was to use the debugger to halt the processor during the RMA busy loop so that the ROM code is stopped and does not reset the chip. This may not be important for the open-source test, but since the timing in the closed source for flash writes / erase could be drastically different from the FPGA timing, it would be prudent to do. We can track this in a separate issue, though.

alphan commented 1 year ago

> @pamaury @jwnrt @moidx @alphan another side note: I think the intent was to use the debugger to halt the processor during the RMA busy loop so that the ROM code is stopped and does not reset the chip. This may not be important for the open-source test, but since the timing in the closed source for flash writes / erase could be drastically different from the FPGA timing, it would be prudent to do. We can track this in a separate issue, though.

Please see my comments in the linked issue.

msfschaffner commented 1 year ago

Thanks @alphan - I think you are right and this is not actually needed. I commented on the linked issue and closed it.

andreaskurth commented 1 year ago

> Ideally, the RMA entry process should not be dependent on any sort of EDN initialization nor fresh entropy, because this could prevent RMA entry if something with the entropy complex is not right. So, ideally we would also use the rma_seed value that is sent from the lc_ctrl to the flash_ctrl.
>
> Looping in some OTBN folks: @GregAC @vogelpi @andreaskurth WDYT, how difficult would it be to change that?

From the data perspective, that should be feasible because lc_ctrl's lc_flash_rma_seed_o as well as otbn's edn_urnd_i.edn_bus are 32-bit values. OTBN could thus use the former instead of the latter to reseed its PRNG. Would we want to use the exact same value for OTBN as for flash_ctrl? Or a separate value of the same width, also provided by lc_ctrl?

From the control perspective, otbn_start_stop_control would have to be modified to not request entropy from EDN to reseed its PRNG upon an RMA request but to instead use the seed provided by lc_ctrl. That should be feasible with relatively little effort if the seed is valid upon lc_rma_req_i; i.e., if there's no additional handshake required to obtain the seed. I think lc_ctrl already follows this mechanism with flash_ctrl, but it would be good to formally agree on this for OTBN as well.

From the DV perspective, we currently model an RMA request like a fatal error, meaning the regular secure wipe procedure including URND reseed and ending in locked; see https://github.com/lowRISC/opentitan/pull/14265. We should be able to store the fact that a secure wipe was triggered by an RMA request and not a fatal error, and based on that decide to reseed URND with a different value, though. I don't think this will be very complex, but there may be some corner cases with timing that need extra work.

So my effort estimate at this point would be 0.5 to 1 days DD and 1 to 3 days DV work.

It would be good to assess this change from the security perspective as well. The RMA seed provided by lc_ctrl is derived from the transition token, right? Is it correct that this token is not known to an attacker who could observe power traces during a secure wipe?

johannheyszl commented 1 year ago

Would it make sense to simplify RMA-entry-triggered secure wipe in OTBN and use something existent, e.g. the current PRNG state or even zero instead of inputting rma_seed? This could simplify DD and DV?

We had been careful with secure wiping using random seeds during regular operation because of SCA implications if I recall correctly (which is very difficult already because attacker needs to successfully perform SCA and overcome scrambling). IMHO the attack complexity here is high enough to use simple wiping: attackers need possession of the RMA token, then recover data through SCA, then overcome scrambling. Note that w/o RMA-triggered wiping the attack would still need RMA token plus overcoming scrambling which is not easy.

andreaskurth commented 1 year ago

> Would it make sense to simplify RMA-entry-triggered secure wipe in OTBN and use something existent, e.g. the current PRNG state or even zero instead of inputting rma_seed? This could simplify DD and DV?

So if the attacker provides (or at least knows) the RMA transition token, from which the RMA seed is derived, I think from a security perspective we would even prefer using the PRNG state, which was seeded with at least boot-quality entropy, right?

Not reseeding the URND PRNG would simplify DD and DV changes, yes.

msfschaffner commented 1 year ago

@andreaskurth I don't think we can assume that the PRNG has been seeded yet, since the bootstrap/RMA entry scenario happens after a reset of the system, when EDN has not been initialized yet. Hence, our options are relatively limited here. We can either use a netlist constant, or something that is device unique (such as the RMA token).

RE effort, do you think there will be a big difference between using a netlist constant, or the rma_seed signal input (which essentially is held stable by the lc_ctrl while the RMA wipe request is active)?

If there is no big difference in effort, should we just go rma_seed? We can discuss whether we should have the life cycle controller output a device-unique value that is different from the rma_seed in the future.

johannheyszl commented 1 year ago

IMO any way we wipe the memory is fine.

Wiping with entropy helps against SCA generally, but in this case an adversary only has one observation for RMA entry (and additionally needs to unscramble and possess the RMA token) which makes it extremely unlikely.

msfschaffner commented 1 year ago

Discussed offline with @andreaskurth and @johannheyszl.

It should be ok to just use the current PRNG state for wiping (i.e., without reseeding), because:

1) If the RMA wipe is triggered at runtime, the PRNG state will be random, as it has been seeded already.
2) If the RMA wipe is triggered as part of bootstrap, the PRNG state defaults to a netlist constant. However, the scrambling keys of the OTBN memories have just been reset as well, which means that the data in OTBN would read back jumbled via the OTBN bus interface (likely triggering ECC errors). Hence the wiping mechanism is a layered defense in this case, and an additional reseed should not be needed.
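
As a rough illustration of wiping from the current PRNG state without a reseed, the sketch below overwrites a register file using whatever state the generator already holds. Everything here is a stand-in: the xorshift64 generator is not OTBN's actual URND implementation, `NETLIST_SEED` is an arbitrary placeholder for the reset constant, and the register file is simplified to 32 registers of four 64-bit limbs.

```rust
/// Placeholder for the PRNG reset value (hypothetical, not the real constant).
const NETLIST_SEED: u64 = 0x1234_5678_9ABC_DEF0;
/// Simplified register file: 32 wide registers of 4 x 64 bits each.
const NUM_WDRS: usize = 32;
type Wdr = [u64; 4];

/// Stand-in generator (xorshift64). State must be nonzero.
struct Prng {
    state: u64,
}

impl Prng {
    /// Advances the generator by one step and returns the new state.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Overwrites every register with PRNG output. No reseed is requested, so
/// this cannot block on EDN; it just continues from the current state.
fn wipe_wdrs(prng: &mut Prng, wdrs: &mut [Wdr; NUM_WDRS]) {
    for reg in wdrs.iter_mut() {
        for limb in reg.iter_mut() {
            *limb = prng.next_u64();
        }
    }
}
```

In the bootstrap case the result is deterministic (same seed, same wipe pattern), which is the trade-off accepted in point 2: the old register contents are gone either way, and the wipe never has to wait for entropy.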

msfschaffner commented 1 year ago

@moidx @GregAC I think this is an important bugfix that we should absorb for the shuttle. Are you ok with letting @andreaskurth go ahead with the change? Without triggering a PRNG reseed, the added effort should be relatively low (1-2d).

moidx commented 1 year ago

SGTM re proceeding with RTL change. OK from my perspective to wipe without triggering a PRNG reseed given the current OTBN programming guidelines.

GregAC commented 1 year ago

I'm happy with the proposed RTL change here

msfschaffner commented 1 year ago

Thanks, @moidx and @GregAC.

@andreaskurth you're good to go ahead with this change then. Please ping @pamaury and @jwnrt on this thread once the change has landed, so that they can rerun the test and verify that the issue is resolved.

vogelpi commented 1 year ago

Hey guys, sorry for being late to the party. One thing that hasn't been considered in the discussion until now is that the primary reason for wiping OTBN before RMA entry isn't the scrambled memories but the unscrambled register files, which don't have a reset (to provide hardening against reset glitch attacks). Are we really fine to wipe the register file with a fixed seed? Once in RMA state (I agree it's not easy to get there), the register files can be read directly through the scan chain.

@johannheyszl , @moidx

johannheyszl commented 1 year ago

Mostly based on very good discussions w/ @vogelpi, @andreaskurth, @msfschaffner:

Important: Secure wipe does actually overwrite WDRs, but not DMEM; for DMEM it only changes the scrambling keys. This is sufficient against most attacks on DMEM after secure wipe (e.g. after reset or context switch etc.) because when reading (before writing) DMEM, the data will still be scrambled with the old key and will pass ECC w/o fatal alert only with a probability of 1/2^7 (39b code, 32b data) for every word. Reading multiple words of data, e.g. to recover shared keys, decreases this passing probability accordingly (and brute force here is only through live device operations, not offline search). Attacks also need to additionally recover a known plaintext word and brute force the scrambling key (reduced round PRINCE). Altogether this is highly unlikely. In RMA, however, the 32b data words can be retrieved through dbg access even if ECC errors occur due to the old scrambling!

The best attack on potential sensitive values in DMEM (should all be device unique) through RMA would then be:

Attacker needs possession of RMA token (difficult!) and triggers RMA (which triggers a secure wipe).
Attacker uses dbg access to retrieve new DMEM scrambling secret after RMA wipe.
Attacker uses dbg access to read DMEM data. Data in DMEM is still scrambled using the old scrambling key, hence this would trigger an ECC error which would normally trigger a fatal alert but dbg access bypasses this and provides the 32b of data w/o ECC bits to attacker. This data has now been unscrambled using the new key.
Note that the attacker only gets 32b out of 39b words, of the now unscrambled versions (with new key) of data that has been scrambled with the old key. (note the following has not been reviewed yet:)
Attacker then needs to revert the descrambling with the new key. But he will not be able to do this and retrieve the 39b words residing in DMEM correctly since 7b are missing! (Note: Those bits are redundant given a correct 32b word, but at this stage we have "double"-scrambled 39b words.)
For every word, 7b out of 39b (18%) are missing (diffused among the 39b). This makes reverting the descrambling difficult/impossible. (Note: Using the new scrambling key he can revert the CTR mode but the diffusion layer has spread the 7 missing bits out.)
Attacker would need to brute force the old scrambling key next, again with the issue that 7 out of 39 bits are missing. In summary, this is a highly complex attack, seems highly unlikely and would require extensive cryptographic analysis to overcome the missing bit issues and to break reduced round PRINCE better than brute-force key search. It also requires the RMA token. (thoughts/comments welcome)
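
The two figures used in this analysis (the 1/2^7 per-word ECC pass probability and the 18% of missing bits) follow directly from the 39b code / 32b data split. A small sketch of the arithmetic, assuming each word passes ECC independently with probability 2^-7 as stated above; the function names are illustrative:

```rust
/// Data and code widths of a DMEM word, as given in this comment.
const DATA_BITS: u32 = 32;
const CODE_BITS: u32 = 39;
/// Bits of each word that are ECC and not visible via dbg access.
const ECC_BITS: u32 = CODE_BITS - DATA_BITS;

/// Probability that `words` consecutive reads of wrongly-keyed data all pass
/// ECC, assuming independent words with pass probability 2^-ECC_BITS each.
fn ecc_pass_probability(words: u32) -> f64 {
    0.5_f64.powi((ECC_BITS * words) as i32)
}

/// Fraction of each 39b word that is not observable through dbg access.
fn hidden_fraction() -> f64 {
    f64::from(ECC_BITS) / f64::from(CODE_BITS)
}
```

For a single word this gives 1/128. For eight words (for example a 256-bit value) it is already 2^-56, and the hidden fraction 7/39 is about 0.18.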

Wiping WDRs:

Since WDRs could be read through dbg in RMA w/o issues (no scrambling), they must be overwritten explicitly.
Any values that are written over the WDRs during secure wipe, effectively prevent read out through dbg access.
Writing predictable values (e.g. zeros or derived from known seed) technically creates a small chance for SCA leakage but RMA entry is a single event reducing this to a single observation, values are at least 39b in parallel, and OTBN processed secrets are device unique (e.g. ECDSA keys). This makes SCA highly unlikely.

johannheyszl commented 1 year ago

@zi-v for vis, who participated in the Sec WG discussion 2022-07-07

moidx commented 1 year ago

My understanding is that the change being discussed is to be able to execute RMA entry while the ROM is stalling and before the entropy complex has been initialized. At this point in time there are no sensitive assets in OTBN.

Is RMA entry expected to be successful when the device is executing code from flash? Depending on how alerts are configured, this may not be feasible. If this opens a window for attack, then we should have the ROM or ROM_EXT disallow RMA entry before enabling alerts.

There is additional context on this in the following issues:

vogelpi commented 1 year ago

Thanks for the nice write-up @johannheyszl !

Based on our discussions there are two things we think we should do:

Short-term: Modify RMA-entry mechanism inside OTBN to not reseed the PRNG, and use whatever seed is currently available in the PRNG to wipe the WDRs. As Johann pointed out above, even a deterministic seed would be acceptable for this. @andreaskurth volunteered to do this work and will create a separate issue to track this.
Medium-term: Modify DMEM scrambling mechanism inside OTBN to actually overwrite DMEM once the new scrambling key has been loaded. The value written can be anything: PRNG output or even something deterministic. This is not super critical at this moment and we can still discuss this.

moidx commented 1 year ago

> Medium-term

We should be careful with performance here. We use WIPE during regular operation, and rotation of scrambling keys gives us a performant way to modify the state of DMEM, i.e. we want to be able to WIPE between OTBN invocations.

CC @jadephilipoom

jadephilipoom commented 1 year ago

> We should be careful with performance here. We use WIPE during regular operation, and rotation of scrambling keys gives us a performant way to modify the state of DMEM, i.e. we want to be able to WIPE between OTBN invocations.

+1; we currently wipe DMEM between each load of an OTBN app, which for cryptolib means every time the caller requests an OTBN operation. Perhaps we could differentiate between the two types of wipes for different purposes?

andreaskurth commented 1 year ago

> +1; we currently wipe DMEM between each load of an OTBN app, which for cryptolib means every time the caller requests an OTBN operation. Perhaps we could differentiate between the two types of wipes for different purposes?

Yes, I think our current understanding is that for the performance-critical 'wipe' during operation, changing the scrambling key is sufficient. During PROD LCs, DMEM can only be accessed from Ibex or OTBN. When Ibex tries to read DMEM after re-scrambling, any ECC errors that are likely introduced by scrambling will result in a fatal alert. Once in RMA, DMEM can additionally be accessed by the debug module. When the debug module via SBA tries to read DMEM after rescrambling, it can also access data that has ECC errors. It cannot observe the ECC bits, though, so it only has access to 32/39 bits of data (as @johannheyszl described above). Still, I think it would be better on RMA entry to not only change scramble keys but also overwrite the entire DMEM. This can be deferred post M2.5, though.
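
The access rules summarised above can be written down as a small lookup, for the case where DMEM still holds data scrambled with the old key and the read hits an ECC error. The enum and function names are illustrative, and the sketch assumes Ibex reads raise a fatal alert in RMA just as in PROD, which the thread does not state explicitly.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum LcState {
    Prod,
    Rma,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Reader {
    Ibex,
    DebugModule,
}

#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// The ECC error escalates to a fatal alert.
    FatalAlert,
    /// The reader cannot reach DMEM in this life cycle state.
    NoAccess,
    /// Data is returned despite the ECC error, without the ECC bits.
    DataBitsOnly { visible_bits: u32, total_bits: u32 },
}

/// Outcome of reading stale (old-key) DMEM contents after re-scrambling,
/// assuming the read triggers an ECC error.
fn read_stale_dmem(reader: Reader, state: LcState) -> ReadOutcome {
    match (reader, state) {
        // ASSUMPTION: Ibex behaves the same in PROD and RMA.
        (Reader::Ibex, _) => ReadOutcome::FatalAlert,
        (Reader::DebugModule, LcState::Prod) => ReadOutcome::NoAccess,
        (Reader::DebugModule, LcState::Rma) => {
            ReadOutcome::DataBitsOnly { visible_bits: 32, total_bits: 39 }
        }
    }
}
```

The only combination that returns data is the debug module in RMA, which is why overwriting DMEM on RMA entry is the case worth treating separately from the performance-critical wipe between OTBN invocations.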

vogelpi commented 1 year ago

As discussed in yesterday's meeting, I've now created two separate issues to track the implementation of changes for (1) - required for M2.5, and (2) for after M2.5.

vogelpi commented 1 year ago

Hey @jwnrt , hey @pamaury , we've just merged a PR (see https://github.com/lowRISC/opentitan/pull/18123) which should fix this issue. Can you maybe check whether with latest master the required LC transitions can be done? Thanks!

jwnrt commented 1 year ago

@vogelpi the transition works with the latest RTL change, thanks!

The test now passes with the following changes:

RMA strapping is kept for the duration of the transition, and removed after (before the reset into RMA).
The workaround delay is removed.
The CREATOR_SW_CFG_RMA_SPIN_CYCLES setting is increased for the sw/device/silicon_creator/rom/e2e/BUILD configuration (10 cycles is not enough), as @pamaury has already mentioned.

I will submit a PR with these changes to the test. Thanks all for debugging and getting this fixed!

vogelpi commented 1 year ago

Perfect, thanks for the feedback @jwnrt ! Please feel free to close this issue once your PR is merged and the test is working.

msfschaffner commented 1 year ago

Awesome, thanks everyone for the good work!

lowRISC / opentitan

[lc_ctrl] Lifecycle controller not ready for transitions until after some delay #17944
