riscv / riscv-crypto

RISC-V cryptography extensions standardisation work.
https://wiki.riscv.org/x/MVcF
Creative Commons Attribution 4.0 International

trng extension covers very narrow range of applications #84

Closed: jnk0le closed this 3 years ago

jnk0le commented 3 years ago

Reading the current spec gave me the impression that the TRNG design is a very x86-like magical being that "just is, through space, time and the universe". It's tightly coupled to the core, always enabled, fast enough not to require scheduler intervention, and can't be reset or power cycled to recover from a dead state (it can just freeze or throw a BSOD and wait until the user restarts). And of course, the only medium to talk to it is our ISE.

The main ISA-level interface consists of a single pseudoinstruction, pollentropy, that returns a 32/64-bit value in a CPU register. It is invoked in Machine Mode (which may be the only mode) as follows:

We specifically recommend against busy-loop polling on this instruction as it may have relatively low bandwidth. Even though no specific interrupt sequence is specified, it is required that the wfi (wait for interrupt) instruction is available. Cores which implement mentropy must not raise an Illegal Instruction Exception when executing wfi unless required to do so by the Timeout Wait bit of the mstatus register, as detailed in Section 3.1.6.5 of the Privileged ISA Manual.
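A minimal C sketch of how software might split a pollentropy result into its status and sample. It assumes the status field (OPST) occupies bits 31:30 with ES16 encoded as 0b01, which is what the emulation snippets later in this thread imply, and that the 16-bit sample sits in bits 15:0. The codes for BIST, WAIT and DEAD and every identifier here (`es_status`, `es_decode`) are illustrative assumptions, so check them against the spec revision you target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed OPST encoding in bits 31:30 (ES16 = 0b01, as in the snippets
 * in this thread); check the spec revision you actually target. */
enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };

struct es_result {
    enum es_status status;
    uint16_t entropy; /* only meaningful when status == ES_ES16 */
};

/* Split a raw pollentropy value into its status field and 16-bit sample.
 * On RV64 only the low 32 bits are assumed to carry information. */
struct es_result es_decode(uint64_t raw)
{
    struct es_result r;
    r.status = (enum es_status)((raw >> 30) & 0x3u);
    r.entropy = (r.status == ES_ES16) ? (uint16_t)(raw & 0xFFFFu) : 0;
    return r;
}
```

Zeroing the sample for every non-ES16 status keeps callers from accidentally consuming bits that carry no entropy guarantee.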

I'm not sure how that wfi polling in M mode is supposed to work, especially since the specification doesn't define any M-mode "RNG ready" interrupt signal. Waiting for random M-mode interrupts is going to make polling time very non-deterministic (BTW, doesn't a typical Unix-like OS handle all of its periodic and peripheral interrupts in S mode only?).

About actual use cases in microcontrollers: let's consider a TRNG like the one in the STM32H7 [1] (claimed to be "AIS-31 tested").

Their design is, of course, a dedicated memory-mapped peripheral, as that's the only option on ARM. Speculating about what would change if they ported it to RISC-V, I'd expect the addition of a "bypass to rvk" configuration option and an extra FIFO in the core.

The peripheral itself can take care of the err/rdy interrupts, conditioning and low-level configuration.

There is a significant problem with the spec's state machine when it has to cover the bypassed peripheral:

The core starts first. Does it fall into an infinite BIST or a Schrödinger state machine (DEAD if read now)?

Power saving. Does it fall into an infinite BIST, an instant DEAD or a Schrödinger state machine?

The 4 sources and the designated clock need time to start up. Does it fall into an infinite BIST or a Schrödinger state machine?

Power saving. HSI48 draws a lot of power, LSE and LSI trigger clock error detection, and an HSE-disciplined PLL might be a trap (as might LSE). Does it fall into an infinite BIST, an instant DEAD or a Schrödinger state machine?

Here come the actual BIST, WAIT and ES16 states.

It's an officially documented (in 34.3.8) way of reducing power usage. Does it fall into an infinite BIST, an instant DEAD or a Schrödinger state machine?

The same thing can happen to the rvk bypass FIFO: probably a few ES16 samples, and then what? Does it fall into an infinite BIST, an instant DEAD or a Schrödinger state machine?
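To make these questions concrete, here is one hypothetical way a vendor bridge could map such a peripheral onto the ES states. Everything is assumed for illustration: the flags loosely mirror the STM32 RNG clock-error and seed-error status bits, `fifo_words` stands for the imagined rvk bypass FIFO, and the OPST encoding follows the emulation snippets later in the thread. It shows one possible policy (start-up and recoverable errors as BIST, an empty FIFO as WAIT, only a latched fault as DEAD), not what the spec prescribes.

```c
#include <assert.h>
#include <stdbool.h>

enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };

/* Hypothetical snapshot of the peripheral and its bypass FIFO. */
struct rng_bridge {
    bool startup_done;   /* sources and RNG clock have settled */
    bool clock_error;    /* like STM32 CECS: RNG clock out of range */
    bool seed_error;     /* like STM32 SECS: noise source failed a check */
    bool fatal;          /* unrecoverable fault, latched by the vendor */
    unsigned fifo_words; /* samples waiting in the bypass FIFO */
};

/* One possible policy: only a latched fatal fault is reported as DEAD. */
enum es_status bridge_status(const struct rng_bridge *b)
{
    if (b->fatal)
        return ES_DEAD;
    if (b->seed_error || b->clock_error || !b->startup_done)
        return ES_BIST;  /* start-up or recovery still in progress */
    if (b->fifo_words > 0)
        return ES_ES16;  /* drain buffered samples, even if powered down */
    return ES_WAIT;      /* healthy but empty, possibly powered down */
}
```

With this policy a powered-down peripheral simply reads as WAIT until the bridge wakes it and the FIFO refills; whether that is acceptable is the implementation choice being asked about.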

About the non-recoverable DEAD state

Is the non-recoverable DEAD state some legal requirement?

Implementations do not need to implement DEAD as it may not require an end-user notification;

That sounds like a weird freeze (and that's the best case) when a thread polls for random data that never comes: a corner case that's going to waste a 'few' man-hours to debug. And even if it's known, there is no clear way of knowing when to treat it as a fatal error.

There are definitely cases fatal enough for the DEAD state to apply, but from which normal operation can resume after intervention.

STM's TRNG actually requires explicit actions to recover from seed errors (34.3.7: clearing the flag and dumping the data).
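That explicit recovery could be wrapped in a vendor routine along these lines. The sketch runs against a simulated register block: `struct mock_rng`, its fields and the `discard_words` parameter are assumptions, and section 34.3.7 of the reference manual remains the authority on the real sequence and on how many words to drop. It illustrates that samples only flow again after software has acted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated stand-in for the peripheral's status and data registers. */
struct mock_rng {
    bool seis;       /* seed error interrupt status (sticky flag) */
    bool secs;       /* seed error current status */
    unsigned stale;  /* words produced before the error, still queued */
    unsigned reads;  /* data register reads, for inspection */
};

static uint32_t mock_read_dr(struct mock_rng *r)
{
    r->reads++;
    if (r->stale > 0)
        r->stale--;
    return 0; /* the value is discarded anyway */
}

/* Returns true once the error is gone and no tainted words remain. */
bool recover_seed_error(struct mock_rng *r, unsigned discard_words)
{
    if (r->secs)
        return false;           /* error still present: retry later */
    r->seis = false;            /* 1. clear the sticky error flag */
    for (unsigned i = 0; i < discard_words; i++)
        (void)mock_read_dr(r);  /* 2. dump possibly tainted samples */
    /* 3. succeed only if no new error was flagged and nothing stale is left */
    return !r->seis && !r->secs && r->stale == 0;
}
```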

an immediate lock-down may be a more appropriate response in dedicated security devices.

I think security lockdowns should at least be configurable and/or have a separate SEC-DEAD state, especially since "manufacturer knows better" situations are so common. There are many applications that can't simply lock down because of a minor statistical issue.

About poll noise.

Is it really mandatory to have a standard, yet "custom", CSR to read "raw noise" on every single core?

Isn't it supposed to be a purely custom thing, or even available only through memory-mapped peripheral registers (undocumented registers, or fuses blown after verification)?

[1] https://www.st.com/resource/en/reference_manual/dm00314099-stm32h742-stm32h743753-and-stm32h750-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf

ben-marshall commented 3 years ago

Hi @jnk0le, thanks for all of your questions! I'll try to answer them, though @mjosaarinen might be able to give you better answers too.

Reading current spec gave me impression that trng design

To be extremely clear, we have not designed a TRNG. There is nothing in the specification about how a manufacturer should generate entropy.

The design is an interface to an entropy source. This is important, because it means a lot of your questions (I think, if I understand them correctly) are actually for the manufacturer to decide upon.

For the sake of my answers, I'll try and talk about three things:

It's [...], always enabled, fast enough not to require scheduler intervention, and can't be reset or power cycled to recover from a dead state (it can just freeze or throw a BSOD and wait until the user restarts). And of course, the only medium to talk to it is our ISE.

The ES is always available and never blocks, yes. But that does not mean that the TRNG is always on from a power/energy point of view. Also, it's hard to know what you mean by "fast". The interface is non-blocking, but that doesn't mean that you'll get a very large amount of continually available entropy from it. It just means the instruction will return immediately, with likely either a WAIT or ES16 status code.

The expectation is that the TRNG tries to keep a buffer of entropy samples full. When the buffer is full, the TRNG can be "turned off" (whatever that means for the implementation) to save energy. When the buffer is sampled, the TRNG wakes up again and tries to keep it full.
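That buffering behaviour can be modelled as a small ring buffer whose fill level decides whether the noise source is powered. This is a minimal simulation under assumed names (`es_buffer`, `ES_BUF_CAP`, `source_on`), not a hardware description, and it assumes the source restarts as soon as a single slot frees up; a real design might use a low-water mark instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ES_BUF_CAP 8u

struct es_buffer {
    uint16_t slot[ES_BUF_CAP];
    unsigned head, count;
    bool source_on;  /* true while the noise source is powered */
};

void es_buffer_init(struct es_buffer *b)
{
    b->head = 0;
    b->count = 0;
    b->source_on = true;  /* start filling right away */
}

/* Called when the noise source produces a sample. */
void es_buffer_fill(struct es_buffer *b, uint16_t sample)
{
    if (!b->source_on)
        return;                            /* powered down: nothing produced */
    assert(b->count < ES_BUF_CAP);         /* never on while full */
    b->slot[(b->head + b->count) % ES_BUF_CAP] = sample;
    b->count++;
    if (b->count == ES_BUF_CAP)
        b->source_on = false;              /* full: turn the source off */
}

/* Called on a poll; returns false (a WAIT) when the buffer is empty. */
bool es_buffer_take(struct es_buffer *b, uint16_t *out)
{
    b->source_on = true;                   /* any read wakes the source */
    if (b->count == 0)
        return false;
    *out = b->slot[b->head];
    b->head = (b->head + 1) % ES_BUF_CAP;
    b->count--;
    return true;
}
```

Because every read re-enables the source, the "off" period lasts exactly as long as nobody consumes entropy, which matches the energy-saving intent described above.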

I'm not sure how that wfi polling in M mode is supposed to work, especially since the specification doesn't define any M-mode "RNG ready" interrupt signal. Waiting for random M-mode interrupts is going to make polling time very non-deterministic (BTW, doesn't a typical Unix-like OS handle all of its periodic and peripheral interrupts in S mode only?).

The expectation is that the ES would be sampled by software periodically, as part of a regular / periodic interrupt routine, and the sample (if it was valid) would then be added to a software managed entropy pool. Hence, we don't define an M-mode interrupt.
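A periodic handler in that spirit might look like the following sketch. `es_read_fn` is a hypothetical hook standing in for the real CSR read, a scripted mock replaces the hardware so the code runs anywhere, the pool is a plain array rather than a conditioning function, and the OPST encoding is the one implied by the emulation snippets later in the thread. Only ES16 samples are kept; WAIT and BIST end the current tick, and DEAD is latched so the OS can stop relying on the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };

/* Hypothetical hook for the CSR read, supplied by the platform or a test. */
typedef uint32_t (*es_read_fn)(void *ctx);

struct entropy_pool {
    uint16_t sample[64];
    size_t used;
    bool source_dead;
};

/* Run from a periodic timer interrupt: collect up to `budget` samples. */
void entropy_tick(struct entropy_pool *p, es_read_fn read, void *ctx,
                  unsigned budget)
{
    assert(read != NULL);
    while (budget-- > 0 && !p->source_dead &&
           p->used < sizeof p->sample / sizeof p->sample[0]) {
        uint32_t raw = read(ctx);
        switch ((enum es_status)((raw >> 30) & 0x3u)) {
        case ES_ES16:
            p->sample[p->used++] = (uint16_t)(raw & 0xFFFFu);
            break;
        case ES_DEAD:
            p->source_dead = true;  /* no coming back: stop polling */
            return;
        default:
            return;                 /* WAIT or BIST: try again next tick */
        }
    }
}

/* Mock source: replays fixed raw values, then reports WAIT forever. */
struct scripted_source { const uint32_t *vals; size_t n, pos; };

uint32_t scripted_read(void *ctx)
{
    struct scripted_source *s = ctx;
    return s->pos < s->n ? s->vals[s->pos++] : 0x80000000u;
}
```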

About Unix-like OSes doing things like periodic interrupts in S-mode only, that's something we can look into for sure. Markku has talked to some of the Linux random number interface folks, but he can tell you about that better than I can.

As for all of your questions about a concrete implementation: I can't answer them all, and they are all very valid questions, but I don't think any of them are insurmountable? They are almost all within the remit of the manufacturer to solve. I really like that you walked through the STM documentation and tried to map its behaviour onto our specification; I think that's a great exercise.

Remember, there can be a certain amount of extra vendor/device-specific management of the TRNG hardware outside of our defined ES interface. This might include some early post-power-on code to turn on the peripheral clock via a memory-mapped interface, etc. Just dropping in this AHB peripheral and expecting it to work with this new interface is not going to work. I'd expect the peripheral to have some "always on" section which can dynamically turn itself off and on depending on whether more samples need to be generated. So, I don't yet see a problem with reset behaviour and the state machine?

Does it fall into infinite BIST, instant DEAD or a Schrödinger state machine?

All of these questions are really for the implementer. Generally though, our expectation is that "there's no coming back from the DEAD". If you know your TRNG hardware is going to recover at some point in the future, that should be a BIST or WAIT state. Exactly which will depend on the implementation, but I'd expect that post power-cycle, you would have to enter BIST.

Is the non-recoverable DEAD state some legal requirement?

Yes, the interface needs to be able to report an unrecoverable error. Generally speaking, the ES interface is designed to fit with FIPS/AIS certification schemes, meaning that it must be able to report all of the information those standards require.

Implementations do not need to implement DEAD as it may not require an end-user notification;

That sounds like a weird freeze (and that's the best case) when a thread polls for random data that never comes: a corner case that's going to waste a 'few' man-hours to debug. And even if it's known, there is no clear way of knowing when to treat it as a fatal error.

I agree. Unfortunately, we can't rule out people implementing this badly. Perhaps this could be improved by changing the language in the spec?

There are definitely cases fatal enough for the DEAD state to apply, but from which normal operation can resume after intervention.

We're getting into semantics now, but I would not call a case fatal if it is recoverable. In our spec, fatal is a one way trip. For these cases, WAIT or BIST would be more sensible indicators.

I think security lockdowns should at least be configurable and/or have a separate SEC-DEAD state, especially since "manufacturer knows better" situations are so common. There are many applications that can't simply lock down because of a minor statistical issue.

Being able to indicate a security lockdown might be useful. The DEAD state should not be used to indicate minor (read, frequently occurring and recoverable) statistical issues. If a TRNG needs to transition between BIST/WAIT/ES16 because of the occasional statistical error, that is handled in the current spec.
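As an example of a statistical check whose occasional failure is recoverable, here is a sketch of the Repetition Count Test from NIST SP 800-90B (section 4.4.1), one of the continuous health tests a FIPS-oriented source runs on raw noise. The standard derives the cutoff as C = 1 + ceil(-log2(alpha) / H) for a false-positive rate alpha and per-sample min-entropy H; here it is simply a field. Mapping a failure to BIST and a pass to ES16 is a simplification assumed for this sketch (a real source checks many raw values before emitting one ES16 word), and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };

struct rct {
    uint8_t last;     /* A: most recent raw noise value */
    unsigned run;     /* B: consecutive occurrences of A */
    unsigned cutoff;  /* C: failure threshold */
    bool primed;      /* false until the first value arrives */
};

/* Feed one raw noise value; returns false if the test fails. */
bool rct_update(struct rct *t, uint8_t x)
{
    assert(t->cutoff >= 2);
    if (t->primed && x == t->last) {
        if (++t->run >= t->cutoff)
            return false;  /* suspiciously long run of identical values */
    } else {
        t->last = x;       /* new value: restart the run */
        t->run = 1;
        t->primed = true;
    }
    return true;
}

/* Simplified status for the value just tested: a failure is reported as
 * BIST (recoverable), never as DEAD. */
enum es_status rct_status(struct rct *t, uint8_t x)
{
    return rct_update(t, x) ? ES_ES16 : ES_BIST;
}
```

Once a different value arrives the run counter resets, so a single long run produces a temporary BIST rather than a one-way trip to DEAD.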

Is it really mandatory to have a standard, yet "custom", CSR to read "raw noise" on every single core? Isn't it supposed to be a purely custom thing, or even available only through memory-mapped peripheral registers (undocumented registers, or fuses blown after verification)?

The rationale is that everyone was going to need this interface in one form or another, so we should standardise as much of it as possible to ensure it is implemented safely (i.e. always report BIST when in noise test mode) but still give people enough flexibility. Otherwise, everyone has a different implementation, and the likelihood of a security critical mistake increases. It may be that this interface is disabled after manufacturing by some, yes.

It would be reasonable to argue this is too much of a "manufacturer-specific thing" to try and standardise at all. We went the other way, especially because you must have this sort of interface for FIPS/AIS certification.
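The "always report BIST in noise test mode" rule can be stated in a few lines. In this hypothetical sketch the raw-noise path and the ES path are two views of one source: while test mode is on, the ES view never hands out ES16, so raw noise read for evaluation cannot end up in a production entropy pool. The `es_source` structure, the fuse flag and the OPST encoding (taken from the emulation snippets later in the thread) are assumptions; how a vendor disables the test interface after manufacturing is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };

struct es_source {
    bool noise_test_mode;  /* raw noise exposed for evaluation */
    bool test_mode_fused;  /* e.g. disabled after manufacturing */
    bool sample_ready;
    uint16_t sample;
};

/* Enable raw-noise access unless it has been permanently disabled. */
bool es_enter_noise_test(struct es_source *s)
{
    if (s->test_mode_fused)
        return false;
    s->noise_test_mode = true;
    return true;
}

/* What the ES interface returns: never ES16 while in noise test mode. */
uint32_t es_poll(const struct es_source *s)
{
    if (s->noise_test_mode)
        return (uint32_t)ES_BIST << 30;
    if (s->sample_ready)
        return ((uint32_t)ES_ES16 << 30) | s->sample;
    return (uint32_t)ES_WAIT << 30;
}
```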

Ultimately, the ES interface allows you to express the behaviour of a terrible TRNG implementation. This is necessary, because this might reflect a good TRNG implementation which has been damaged or attacked. However, just because this behaviour is expressible, doesn't mean it is likely or the intended average case behaviour.

Whew. Okay, I hope that brings some clarity? Please do keep asking, it's good for everyone involved.

Thanks, Ben

ben-marshall commented 3 years ago

Also: "trng extension covers very narrow range of applications"

This is very much by design. We didn't want to get into the situation RDRAND is in, where people now expect lots and lots of random bits very cheaply for non-cryptographic purposes. Hence this proposal is an Entropy Source for cryptography, not a random number source for anything else.

mjosaarinen commented 3 years ago

I'll close this.

SUMMARY: "trng extension covers very narrow range of applications" -- this is the entropy source CSR in the cryptographic extension, and is not intended for any other application, nor to provide full TRNG functionality.

We've already worked through the kernel interface issues, and if there is a need for a non-FIPS, non-cryptographic random source, it can be proposed elsewhere. Furthermore, if a vendor wants to introduce a security vulnerability where one can cheat the FIPS online statistical tests by running them again after a non-recoverable failure (DEAD), the vendor is on her own, as that will no longer pass FIPS certification.

As noted, if a vendor wants to implement these or other security vulnerabilities, or introduce power problems rather than solve them, that's not really our problem. DEAD is the result of a non-recoverable failure, not a recoverable failure. Implementors are of course free to tightly couple it, or arrange CSR access in a looser way. I can tell you that the reference implementation is not tightly coupled, nor is it "always on" as claimed by the commentator. The ISA does not require these things and gives the implementor a significant degree of freedom in this regard.

jnk0le commented 3 years ago

"trng design" was a bit misworded, right.

I agree that some transitions can be automated (and things handled) in dedicated IP, but the main point still stands: the spec's state machine severely lacks a "you are not going to get anything until you take {implementation-specific} actions" state, where that implementation-specific action might be initializing the peripheral (e.g. a rare race-condition bug might leave some clocks or peripherals uninitialized).

It could also serve as a catch-all for:

Implementations do not need to implement DEAD as it may not require an end-user notification;

That would allow quick detection and logging of "this should never happen" situations.

Otherwise, what is the alternative for defensive programming (other than a peripheral-specific interface)?
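Absent such a state, about the only portable defensive measure is a poll budget: treat "no ES16 after N attempts" as a distinct, loggable outcome instead of waiting forever. A minimal sketch follows; `es_read_fn`, `POLL_STALLED`, the constant-value mock and the budget itself are assumptions of this example rather than anything the spec defines, and the OPST encoding follows the emulation snippets later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum es_status { ES_BIST = 0, ES_ES16 = 1, ES_WAIT = 2, ES_DEAD = 3 };
enum poll_result { POLL_OK, POLL_DEAD, POLL_STALLED };

typedef uint32_t (*es_read_fn)(void *ctx);

/* Try up to `budget` polls; a real caller would yield (e.g. wfi) between them. */
enum poll_result es_poll_bounded(es_read_fn read, void *ctx,
                                 unsigned budget, uint16_t *out)
{
    assert(read != NULL && out != NULL);
    for (unsigned i = 0; i < budget; i++) {
        uint32_t raw = read(ctx);
        switch ((enum es_status)((raw >> 30) & 0x3u)) {
        case ES_ES16:
            *out = (uint16_t)(raw & 0xFFFFu);
            return POLL_OK;
        case ES_DEAD:
            return POLL_DEAD;
        default:
            break;  /* BIST or WAIT: keep trying */
        }
    }
    return POLL_STALLED;  /* "this should never happen": log it */
}

/* Mock source: always reports the same raw value. */
uint32_t constant_read(void *ctx)
{
    return *(const uint32_t *)ctx;
}
```

A budget does not tell you why the source stalled, which is the gap described above; it only turns a silent hang into something that can be logged.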

jnk0le commented 3 years ago

More about that M-mode-only polling:

If pollentropy is supposed to trap to M mode (and be re-executed there) from periodic timer interrupts in lower modes, then what if there are trust issues with the M-mode code (BIOS, UEFI, SBI or whatever it's called on RISC-V)?

It could simply not re-execute the pollentropy (csrr) instruction, and instead call something like this:

uint_XLEN_t pollentropy_emulation() {
   srand(time(NULL)); 
   return ((rand() & 0xffff) | (0b01 << 30)); // what could possibly go wrong
}

or this:

uint_XLEN_t pollentropy_emulation() {
   srand(time(NULL)); 
   return ((actual_pollentropy() & 0x4000000f) ^ (rand() & 0xffff)); // what could possibly go wrong
}

or even this:

uint_XLEN_t pollentropy_emulation() {
   srand(time(NULL)); 
   return (((actual_pollentropy() & 0xffff) ^ (rand() & 0xffff)) | (0b01 << 30)); // what could possibly go wrong
   // WAIT,BIST,DEAD states are ignored
}

Of course, left in the release binary "by mistake".

And if the BIOS/UEFI/SBI/whatever-proprietary-blob is solely responsible for periodic polling and managing the entropy buffers, then the situation gets even worse.

Also, trapping from U/S mode to M mode on every pollentropy call would incur quite a significant performance and power penalty.