[entropy_src] Make sure unaltered raw data can be extracted for FIPS randomness validation

johannheyszl commented 9 months ago

Description

Our current understanding is that no health tests will alter the data that is accumulated in the observe FIFO. FIPS validation testing, we can use the Firmware Override / Bypass Mode to read it (see Programmer's Guide - OpenTitan Documentation). An interrupt is generated on reaching a predefined threshold in the FIFO.

Issue created after discussion b/w: @moidx @johannheyszl VadimS @vogelpi @zi-v @h-filali

johannheyszl commented 9 months ago

cc @msfschaffner @andreaskurth

vogelpi commented 8 months ago

@johannheyszl am I correct in assuming that this is mainly a block-level DV tasks? I.e., is it sufficient to check that the entropy arriving in the observe FIFO is indeed what is received at the input?

johannheyszl commented 8 months ago

@vogelpi yes it seems sufficient to check this. If not covered by DV already, might make sense to add a specific test for it to make sure since it is very important.

vogelpi commented 7 months ago

I wanted to give a quick update on this @johannheyszl . While @h-filali has been working towards solving this issue, he noticed that some FIFOs aren't connected to alerts that fire if the FIFO is pushed when full already (while other FIFOs have such alerts). Adding some SVAs specifically for this, seemed to indicate that the postht FIFO, the precon FIFO and the bypass FIFO can be pushed when they're actually full, so entropy can get lost. The good news is that this doesn't affect the firmware override modes important for validation.

As for pushing full FIFOs, I am currently thinking that there is still a misunderstanding here and those FIFOs aren't really pushed. Because there are quite some SVAs by @andreaskurth in the design ensuring that no entropy is systematically dropped where it shouldn't be dropped. Now that we no longer disable the noise source if the pipeline gets full (wasn't compliant with the spec) it seems it can happen that entropy is dropped at the very input. This is what needs to be analyzed for M2 as it we might need to add another FIFO to buffer entropy when the conditioner is busy. (see #21686)

I think the DV task this PR is about can be done in M3 instead.

zi-v commented 7 months ago

Regarding "it seems it can happen that entropy is dropped at the very input ... need to add another FIFO ...": I am not sure about the context, but if this is to support continuous health test requirement, than all you need to do is move the overflow samples drop point to after the health test. Note that the entropy rate from analog RNG is slower by orders of magnitude compared to the core clock so it shouldn't be a challenge to keep health test serving the incoming analog samples.

vogelpi commented 7 months ago

Thanks for your feedback @zi-v . This makes sense. I am now checking whether the back pressure originating from the conditioner is handled correctly.

We do have a test for this where the noise source is running at unrealistically high rate (basically max) but I think the simulation environment was currently too optimistic in terms of the conditioner back pressure (the CS AES halt interface can stall the conditioner for up to 48) but I am not sure if we were ever seeing this. I would expect the dropping to happen at the wrong place now and some SVAs failing if the conditioner is stalled for 48 cycles and the noise source pushing in almost 200 bits during that time. I agree that we don't need to be able to absorb this amount of entropy due to the unrealistic noise source rate. But we need to make sure the entropy is dropped at the correct place, as you say after the health tests.

vogelpi commented 7 months ago

After quite some work I've now fixed the CSRNG AES halt interface on the entropy source side (this was known to be partially broken but it's the main source of back pressure for the precon FIFO). I am now able stimulate cases with the noise source running at very high rate where the precon FIFO stalls the entire pipeline. So the actual debugging can begin :-)

Regarding dropping entropy bits after the postht FIFO but before the precon FIFO, @h-filali pointed out that this may not be compliant with 90B. In Section 3.1.5 it says:

The size of the input and the output of the conditioning component in bits, denoted as nin and nout, respectively, shall be fixed and shall be specified by the submitter.

This means if we drop some entropy bits before the conditioner, we must actually drop it before writing them into the postht FIFO but still accumulate health test results. So the health test window actually becomes slightly longer. The dropped bits should not be counted to determine when to trigger the conditioner.

johannheyszl commented 7 months ago

Thanks. This is exactly what I meant yesterday. drop after healt tests and before conditioning but still use the same number for conditioning :)

vogelpi commented 7 months ago

I've now a PR to change the drop point and keep the number of samples going into the conditioner constant here https://github.com/lowRISC/opentitan/pull/21799. This PR also adds a 32-bit, 2-deep FIFO to absorb the backpressure. This is needed to keep the verification going right now. Because the DV environment doesn't accurately model the internal of the pipeline (it's very hard), so dropping samples is not verifiable atm.

vogelpi commented 7 months ago

Update: @h-filali has been working on a PR addressing the original issue here: we need to make sure the entropy read out via observe FIFO is really what is received by ENTROPY_SRC on the RNG interface. This work is in PR https://github.com/lowRISC/opentitan/pull/21821.

vogelpi commented 7 months ago

@h-filali 's PR has been merged and we could successfully verify that unaltered raw entropy is read from the observe FIFO. Also my PR adding the FIFO to handle backpressure from the conditioner. I am closing this.

vsukhoml commented 6 months ago

if entropy source continue to work all the time, then it will affect power consumption. what is the problem with disabling it when FIFO is full?

vogelpi commented 6 months ago

The problem with disabling when the FIFO is full is that this will also shutdown the noise source. This requires repeating the startup health tests again. In the previous version of the design (and the engineering sample) software would not get informed about this and the hardware wouldn't repeat the tests. So it wasn't compliant with the spec.

I am in the process of reworking and cleaning up the documentation of ENTROPY_SRC to clarify things like this. I'll ping you in the PR.

johannheyszl commented 6 months ago

I think one can still shut down the entropy complex for power saving.

But as @vogelpi said, to maintain FIPS compliance we need to either let run or redo the startup procedure health testing

vsukhoml commented 6 months ago

This is a good point to study. I need to double check what are the shades of disabling entropy source. Can its output be just ignored? I suppose it shall always go through health checks though, but can be dropped after tests. My concern is power consumption - may be if we can have only health checks running? Can we disable some health checks temporary?

vsukhoml commented 6 months ago

NIST 800-90B states: "Start-up health tests are designed to be performed after powering up, or rebooting, and before the first use of the entropy source" However, entropy source is the complete assembly of physical source + health checks and it can be hybrid hw/sw. I suppose internal implementation details doesn't matter as entropy source as a whole is not powered down - just some part of it.

"The documentation shall include a description of the health tests, source code, the rate and conditions under which each health test is performed (e.g., at power-up, continuously, or on-demand), and include rationale indicating why each test is believed to be appropriate for detecting one or more failures in the noise source."

FIPS 140-3 IG, page 44 covers some cases.

"The module is passively receiving the entropy while exercising no control over the amount or the quality of the obtained entropy. Examples include: (a) A hardware module with an approved DRBG inside the module’s cryptographic boundary. The approved DRBG is either seeded via a seed loader from outside the module’s cryptographic boundary or the seed is pre-loaded at factory. What is required: (i) the SP shall state the minimum number of bits of entropy believed to have been loaded and justify the stated amount (from the length of the entropy field and from any other factors known to the vendor), (ii) the following caveat shall be added to the module’s certificate: No assurance of the minimum strength of generated SSPs (e.g., keys). ..."

So, it seems it is possible to handle it, just need do proper justification.

johannheyszl commented 6 months ago

@vsukhoml thanks. Let me try summarize the situation:

In the start-up phase of the physical noise source the initial number of samples is health-checked.
If we shut down the physical noise source, we need to repeat this startup testing.
If we keep the physical noise source active and want to use samples, we need to keep the health test running.
You would like to save power in the entropy complex generally.
I am guessing, that if the pipeline is full, we currently drop samples after health checking before conditioning. This reduces activity=power in the complex. (@vogelpi correct me if wrong)

I am guessing, if we want to further reduce power consumption, we simply stop the physical noise source (and by this all activity in the entropy complex)) and restart when needed, incurring a little time penalty for the start up testing, when its turned on again.

@vogelpi correct me if wrong please. @vsukhoml Does this sound good to you?

vogelpi commented 6 months ago

I am guessing, that if the pipeline is full, we currently drop samples after health checking before conditioning. This reduces activity=power in the complex. (@vogelpi correct me if wrong)

Not exactly. Please read the updated documentation here: this should answer all your questions regarding entropy dropping.

I am guessing, if we want to further reduce power consumption, we simply stop the physical noise source (and by this all activity in the entropy complex)) and restart when needed, incurring a little time penalty for the start up testing, when its turned on again.

@vogelpi correct me if wrong please. @vsukhoml Does this sound good to you?

It has been confirmed by @jettr in another thread that by disabling the ENTROPY_SRC by writing the ENABLE field in the CONF register to False has a big impact on idle power. This is because it stops the entire hardware pipeline inside ENTROPY_SRC AND it disables the analog noise source inside AST. You are correct though that to get back online, there is a penalty to pay due to repeating the startup tests and filling the final FIFO (esfinal FIFO) again with seeds.

johannheyszl commented 6 months ago

Thanks @vogelpi so if we need to save power, we can disable the entire complex. Is this good for you @vsukhoml ?

vogelpi commented 6 months ago

I believe it should work to just disable and re-enable the ENTROPY_SRC not the full complex. It should work to keep CSRNG and the EDNs running and configured. This should be confirmed experimentally on the chip by @vsukhoml and/or @jettr please.

vsukhoml commented 6 months ago

Yes, the workaround is always to use live entropy from entropy source to seed SW DRBG, so entropy source can be disabled after it. CSRNG will be in the software driven mode. We can enable entropy source closer to reseed interval of SW DRBG to give it a time to collect samples, and disable after reseeding.

What would be appropriate test for EDN behavior with disabled entropy source? It requests data from CSRNG, and CSRNG once seeded with live entropy can continue to provide data to EDN, if I understand correctly. The reseed commands will be issued once every MAX_NUM_REQS_BETWEEN_RESEEDS generate requests. The reset value is 0. This implies that no generate commands will be issued unless this value is changed by firmware., so with default configuration EDN won't request any reseeds.

I'm more interested in expected behavior of CSRNG when reseed command or instantiate command are requested with an entropy source, but it is disabled. Will it be silent seeding with some default value? error (what specific error? I don't see what code would match it in ERR_CODE) no ack for the command (SW_CMD_STS . CMD_RDY?

vogelpi commented 6 months ago

As for the suitable test, you could start from the Firmware Override: Extract & Insert mode test. This sets up the EDNs in a mode where they request a lot of entropy. I.e., CSRNG will need to reseed often.

About MAX_NUM_REQS_BETWEEN_RESEEDS: I think firmware should always write this. I am not sure anymore if with 0 not even the first generate request works.

About the expected behavior: there is a req/ack handshake protocol between ENTROPY_SRC and CSRNG. The expected behavior is that if ENTROPY_SRC is disabled, it won't ack the req of CSRNG and CSRNG will simply be stalled. CSRNG can also trigger an interrupt when it sets the request signal. Please check the CSRNG documentation here.

Between CSRNG and EDN there are also req/ack interfaces but the contexts inside CSRNG obviously have state. If CSRNG is disabled without disabling EDN, the next EDN req should be an Instantiate request. Anything else doesn't make sense. That's why we require both to be disabled/re-enabled in sync. Back in the days, we simply extended the guidance to ENTROPY_SRC to be on the safe side, but I believe it's not strictly required anymore.

vogelpi commented 5 months ago

Closing as all questions seem to have been answered. If not, feel free to re-open.

lowRISC / opentitan

[entropy_src] Make sure unaltered raw data can be extracted for FIPS randomness validation #20953

Description