[spi_device] TPM return-by-hw flawed without hardware write support

a-will commented 8 months ago

Description

The return-by-hw mechanism for TPM registers is missing hardware handling of writes, and it appears that makes it difficult to comply with the spec's timing for subsequent reads. With the SPI TPM spec, various FIFO interface registers require the ability to read the current value with only a single wait state, with the most critical register being TPM_ACCESS_x.

However, to process a write on time, a multi-tasking Firmware will need to receive an interrupt, process the command, and update the register in substantially less than 2 us (at 24 MHz SPI). The spec even calls out that the response to a write to TPM_ACCESS_x to request access should appear to host Software to happen "practically immediately." A write, followed by an immediate read of TPM_ACCESS_x is likely a very typical sequence, since this is needed to proceed with any TPM commands.

a-will commented 8 months ago

CC @moidx @jesultra @mazurek-michal @msfschaffner

a-will commented 8 months ago

For some perspective on the time 2 uS gets us, note that it's just 160-200 CPU cycles on earlgrey, depending on the clocking mode. Because spi_device is positioned down on xbar_peri, with its core clock on io_clk_div4, each read over TL-UL is going to take something like 20 CPU cycles.

jesultra commented 8 months ago

At ChromeOS, our previous generations of chip have the same limitation, and we do not have interrupt handlers fast enough. That is why ChromeOS devices uses a non-standard "ready" signal from the TPM (GSC), telling the application processor when a write has been processed, and the AP will not issue any reads or other subsequent TPM commands until it sees that signal.

What I am saying, is that the fact that OpenTitan cannot meet the TPM specification in this regard is not a problem for ChromeOS (though it would be nice to eventually get rid of the non-standard signal.)

a-will commented 7 months ago

I've marked this for a future release, since the feedback has suggested that the mode works for its intended use case and timelines are tight. We probably ought not to call the current function a "TPM mode", since it cannot meet the standard's specifications.

Please raise issues or adjust the priority level here if I have erred.

msfschaffner commented 7 months ago

@moidx @timothytrippel @cfrantz @jesultra FYI

jesultra commented 6 months ago

@a-will, I came to think. Even though ChromeOS has our separate "ready" signal, which we currently use to delay issuing any register read transaction until the GSC has had time to finish processing the previous register write, it would be nice if the OpenTitan chip could support other non-standard means of waiting, which did not require a separate signal.

Even though the TPM standard specifies at most one wait state for reading certain registers, the most straightforward non-standard way of allowing GSC firmware to take some time processing a write, would be that subsequent register reads would be delayed in the wait state, until the write was fully processed.

I would like to ask if the current design of the return-by-hw mechanism can be configured such that any register write causes immediate "invalidation" of all the pre-loaded values for registers, which would cause any subsequent register read to stay in wait state indefinitely, and then once firmware gets around to process the write operation, it has a chance to update the pre-loaded register values before flipping the bit to say that return-by-hw is again "ready".

I would expect that the above is not super tricky to realize in hardware, and it could allow ChromeOS to drop the non-standard ready signal (as long as ChromeOS is prepared to wait for the value of registers, even when the standard requires them to be immediately readable.) Such a protocol would make it easier to later switch to a fully compliant TPM, as the ChromeOS AP drivers basically would not need to change.

a-will commented 6 months ago

@jesultra I feel sympathetic, but features on spi_device are frozen for the coming release (as of the M2 milestone date, which was already later than planned). It's unlikely we'd scramble for resources and turn this non-critical feature request into a P0 now.

If having additional wait states is acceptable, likely the best that can be done for the next release is not enabling return-by-hw in the GSC firmware. Performance may be lower for reads that don't follow writes, but that's likely all we can support right now.

a-pronin commented 6 months ago

Some comments:

I agree that it might further simplify the firmware logic (and will avoid some corner cases, when firmware is stalled while a flash op is performed for hundreds of milliseconds before it can react to TPM requests) if we also add handling writes for non-FIFO TPM registers to SPI controller hw. Note that each register will need to have an register-specific mask of which bits are actually settable and the operation is "set-bit-on-write-if-only-one-allowed-bit-is-set-in-write request", not "overwrite the previous contents with what's sent from host" - but I'm sure that part is understood. But 1) worth double-checking that this logic always holds for all registers in all states (I didn't check so far, no reason since we never had hw writes support) 2) while it will improve things, the current state doesn't seem to violate the spec though. more details below.

With the SPI TPM spec, various FIFO interface registers require the ability to read the current value with only a single wait state, with the most critical register being TPM_ACCESS_x. However, to process a write on time, a multi-tasking Firmware will need to receive an interrupt, process the command, and update the register in substantially less than 2 us (at 24 MHz SPI).

The spec only requires that the current value is read with only a single wait state. The TPM-side firmware can (and does) spend much longer than a single wait state to actually process the request and reflect the state in the register.

Though the spec does seem to define all writeable bits in a way that processing them in hardware shouldn't break the state machine (I didn't check all bits, but that's true for most commonly used bits at east), it doesn't seem to require anywhere that a write to STS.commandReady or ACCESS.requestUse is immediately reflected in the register value that host software reads out immediately after. you can write commandReady=1 is see it as commandReady=0 on the next poll while waiting for Expect + burstCount.

For writes to ACCESS register, here's what spec requires:

After the request of a locality to become the active locality, TPM_ACCESS_x.activeLocality will be a 1 within
TIMEOUT_A if there is no other locality active at this time, otherwise TPM_ACCESS_x.activeLocality will be a 1 only
after the other locality has relinquished control of the TPM.

Where TIMEOUT_A/B/C used in similar situations for ACCESS and STS all have defaults in hundreds of milliseconds. And nothing about how quickly requestUse should be seen as 1 on read after writing it.

So, the following sequence doesn't seem to contradict the spec.

initial state:
- hwreg for TPM_ACCESS_X has activeLocality = 1, requestUse = 0,
- hwreg for TPM_ACCESS_Y has activeLocality = 0, requestUse = 0,
software Y requests locality Y by writing TPM_ACCESS_Y.requestUse = 1
at this point, if software Y manages to quickly read TPM_ACCESS_Y before TPM firmware got a chance to run, it sees activeLocality = 0, requestUse = 0 -- this is legal, no expectation of a write being reflected immediately, the only timeout is TIMEOUT_A and it is only applicable to activeLocality
TPM firmware sets TPM_ACCESS_Y.requestUse = 1
time passes
software X requests to release locality X by writing TPM_ACCESS_x.activeLocality = 1
firmware gets the request
firmware sets TPM_ACCESS_X.activeLocality = 0 in hwreg
at this point if host software reads TPM_ACCESS_X or TPM_ACCESS_Y, it will be auto-served by SPI controller hw and both will have activeLocality = 0
firmware sets TPM_ACCESS_Y.activeLocality = 1 requestUse = 0 in hwreg
after this point software Y that sits polling on TPM_ACCESS_Y will see TPM_ACCESS_Y.activeLocality = 1 auto-served by SPI controller and can continue with its command.

The spec even calls out that the response to a write to TPM_ACCESS_x to request access should appear to host Software to happen "practically immediately." A write, followed by an immediate read of TPM_ACCESS_x is likely a very typical sequence, since this is needed to proceed with any TPM commands.

1) Host software typically (at least in the case of Linux tpm driver, which I know) optimizes out most of ACCESS writes. A driver grabs the locality when it starts, and doesn't release it. On every command it checks if this locality is still active and only requests to activate it if somebody else grabbed a different locality since the last command. I can't speak for all systems, but in our case that means that in practice the locality is only requested once between reboots.

2) When the spec talks about practically immediately for ACCESS it talks about the delay between releasing one locality and granting a different one - the software shouldn't expect to be able to catch the state when there's no active locality when polling the ACCESS register after it released the current locality, from its point of view it will most likely go through locality X active to locality Y active (without "no locality is active" state ever being observable through TPM registers in between). But the spec doesn't say that locality Y becomes active practically immediately after the software requests it (even if there was no current active locality during the time of request) - for that only TIMEOUT_A is expected. Here's the wording about "practically immediately":

Software needs to consider that there is no timeout condition defined for the time period between the release of one
locality and when access to a subsequent locality is granted (i.e. TPM_ACCESS_x.activeLocality is set to 1), as this
process happens practically immediately from a Software point of view.

a-pronin commented 6 months ago

We probably ought not to call the current function a "TPM mode", since it cannot meet the standard's specifications.

As commented above, I believe it does.

moidx commented 6 months ago

@jettr to triage this issue wrt integration requirements.

High estimate: low/mid area, high complexity.

jettr commented 6 months ago

I updated the priority of this to P3. We do not need this feature for 2024. This is a future feature improvement request.

a-will commented 4 months ago

software Y requests locality Y by writing TPM_ACCESS_Y.requestUse = 1

at this point, if software Y manages to quickly read TPM_ACCESS_Y before TPM firmware got a chance to run, it sees activeLocality = 0, requestUse = 0 -- this is legal, no expectation of a write being reflected immediately, the only timeout is TIMEOUT_A and it is only applicable to activeLocality

Ah, but there is a catch. TPM_ACCESS_Y.tpmRegValidSts must already be set to 0 in that case. Otherwise, I believe it is not legal. Since we currently require firmware to change that value in response to writes, we end up with a timing problem.

Interestingly, it's possible having hardware reset that bit on a write (and using it to induce wait states on reads) is all that is necessary to be in compliance. I'd have to look closer at how all of the return-by-hw register fields work to be certain.

lowRISC / opentitan

[spi_device] TPM return-by-hw flawed without hardware write support #21160

Description