lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[hmac] `HMAC_CMD_HASH_STOP_BIT` and `HMAC_CMD_HASH_PROCESS` commands causing hang conditions #24767

Open moidx opened 1 day ago

moidx commented 1 day ago

Description

Certain delays between the last message FIFO write and the HMAC_CMD_HASH_STOP_BIT and HMAC_CMD_HASH_PROCESS commands cause the HMAC.STATUS register to get stuck with the hmac_idle bit cleared.

Reproduction steps (pseudo-code):

 status = msg_fifo_write(data, len - leftover_len);

 // Delay here; even a delay this small is enough.
 for (size_t i = 0; i < 10; i = launder32(i + 1))
    ;

 // Time to tell the HMAC HWIP to stop, because we do not have enough message
 // bytes for another block.
 uint32_t cmd_reg =
     bitfield_bit32_write(HMAC_CMD_REG_RESVAL, HMAC_CMD_HASH_STOP_BIT, 1);
 abs_mmio_write32(kHmacBaseAddr + HMAC_CMD_REG_OFFSET, cmd_reg);

 // Wait for the HMAC HWIP operation to complete.
 // In the error condition, this function blocks forever.
 status = hmac_idle_wait();

During regular operation, interrupts can fire at any time, causing delays similar to the one in the reproduction steps. This may cause hangs in the field that will be very difficult to debug and root-cause.

Issue reported by @vsukhoml. CC: @vogelpi, @martin-velay, @johannheyszl, @gdessouky

vsukhoml commented 1 day ago

I'd say 10 loop iterations are enough with slightly more complex code, but closer to 30-40 would definitely trigger this. The easiest test is to run SHA256 with a message of 64+ bytes, so that CMD.HASH_STOP is used before the final steps to save the context.

vogelpi commented 1 day ago

Thanks for reporting this @vsukhoml , and thanks for filing the issue @moidx .

Is it possible to work around the issue by checking the FIFO empty status bit before triggering the stop? Alternatively, can we check the idle status bit before triggering the stop?

It would be good to understand what the status of the FIFO and the overall IP is when the issue occurs and when it does not. Emptying the FIFO takes just 16 clock cycles, and computing a block takes on the order of 48 to 80 clock cycles. With 30-40 loop iterations, I would expect that we signal the stop when the hardware is already idle.

vogelpi commented 1 day ago

Update: I could reliably reproduce this on the FPGA using this command:

bazel test --test_output=streamed --cache_test_results=no //sw/device/tests/crypto:hmac_multistream_functest_fpga_cw310_sival_rom_ext

With 40 loop iterations, the test reliably hangs when streaming the first segment. Printing the status register before signaling the stop reveals that the FIFO is always empty, regardless of whether the test fails or passes. All other status bits/fields are 0. The ERR_CODE and MSG_LENGTH_LOWER registers don't provide useful insight either.

I will now collect some waves with the following command and report back tomorrow:

util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_hmac_multistream -t vcs --purge --reseed 1 --waves vpd

vsukhoml commented 23 hours ago

I checked that if I simply add a delay after HMAC_CMD_HASH_STOP_BIT, I can read a non-zero digest, which seems to be the state I'm looking for, but I can't continue: HMAC never reports INTR_STATE.hmac_done, so the logic for subsequent commands fails. STATUS can be either 2 or 3, though. I also tried sending hash_stop multiple times; it didn't change the behavior without a delay, and it didn't help in the case with a delay. I suspect there is a 96-cycle window between the last byte/word of the block being sent to the FIFO and the latest point at which hash_stop can be sent in regular operation. If hash_stop is delayed beyond that, HMAC gets stuck in a state where INTR_STATE.hmac_done is never set.

I tried to "reset" HMAC by sending a hash_process command after hash_stop, and by sending an invalid command with all four command bits set, but it didn't help, although the STATUS.idle bit became set. For the invalid command I got ERR_CODE=2 and couldn't clear this value, so I was stuck with status=00000003, err_code=00000002, intr_state=00000000.

So, besides finding a workaround (other than falling back to a SW or OTBN implementation of SHA2 with context switching), it would be nice to have a means to bring HMAC back into an operational state after software errors such as sending an invalid command.

vsukhoml commented 22 hours ago

While it is not a good use case, I get into the same state with a hash_start followed directly by a hash_stop, with a zero-length message in between: status=2, intr_state=0. I'd say the HMAC HW should not hang on sequences like this.

vsukhoml commented 21 hours ago

Another update: I tested sending hash_stop before sending the last word of the block to the FIFO. It doesn't help, and the behavior is the same as with a zero-length message. The next command fails because INTR_STATE.hmac_done is never set: intr_state=0 and status=2. So it doesn't really work as a workaround.

Also, in my test with a zero-length message between hash_start and hash_stop, I found that despite the failure I read out the correct SHA2 state from the digest registers.

gdessouky commented 14 hours ago

Thanks @vsukhoml, I'm trying to follow your observations against the RTL, but waveforms will definitely help a lot, so at this point I'm just suggesting other, unverified high-level ideas to try out.

What happens if you read your digest, then deassert CFG.sha_en as well as hash_start, hash_process, hash_stop and hash_continue? Does this bring it back into an operational state?

vogelpi commented 14 hours ago

@martin-velay and I were able to track down the RTL bug inside prim_sha2:

assign idle_o = (fifo_st_q == FifoIdle) && (sha_st_q == ShaIdle) && !hash_go;

For the primitive to signal the idle status, both the internal FIFO and the SHA core need to be idle.

The problem here is that the FIFO FSM only reacts to the msg_feed_complete pulse (resulting from the stop command) while in the FifoWait state, i.e., after the internal FIFO has been emptied and while the SHA core is processing:

      FifoWait: begin
        if (msg_feed_complete && one_chunk_done) begin // <-- Only here we listen on the msg_feed_complete
          fifo_st_d      = FifoIdle;
          // hashing the full message is done
          hash_done_next = 1'b1;
        end else if (one_chunk_done) begin
          fifo_st_d      = FifoLoadFromFifo;
        end else begin
          fifo_st_d      = FifoWait;
        end
      end

Once the core is done, the one_chunk_done signal is raised and the FSM goes back into the FifoLoadFromFifo state. This state doesn't listen to the msg_feed_complete signal. The FSM waits for new data forever, but that data never arrives because, outside the primitive, the stop has been registered and no more data will be forwarded.

A potential fix would be to make the FifoLoadFromFifo state listen to the msg_feed_complete signal while the FIFO is empty:

      FifoLoadFromFifo: begin
        if (!shaf_rvalid) begin
          // Wait until it is filled
          fifo_st_d          = FifoLoadFromFifo;
          update_w_from_fifo = 1'b0;
          if (msg_feed_complete) begin // <-- potential RTL fix to be confirmed.
            fifo_st_d = FifoIdle;      // <-- potential RTL fix to be confirmed.
          end                          // <-- potential RTL fix to be confirmed.
        end else if (w_index_q == 4'd 15) begin
          fifo_st_d = FifoWait;
          // To increment w_index and it rolls over to 0
          update_w_from_fifo = 1'b1;
        end else begin
          fifo_st_d          = FifoLoadFromFifo;
          update_w_from_fifo = 1'b1;
        end
      end
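To make the race easier to reason about, here is a small C model of the FIFO FSM above, with a switch for the proposed fix. It is a sketch under stated assumptions, not a translation of prim_sha2: the SHA core is modeled as taking `rounds` cycles per block (64 for SHA2-256, 80 for SHA2-384/512), msg_feed_complete is modeled as held high from the cycle the stop is registered, hash_go is ignored, and all names (`sha2_model_t`, `step`, `reaches_idle`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the prim_sha2 FIFO FSM. Assumptions (not from the RTL):
 * the core needs `rounds` cycles per block, and msg_feed_complete stays high
 * from the cycle the stop is registered. */
typedef enum { FIFO_IDLE, FIFO_LOAD_FROM_FIFO, FIFO_WAIT } fifo_state_t;

typedef struct {
  fifo_state_t st;
  int w_index;    /* index of the next message word in the block (0..15) */
  int fifo_words; /* words waiting in the internal FIFO (shaf_rvalid) */
  int core_busy;  /* remaining compression cycles; 0 means the core is idle */
} sha2_model_t;

/* Advances the model by one clock cycle. */
static void step(sha2_model_t *m, int rounds, bool msg_feed_complete,
                 bool with_fix) {
  bool shaf_rvalid = m->fifo_words > 0;
  bool one_chunk_done = m->core_busy == 1; /* last compression cycle */
  if (m->core_busy > 0) m->core_busy--;

  switch (m->st) {
    case FIFO_LOAD_FROM_FIFO:
      if (!shaf_rvalid) {
        /* Original FSM waits here forever; the proposed fix honours the stop
         * (and, as in the snippet, does not raise hash_done). */
        if (with_fix && msg_feed_complete) m->st = FIFO_IDLE;
      } else if (m->w_index == 15) {
        m->fifo_words--;
        m->w_index = 0;        /* w_index rolls over to 0 */
        m->core_busy = rounds; /* SHA core starts compressing the block */
        m->st = FIFO_WAIT;
      } else {
        m->fifo_words--;
        m->w_index++;
      }
      break;
    case FIFO_WAIT:
      /* The only state in which the original FSM checks msg_feed_complete. */
      if (msg_feed_complete && one_chunk_done) {
        m->st = FIFO_IDLE;
      } else if (one_chunk_done) {
        m->st = FIFO_LOAD_FROM_FIFO;
      }
      break;
    case FIFO_IDLE:
      break;
  }
}

/* Returns true if idle is reached when the stop is registered `stop_delay`
 * cycles after the last word of the block enters the FIFO. */
static bool reaches_idle(int rounds, int stop_delay, bool with_fix) {
  assert(rounds > 0 && stop_delay >= 0);
  /* 15 words already consumed; the 16th (last) word was just written. */
  sha2_model_t m = {FIFO_LOAD_FROM_FIFO, 15, 1, 0};
  for (int cycle = 0; cycle < 1000; cycle++) {
    step(&m, rounds, cycle >= stop_delay, with_fix);
    /* idle_o = FIFO FSM idle && SHA core idle (hash_go not modeled). */
    if (m.st == FIFO_IDLE && m.core_busy == 0) return true;
  }
  return false; /* still waiting for data: the hang described in this issue */
}
```

In this model, `reaches_idle(64, 64, false)` returns true while `reaches_idle(64, 65, false)` returns false (the hang), and `reaches_idle(64, 65, true)` returns true once the proposed FifoLoadFromFifo change is applied.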

A possible software workaround could be to disable interrupts around the last FIFO write and the stop command.

Between sending the last word and sending the stop, we have 64 clock cycles (SHA2-256) or 80 clock cycles (SHA2-384/512), so we need to be fast. I think disabling interrupts is the only option we have here.
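A minimal sketch of the interrupt-masking workaround, assuming the streamed segment ends on a block boundary (as with len - leftover_len in the reproduction). The register identifiers, the command value, and the hooks `mock_mmio_write32`, `mock_irq_disable` and `mock_irq_restore` are hypothetical mocks that only record what happens, so the ordering can be checked off-target; on a device they would be replaced by the platform's own MMIO and interrupt primitives. Only the last word and the stop sit inside the critical section, to keep the masked window short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register identifiers and command value (placeholders, not the
 * real HMAC register layout). */
enum { REG_MSG_FIFO, REG_CMD };
#define CMD_HASH_STOP 0x1u

/* Mock hooks that log each write and whether interrupts were enabled. */
typedef struct {
  int reg;
  uint32_t value;
  bool irq_on;
} mmio_log_t;
static mmio_log_t mmio_log[64];
static size_t mmio_log_len;
static bool irq_enabled = true;

static void mock_mmio_write32(int reg, uint32_t value) {
  assert(mmio_log_len < 64);
  mmio_log[mmio_log_len++] = (mmio_log_t){reg, value, irq_enabled};
}
static bool mock_irq_disable(void) {
  bool prev = irq_enabled;
  irq_enabled = false;
  return prev;
}
static void mock_irq_restore(bool prev) { irq_enabled = prev; }

/* Streams `n` (>= 1) words that end on a block boundary and issues the stop.
 * Interrupts are masked only around the last word and the stop command, so an
 * interrupt cannot land between them. */
static void hmac_write_and_stop(const uint32_t *words, size_t n) {
  assert(n >= 1);
  for (size_t i = 0; i + 1 < n; i++) {
    mock_mmio_write32(REG_MSG_FIFO, words[i]);
  }
  bool prev = mock_irq_disable();
  mock_mmio_write32(REG_MSG_FIFO, words[n - 1]);
  mock_mmio_write32(REG_CMD, CMD_HASH_STOP);
  mock_irq_restore(prev);
}
```

Restoring the previous interrupt state, rather than unconditionally re-enabling, keeps the helper safe to call from code that already runs with interrupts masked.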

vogelpi commented 13 hours ago

Here are some waves to visualize the situation in the RTL: [waveform image not included]

martin-velay commented 8 hours ago

I have reproduced the bug at block level, thanks for raising this issue and for the details.

vsukhoml commented 6 hours ago

Could you please elaborate on this: "workaround where we 'insert' the delay before writing the last word in the message FIFO: writing 'block_size - 1' words, inserting a large delay, writing the last word into the FIFO, then triggering the stop. And it seems to work well. If this can be acceptable, then it's fine. Otherwise we should find out another solution."

Does it make things insensitive to the delay between the last word/byte sent to the FIFO and hash_stop? How long should the delay be?
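The ordering in the quoted workaround could be sketched as follows. `mock_fifo_write32`, `mock_delay_cycles` and `mock_hmac_stop` are hypothetical hooks that merely log operations; the sketch only pins down the sequence (block_size - 1 words, delay, last word, stop) and makes no claim about whether it avoids the hang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock hooks (hypothetical) that only log the order of operations. */
enum { OP_FIFO_WRITE, OP_DELAY, OP_STOP };
static int op_log[64];
static size_t op_len;

static void mock_fifo_write32(uint32_t word) {
  (void)word;
  op_log[op_len++] = OP_FIFO_WRITE;
}
static void mock_delay_cycles(uint32_t cycles) {
  (void)cycles;
  op_log[op_len++] = OP_DELAY;
}
static void mock_hmac_stop(void) { op_log[op_len++] = OP_STOP; }

/* Writes one full block as block_words - 1 words, then a delay, then the
 * last word, and issues the stop right after the last word. */
static void write_block_delay_before_last(const uint32_t *block,
                                          size_t block_words,
                                          uint32_t delay) {
  assert(block_words >= 1 && op_len + block_words + 2 <= 64);
  for (size_t i = 0; i + 1 < block_words; i++) {
    mock_fifo_write32(block[i]);
  }
  mock_delay_cycles(delay); /* the block is still one word short here */
  mock_fifo_write32(block[block_words - 1]);
  mock_hmac_stop();
}
```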

I tried to unblock HMAC after it got stuck by continuing to write to MSG_FIFO after the failed hash_stop, but it didn't seem to work. So far, various combinations of FIFO writes and toggling CFG_SHA_EN haven't changed anything. I observed the FIFO entry count in STATUS growing, but nothing else happened, and once I reached 128 bytes, the FIFO write itself seemed to hang(?).

vsukhoml commented 6 hours ago

As for disabling interrupts: it may be a solution for some use cases but not others, e.g., when the cryptolib runs in user mode and can't really disable interrupts without some support from the OS.

I also tried a delay (for (size_t i = 0; i < 1000; i = launder32(i + 1)) ;) before the last word or byte of the block, but it doesn't change anything for me. To do that, I check whether MESSAGE_LENGTH_LOWER would be block-aligned after writing the given data and, if so, depending on the alignment, add the delay before either the last word or the last byte.
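A sketch of the block-boundary check described above: given the number of message bytes already fed (for example derived from MSG_LENGTH_LOWER; check the register's units), the source address, and the length of the next write, it returns how many bytes to write before inserting the delay. It assumes a hypothetical FIFO writer that uses byte writes until the source is word-aligned, then 32-bit writes, then byte writes for any tail; `bytes_before_delay` is an illustrative name, not a cryptolib function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many of the `len` bytes at `src` to write before inserting the
 * delay. `fed` is the number of message bytes already written to the FIFO and
 * `block_bytes` is 64 (SHA2-256) or 128 (SHA2-384/512). If this write does not
 * end on a block boundary, no delay is needed and `len` is returned. */
static size_t bytes_before_delay(uint64_t fed, uintptr_t src, size_t len,
                                 size_t block_bytes) {
  assert(block_bytes == 64 || block_bytes == 128);
  if (len == 0 || (fed + len) % block_bytes != 0) {
    return len;
  }
  /* Hypothetical writer: bytes up to the first word boundary, then 32-bit
   * words, then bytes for any tail. Check whether its last access is a word. */
  uintptr_t end = src + len;
  uintptr_t first_word = (src + 3u) & ~(uintptr_t)3u;
  bool last_access_is_word = (end % 4u == 0) && (first_word + 4u <= end);
  return len - (last_access_is_word ? 4u : 1u);
}
```

For a word-aligned 64-byte write that completes a SHA2-256 block, `bytes_before_delay(0, 0x1000, 64, 64)` returns 60 (delay before the last word); starting at address 0x1001 it returns 63 (delay before the last byte).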