Open moidx opened 1 day ago
I'd say 10 iterations of the loop work with slightly more complex code, but I suppose closer to 30-40 would definitely cause this. Easiest test: run SHA-256 with a message of 64+ bytes, so CMD.HASH_STOP would be used before the final steps to save the context.
Thanks for reporting this @vsukhoml , and thanks for filing the issue @moidx .
Is it possible to work around the issue by checking the FIFO empty status bit before triggering the stop? Alternatively, can we check the idle status bit before triggering the stop?
It would be good to understand what the status of the FIFO and the overall IP is when the issue occurs and when it doesn't. Emptying the FIFO takes just 16 clock cycles, and computing a block takes on the order of 48 to 80 clock cycles. With 30-40 loop iterations I would expect that we signal the stop when the hardware is already idle.
Update: I could reliably reproduce this on the FPGA using this command:
bazel test --test_output=streamed --cache_test_results=no //sw/device/tests/crypto:hmac_multistream_functest_fpga_cw310_sival_rom_ext
With 40 loop iterations, the test reliably hangs when streaming the first segment. Printing the status register before signaling the stop reveals that the FIFO is always empty, independent of whether the test fails or passes. All other status bits/fields are 0. Also, the ERR_CODE and MSG_LENGTH_LOWER registers don't provide useful insight.
I will now collect some waves with the following command and report back tomorrow:
util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_hmac_multistream -t vcs --purge --reseed 1 --waves vpd
I checked that if I just insert a delay after HMAC_CMD_HASH_STOP_BIT, I can read a non-zero digest, which seems to be the state I'm looking for, but I can't continue: HMAC never reports INTR_STATE.hmac_done, so the logic for the next commands fails. Status can be either 2 or 3, though. I also tried sending hash_stop multiple times - it didn't affect the behavior with no delay, and it didn't help in the case with a delay. I suspect there is a 96-cycle window between the last byte/word of the block being sent to the FIFO and when hash_stop can be sent for regular operations. If hash_stop is delayed, the IP gets stuck in a state with no INTR_STATE.hmac_done.
I tried to "reset" HMAC by sending a hash_process command after hash_stop, and by sending an invalid command with all 4 command bits set, but it didn't help, though the STATUS.idle bit became set. For the invalid command I got ERR_CODE=2, and couldn't reset this value, so I'm stuck with status=00000003, err_code=00000002, intr_state=00000000.
So, besides finding a workaround other than using SW or OTBN for SHA2 with context switching, it would be nice to have a means to bring HMAC back into an operational state after software errors like sending an invalid command.
While it is not a good use case, I get into the same state with a sequence of hash_start and hash_stop commands with a zero-length message in between, just one after another: status=2, intr_state=0. I'd say the HMAC HW shall not hang on sequences like this.
Another update - I tested sending hash_stop before sending the last word of the block to the FIFO. It doesn't help, and the behavior is the same as with a zero-length message. The next command fails with intr_state.done not being set: intr_state=0 and status=2. So it doesn't really work as a workaround.
Also, in my test with a zero-length message between hash_start and hash_stop, I found that despite the failure I read out the correct SHA2 state in the digest registers.
Thanks @vsukhoml, I'm trying to follow your observations against the RTL, but waveforms will definitely help a lot, so at this point I'm just suggesting other unverified high-level ideas to try out.
Are you sure CFG.sha_en is set and remains enabled throughout all of this? Getting ERR_CODE=2 at some point makes me wonder.
I checked that if I'm just making a delay after HMAC_CMD_HASH_STOP_BIT, I can read a non-zero digest, which seems to be the state I'm looking for, but I can't continue - HMAC never reports INTR_STATE.hmac_done, so logic for next commands fails.
What happens if you read your digest, then deassert CFG.sha_en as well as hash_start, hash_process, hash_stop and hash_continue? Does this bring it into an operational state again?
@martin-velay and I could track down the RTL bug inside prim_sha2:
assign idle_o = (fifo_st_q == FifoIdle) && (sha_st_q == ShaIdle) && !hash_go;
For the primitive to signal the idle status, both the internal FIFO and the SHA core need to be idle.
The problem here is that the FIFO FSM only reacts to the msg_feed_complete pulse (resulting from the stop command) while in the FifoWait state, i.e., after the internal FIFO has been emptied and while the SHA core is processing:
FifoWait: begin
  if (msg_feed_complete && one_chunk_done) begin // <-- Only here we listen on the msg_feed_complete
    fifo_st_d = FifoIdle;
    // hashing the full message is done
    hash_done_next = 1'b1;
  end else if (one_chunk_done) begin
    fifo_st_d = FifoLoadFromFifo;
  end else begin
    fifo_st_d = FifoWait;
  end
end
Once the core is done, the one_chunk_done signal is raised and we go back into the FifoLoadFromFifo state. This state doesn't listen to the msg_feed_complete signal. The FSM waits for new data forever, but that data never comes, because outside of the primitive the stop is registered and we won't ever forward data again.
A potential fix would be to make the FifoLoadFromFifo state listen to the msg_feed_complete signal while the FIFO is empty:
FifoLoadFromFifo: begin
  if (!shaf_rvalid) begin
    // Wait until it is filled
    fifo_st_d = FifoLoadFromFifo;
    update_w_from_fifo = 1'b0;
    if (msg_feed_complete) begin // <-- potential RTL fix to be confirmed.
      fifo_st_d = FifoIdle;      // <-- potential RTL fix to be confirmed.
    end                          // <-- potential RTL fix to be confirmed.
  end else if (w_index_q == 4'd 15) begin
    fifo_st_d = FifoWait;
    // To increment w_index and it rolls over to 0
    update_w_from_fifo = 1'b1;
  end else begin
    fifo_st_d = FifoLoadFromFifo;
    update_w_from_fifo = 1'b1;
  end
end
A possible software workaround could be to issue the stop quickly enough after the last FIFO write. Between sending the last word and sending the stop, we do have 64 clock cycles (SHA2-256) or 80 clock cycles (SHA2-384/512), so we need to be fast. I think disabling interrupts is the only option we have here.
Here are some waves to visualize the situation in the RTL:
I have reproduced the bug at block level, thanks for raising this issue and for the details.
Could you please elaborate on the workaround where we "insert" the delay before writing the last word into the message FIFO: writing "block_size - 1" words, inserting a large delay, writing the last word into the FIFO, then triggering the stop? You mention that it seems to work well and that if this is acceptable, it's fine; otherwise we should find another solution. Does it make things insensitive to the delay between the last word/byte sent to the FIFO and hash_stop? How long should the delay be?
I tried to unblock HMAC when it was stuck by continuing to write to MSG_FIFO after the failed hash_stop, but that didn't seem to work. So far, various combinations of FIFO writes and cycling CFG_SHA_EN didn't change things. I observed the FIFO entry count in STATUS growing, but without any effect, and once I reach 128 bytes it seems that I hang on the write to the FIFO(?).
As for disabling interrupts - it may be a solution for some use cases but not others, e.g. when the cryptolib runs in user mode and can't really disable interrupts without some support from the OS.
I also tried a delay (for (size_t i = 0; i < 1000; i = launder32(i + 1)) ;) before the last word or byte of the block, but it doesn't change anything for me. To do that, I check whether MESSAGE_LENGTH_LOWER would be block-aligned for the given data, and if so, depending on the alignment, choose where to add the delay - before the last word or before the last byte.
Description
Certain delays between the last message FIFO write and the HMAC_CMD_HASH_STOP_BIT and HMAC_CMD_HASH_PROCESS commands cause the HMAC.STATUS register to get stuck with the hmac_idle bit cleared.
Reproduction steps (pseudo-code):
During regular operating conditions, interrupts can fire at any time, causing delays similar to what is captured in the reproduction steps. This may cause hangs in the field that will be very difficult to debug / root cause.
Issue reported by @vsukhoml. CC: @vogelpi, @martin-velay, @johannheyszl, @gdessouky