lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.44k stars 730 forks source link

[test-triage] rom_e2e_self_hash #23954

Closed engdoreis closed 4 days ago

engdoreis commented 1 week ago

Hierarchy of regression failure

Chip Level

Failure Description

UVM_ERROR @ 200016.083823 us: (chip_sw_base_vseq.sv:306) uvm_test_top.env.virtual_sequencer [uvm_test_top.env.virtual_sequencer.chip_sw_base_vseq] SW TEST TIMED OUT. STATE: SwTestStatusBooted, TIMEOUT = 200000000 ns

UVM_INFO @ 200016.083823 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER] --- UVM Report catcher Summary ---

Steps to Reproduce

Tests with similar or related failures

engdoreis commented 1 week ago

@hayleynewton for visibility

vogelpi commented 1 week ago

I am assigning this to @timothytrippel who was anyway working on this AFAIK.

andreaskurth commented 1 week ago

Reopening because rom_e2e_self_hash failed 100% (3/3 seeds) in the latest nightlies:

Offending 'initiated' has 3 failures:

    Test rom_e2e_self_hash has 3 failures.

        0.rom_e2e_self_hash.115472807137902301498829556509564391313865125567348757010398434489747930527819
        Line 1062, in log /container/opentitan-public/scratch/os_regression/chip_earlgrey_asic-sim-vcs/0.rom_e2e_self_hash/latest/run.log

              Offending 'initiated'
          UVM_ERROR @ 16099.417502 us: (hmac.sv:932) [ASSERT FAILED] ValidHashProcessAssert
          UVM_INFO @ 16099.417502 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
          --- UVM Report catcher Summary ---

        1.rom_e2e_self_hash.75999083613068768941854234916114584399853730872229818701793139005940313478202
        Line 1111, in log /container/opentitan-public/scratch/os_regression/chip_earlgrey_asic-sim-vcs/1.rom_e2e_self_hash/latest/run.log

              Offending 'initiated'
          UVM_ERROR @ 12056.475284 us: (hmac.sv:932) [ASSERT FAILED] ValidHashProcessAssert
          UVM_INFO @ 12056.475284 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
          --- UVM Report catcher Summary ---

        ... and 1 more failures.

@gdessouky: You recently changed the assertion that fails; maybe you have an intuition about what might be going wrong here?

gdessouky commented 5 days ago

This assertion failing indicates that we either trigger process without triggering start first or we triggered start and not process afterwards in that sequence.

I would have guessed @timothytrippel would have upset that assertion locally already when he touched the test last week/s since that assertion change was committed ~a month ago.

I can't run TLTs at my end, so can't test out anything locally but looking at this code https://github.com/lowRISC/opentitan/blob/c04cc5d074d9acefd052feb8a4fd6de6bfa0faab/sw/device/silicon_creator/rom/e2e/release/rom_e2e_self_hash_test.c#L58-L61 I would try adding hmac_sha256_process(); after hmac_sha256_update(); along the same pattern here https://github.com/lowRISC/opentitan/blob/master/sw/device/silicon_creator/lib/drivers/hmac_functest.c#L73-L78 and see how that turns out. @jadephilipoom I realize hmac_sha256_final() will map to hmac_sha256_final_truncated()which should call hmac_sha256_process() anyway, but I don't see that exact sequence of init - update - final in the hmac_functest.c - do you have any idea what could be going wrong here?

andreaskurth commented 5 days ago

@gdessouky I just tried your suggestion but it doesn't fix the problem unfortunately

gdessouky commented 5 days ago

@timothytrippel managed to pull out waveforms from his VM and we'll be looking into them to see what's going on, thanks for trying and confirming it doesn't fix it, we wanted to try that out.

timothytrippel commented 5 days ago

After some wave debugging, I think I pinpointed the issue. This assertion expects that hmac_sha256_process() should only be called once (since when it is called, the initiated signal should be set during the same clock cycle). However, software, specifically this code invoked in the ROM for SPX+ signature verification, expects it is able to call process as many times as it would like (first here directly, and then again when hmac_sha256_final_truncated(...) is called) and the hardware will be ok (documented here). The hardware does seem to allow this behavior, but the assertion does not. So updating the assertion should fix this I think. I am re-running the sim with the patch in #24022.

image

timothytrippel commented 4 days ago

With the fix in #24030, the test now passes: image