[test-triage] chip_tap_straps_dev

msfschaffner commented 2 years ago

Hierarchy of regression failure

Chip Level

Failure Description

UVM_FATAL @ us: (mem_bkdr_util.sv:480) [mem_bkdr_util[Rom]] file test_rom_sim_dv.scr..vmem could not be opened for r mode has 15 failures:

Steps to Reproduce

GH revision: 9c0b24ddb
util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_tap_straps_dev --build-seed 2483310614 --fixed-seed 581187855

Tests with similar or related failures

[x] chip_tap_straps_rma
[x] chip_tap_straps_prod
[x] chip_sw_example_rom

weicaiyang commented 2 years ago

The straps tests were passing before. The test doesn't really need to run any SW test, but it needs to load a rom image to pass the rom check. Seems like the vmem is no longer generated after https://github.com/lowRISC/opentitan/pull/13863 is landed?

sriyerg commented 2 years ago

Looks like the bazel cquery returns empty for the test_rom:

$ bazel cquery "labels(data, //sw/device/lib/testing/test_rom:test_rom_sim_dv)"
INFO: Build option --define has changed, discarding analysis cache.
INFO: Analyzed target //sw/device/lib/testing/test_rom:test_rom_sim_dv (0 packages loaded, 7586 targets configured).
INFO: Found 1 target...
INFO: Empty query results
INFO: Elapsed time: 0.791s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions

For flash based tests, the bazel cquery step returns the data artifacts, which also includes the test_rom artifacts. In case of tests that are run directly from ROM, the test_in_rom flag is set, which also enables the cquery step to return the data artifacts. I will go through the rules to understand this better. In the meantime, either we have the ROM finish its thing and jump to flash (we can pick the example test from flash), or, instead of the test_rom, pick the example_test_from_rom which has the test_in_rom flag set. I am leaning towards the former, because example_test_from_rom does not perform the steps the test_rom does.

sriyerg commented 2 years ago

Looks like the bazel cquery returns empty for the test_rom:
$ bazel cquery "labels(data, //sw/device/lib/testing/test_rom:test_rom_sim_dv)"
INFO: Build option --define has changed, discarding analysis cache.
INFO: Analyzed target //sw/device/lib/testing/test_rom:test_rom_sim_dv (0 packages loaded, 7586 targets configured).
INFO: Found 1 target...
INFO: Empty query results
INFO: Elapsed time: 0.791s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
For flash based tests, the bazel cquery step returns the data artifacts, which also includes the test_rom artifacts, so the SV base test sequence is able to load the images for both, ROM and the flash. In case of tests that are run directly from ROM, the test_in_rom bazel flag is set, which also enables the bazel cquery step to return the data artifacts. I will go through the rules to understand this better. In the meantime to fix this, either we have the ROM finish its thing and jump to flash (we can pick the example test from flash, which vacuously passes), or, pick the example_test_from_rom which has the test_in_rom flag set. I am leaning towards the former, because example_test_from_rom does not perform the steps the test_rom does.

tjaychen commented 2 years ago

wait this is weird... @timothytrippel and I went through a very similar issue some time ago with the other test_in_rom tests (chip_sw_flash_init and rma_unlocked). Is there something that's different for this particular one?

sriyerg commented 2 years ago

Yes these ones are different from the ones you fixed - they only load the //sw/device/lib/testing/test_rom:test_rom image, which does not set the bazel flag test_in_rom and the sw_images flag, also called test_in_rom. In order for bazel cquery to return with the data artifacts the test_in_rom bazel flag needs to be set. So the fix for these tests is simple - just change the image to example_test_from_rom instead.

tjaychen commented 2 years ago

Yes these ones are different from the ones you fixed - they only load the //sw/device/lib/testing/test_rom:test_rom image, which does not set the bazel flag test_in_rom and the sw_images flag, also called test_in_rom. In order for bazel cquery to return with the data artifacts the test_in_rom bazel flag needs to be set. So the fix for these tests is simple - just change the image to example_test_from_rom instead.

sounds good. Thanks for explaining. My bad for not noticing these last time. I probably should have realized the failure signature was similar and fixed them altogether. Thanks for investigating!

timothytrippel commented 2 years ago

bazel cquery "labels(data, //sw/device/lib/testing/test_rom:test_rom_sim_dv)"

So if you want to get the deps of the test ROM (or any opentitan_rom_binary for that matter) you need to run:

bazel cquery "labels(srcs, //sw/device/lib/testing/test_rom:test_rom_sim_dv)"

instead of

bazel cquery "labels(data, //sw/device/lib/testing/test_rom:test_rom_sim_dv)"

This is because the opentitan_rom_binary macro instantiates a filegroup with the name set to the name passed to the macro itself, and the filegroup contains a list of srcs. Alternatively, any test that is defined using the opentitan_functest Bazel macro instantiates a sh_test rule which includes its list of deps via its data attribute.

In the sim.mk file that dvsim.py invokes, a bazel cquery 'labels(data, ...)' command is used to locate all file deps of sh_test rules instantiated by opentitan_functest macros and copy them to the run_dir. Since every sh_test rule instantiated by an opentitan_functest macro contains the ROM as a dep, it gets copied over.

Technically dvsim.py invokes the sw_build make target for both the deps encapsulated in an opentitan_functest macro and the ROM images, since the sw_images field in the chip_sim_cfg.hjson file contains both Bazel targets. The invocation of the sw_build target for the ROM really does nothing except hit the cache, since the proper ROM image is already built when the sh_test rule is built (which is encoded as the "flash" image in the chip_sim_cfg.hjson file).

We could theoretically get rid of the sw_test_mode_test_rom and sw_test_mode_rom run modes as far as building SW images is concerned, but this would break the testbench, as the testbench expects to see a plusarg with all SW images it needs to load (even tho just one bazel target is needed to build all SW images). Hence, I got lazy and just left if for the time being.

Hopefully this all makes sense.

sriyerg commented 2 years ago

Thanks @timothytrippel for the explanation!

lowRISC / opentitan