imphil closed this issue 2 years ago
Thanks for the great summary, Philipp!
In my opinion, the CI loop should check the integrity of the build and that basic functionality is maintained when code is committed. Testing the mask ROM in the CI loop seems like more than we need to do to achieve those two goals. For instance, if the mask ROM image is broken, will this significantly impact the test coverage during the nightly regression? On the other hand, if a commit breaks the CSR read/write logic, that would basically trash the results of the nightly regression.
I'm also a proponent of keeping the CI loop as short as possible. I'd rather err on the side of a shorter CI loop, with the occasional problem sneaking through, than try to catch all issues in the loop and slow down the development process.
Hi @timothytrippel, I am marking this issue as obsolete, as there are already plans for integrating mask ROM testing on the FPGA using Bazel.
We currently test the Mask ROM in CI in a system test running against a verilated Earl Grey toplevel. Due to the complexities of signature verification, Mask ROM executes a fair amount of code and takes a fair amount of time when simulated.
We do have a couple of options; here's a non-exhaustive list:
Continue running Mask ROM tests against a verilated simulation, but run these simulations on faster hardware.
https://github.com/lowRISC/opentitan/pull/6649 is an example of that, reducing the system test time from 55 to 20 minutes by choosing an internal CI builder (a GCP N2 machine with 3 guaranteed CPUs, 6 GB RAM, but the ability to use more in bursts). This was just a random machine config we had around; we can optimize this further to get faster test times.
Test Mask ROM on FPGA only. On an FPGA, the Mask ROM test should only take a couple of seconds.
The main challenge here is that the Mask ROM is part of the FPGA bitstream. If we want to change the ROM image inside the bitstream, we either need to rebuild the bitstream or make the splice scripting work reliably.
Of course, we can additionally reduce the frequency of Mask ROM tests. We can run them only on master merges, once per hour, etc. In all those cases, however, we do need to allocate resources to a person to look for failures, triage them, do backouts or follow-ups, etc., since we effectively move the first-level responsibility to fix breakages from the author of the PR to the community.