lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.53k stars 754 forks source link

[dv] FSM transition coverage #14407

Open weicaiyang opened 2 years ago

weicaiyang commented 2 years ago

Problem

These 2 types of transitions aren't easy to be covered in regular function tests as raised in #14028

any state -> reset

Approach 1

If we want to ensure that the FSM goes back to reset state when reset happens, RTL lint tool can check it. @msfschaffner has done some experiment - enable an AscentLint rule in the lint policy file and it can catch any FSM that doesn't have a reset. If we agree that FSM must always associate with a reset, we could enable Lint to check this.

On the DV side, VCS provides a way to exclude FSM transitions that can only happen on reset. This PR #14360 adds a build option and a configuration file to enable this capability. Since it only excludes FSM transitions that can only happen on reset, user still needs to test the transition to reset if it could be caused by other condition, which is good to test.

Once both Lint and DV flow is set up, block maintainer doesn't need to write any test to cover it.

(Need to enable this for Xcelium if we choose this approach.)

Approach 2

Scenario:

send a normal sequence -> in a parallel thread, after random delay (before the sequence completes) issue reset and kill the sequence -> read all CSRs to check -> restart another normal

We have a test - stress_all_with_rand_reset, which creates a parallel thread to issue reset at any time along with all sequences. It mimics a scenario - reset button is pushed any time when the chip is operating, and check it works again after reset.

But this test doesn't guarantee the reset happens at all the states. Some states may last only a few cycles, which would be extremely hard to hit. We may need to tune the timing of the reset, in order to hit all the transition, which could be really timing consuming but not a lot of reward.

Approach 3

Scenario:

forcing FSM to any state -> issue a reset -> start the regular sequence

This is much easier than approach 2, but seems like it doesn't cover more than the lint tool does. It essentially tests FSM must have a reset.

This requires each block maintainer to their own test.

any state -> error in the sparse_state_fsm

In the common sec_cm test, we only test idle -> error, but we run FPV to prove any state -> error unconditionally leads to a fatal alert. Since any state can transition to error state, it's also very time consuming to cover all these transition in simulation. And FPV already tests that, so it could be safe to exclude these transitions manually. There is probably no automatic way to exclude them.

senelson7 commented 2 years ago

It seems like option 3 (like I am doing) is a valid way to do it but you're right, the owner has to write the testcase and set outputs to 0 so there isn't a fatal alert. So I would say option 1 if you think that covers it just as well and the error state FPV verifies that we always get a fatal alert is what we really want to prove anyways.

weicaiyang commented 2 years ago

It seems like option 3 (like I am doing) is a valid way to do it but you're right, the owner has to write the testcase and set outputs to 0 so there isn't a fatal alert. So I would say option 1 if you think that covers it just as well and the error state FPV verifies that we always get a fatal alert is what we really want to prove anyways.

I think the option 3 is also valid. As discussed in the meeting, everyone agreed to rely on Lint and FPV to verify these 2 transitions, which saves some DV efforts. We can probably remove those sequences you added in #14028

weicaiyang commented 2 years ago

Items to close this issue

senelson7 commented 2 years ago

13850 excludes fsm reset scoring for xcelium. Currently just comments it out but could remove now since it's the plan.

weicaiyang commented 2 years ago

13850 excludes fsm reset scoring for xcelium. Currently just comments it out but could remove now since it's the plan.

not sure if removing set_fsm_reset_scoring also exclude the transition to reset that could happens due to a non-reset event? Since VCS only excludes FSM transitions that can only happen on reset, it's good to have Xcelium do the same exclusion.

senelson7 commented 2 years ago

I ran it with/without and with set_fsm_reset_scoring, the fsm coverage shows two reset states, one being transitions due to reset and the other normal state transitions to reset. And without, there is just a single reset state showing normal transitions. So I think that's the proper directive to remove but we can look into it more, I'll check with our Cadence AE, and maybe @weicaiyang and/or @sriyerg want to also so we get confirmation from two sources.

weicaiyang commented 2 years ago

I ran it with/without and with set_fsm_reset_scoring, the fsm coverage shows two reset states, one being transitions due to reset and the other normal state transitions to reset. And without, there is just a single reset state showing normal transitions. So I think that's the proper directive to remove but we can look into it more, I'll check with our Cadence AE, and maybe @weicaiyang and/or @sriyerg want to also so we get confirmation from two sources.

Great! Thanks for testing this and confirming with AE. I think https://github.com/lowRISC/opentitan/pull/13850 is what we need.

msfschaffner commented 1 year ago

I am deferring this bug to M3. The new policy file is now available in newer releases of AscentLint, but the tool still exhibits a problem with excessive runtimes on our top-level at the moment. We are following up with the vendor and will come back to this later.

johngt commented 3 months ago

Looked at with @vogelpi - due to time left - consider this a V3 item Deprioritising to P4.