lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.53k stars 754 forks source link

[rtl/mubi] Tighten up mubi CSRs #9273

Open matutem opened 2 years ago

matutem commented 2 years ago

The reason we use mubi (4 bits instead of 1) is so that flipping random bits won't change behavior, and only flipping all 4 does. This is why we use strict/loose comparisons. But there are some CSR with mubi values, like clkmgr.extclk_ctrl. I would expect the spec to say only mubi true or false guarantee some behavior, which is a bit lame since that implies we should say all other values cause cause unspecified behavior.

For example, if the CSR is written as 1011 then flipping a single bit will cause it to become true, which defeats the whole point of having redundant bits. Of course, we don't expect SW to do this, but then perhaps it should cause an error. Or at least the specs should strongly recommend true/false values only.

cdgori commented 2 years ago

Just a clarifying comment - we can't allow the other values to be "unspecified behavior."

There's basically 2 options.

For both, let's say that the logic controlled by the mubi CSR field has some "safe default" behavior (== value A in the CSR for this example), and setting the CSR to 5 causes a more privileged operation/behavior. I assume that the CSR has a reset value of A then, as well.

Option 1: all written values != 5 behave as if value == A. (15 safe values, 1 "privileged" value). This is probably close to what happens now (?) but as @matutem rightly points out it means that writing the non-"A" values to the CSR reduces the Hamming distance of the CSR from the privileged value, which isn't great.

Option 2: for written value !=5 || written value != A, generate an error (alert?). This is much safer, and I would prefer it, I just don't know how much work it is. I think it's easier (?) on DV because the behavior is then more consistent/predictable, but I guess you then have to model a new error behavior.

Comments?

matutem commented 2 years ago

I would also prefer Option 2.

tjaychen commented 2 years ago

I think at this juncture it might be a bit much to strictly check for the values that way.

One thing that @vogelpi has suggested in the past is that the regfile itself should have "safe value" checks. Ie, if we say its mubi types, the register itself only allows "true / false" updates, and any other values basically results in a "nothing happens".

So I feel like a safe value qualification check + an error check for it not being one of the multi-bit values is probably what we want. However given the schedule I'm not sure it's wise to put it in now, and it might be better to rely on software to do the right thing for now.

For clkmgr specifically however, I should at least clearly spec out what should happen when certain values at used at different points in time.

matutem commented 2 years ago

My understanding is that iff extclk_ctrl.sel is mubi true the external clock is selected, but in order to deselect it sw must set it to mubi false, all other values will keep the external clock selected.

In addition, a step down will only happen when external clocks are selected and .low_speed_sel is mubi true.

So as far as I know, the ambiguity is only for extclk_ctrl.sel.

vogelpi commented 2 years ago

As mentioned by Tim, my suggestion would be to directly in the regfile have either a "safe value" check and resolve the invalid values to something valid BEFORE propagating the value into the flops. This is basically Option 1 above but without the reduced Hamming distance if software writes something "wrong".

In AES, we are doing this by making the corresponding CSR hwext. By extending the regtool we could adopt the same behavior in various places in the design with much lower effort compared to making all these regs hwext.

tjaychen commented 2 years ago

yeah the more i think about it, the more it makes sense to adopt the safe value approach in the future, especially now that we are standardizing on mubi types.

So whenever a non-mubi valid value is written, we would also signal an alert, but probably a recoverable one (like shadow update).

matutem commented 2 years ago

Can we reconsider fixing this, in light of the AST assertions we are hitting?

tjaychen commented 2 years ago

we certainly can, but i'm just concerned about the DV impact. @matutem @cindychip @weicaiyang do we want to quickly chat tomorrow? I can describe what the change would be, and you guys can tell me if it's wise to introduce such a change now.

weicaiyang commented 2 years ago

we certainly can, but i'm just concerned about the DV impact. @matutem @cindychip @weicaiyang do we want to quickly chat tomorrow? I can describe what the change would be, and you guys can tell me if it's wise to introduce such a change now.

sgtm, let's discuss this in tomorrow's meeting.

andreaskurth commented 1 year ago

Triaged for clkmgr. I think the points relevant for M2.5 have been addressed but the rest should be completed afterwards --> https://github.com/lowRISC/opentitan/labels/Type%3AIcebox

vogelpi commented 8 months ago

After studying this and related issues and PRs, I am convinced that everything that should be done for production has indeed been done. Specifically for clkmgr, adding rectifying behavior to the mentioned CSR wouldn't make a huge difference, there are already other hurdles in place to overcome when switching the clock source. I am thus changing labels accordingly.

However, this is issue is actually about doing something more generally about this for future hardware blocks. I do think it would be valuable to have an optional feature inside regtool to e.g.:

I think this would be valuable for future hardware blocks and the implementation could actually be a nice starter task.