openhwgroup / core-v-xif

RISC-V eXtension interface that provides a generalized framework suitable to implement custom coprocessors and ISA extensions
https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en
Other
53 stars 23 forks source link

[BUG] ECS XS mechanism cannot work #178

Closed christian-herber-nxp closed 3 months ago

christian-herber-nxp commented 4 months ago

Is there an existing CV-X-IF Bug for this?

Task Description

via ecsdata, a coprocessor can write back to mstatus.xs. The problem is, that mstatus.xs is an aggregate state, and a coprocessor cannot decide the value of that on its own. E.g. when a coprocessor has transitioned from Dirty to Clean, it is not clear if the aggregate state is Clean of Dirty.

Proposed resolution

There is no clean way to fix this in my mind. The only real way of doing this seems to be providing the actual XS signals to the CPU per coprocessor.

Description of Done

Clean solution for ECS or ECS mechanism removed.

Associated PRs

No response

Silabs-ArjanB commented 4 months ago

Given that XIF is a point to point interface I do not see an issue here. The processor can itself merge its state with that of 1 coprocessor. If there are multiple coprocessors then it is the task of the interconnect to do the merging, but that is outside the scope of the XIF spec.

christian-herber-nxp commented 4 months ago

My counter example works with a single coprocessor. CPU has two extension: E1: state Dirty E2: state Clean Coprocessor has only one extension CE1: state Dirty

Now software cleaned the coprocessor extension state, and sets the state to Clean (e.g. via a CSR write). At the end of cleaning, what should the coprocessor report back? It has no way of telling if the aggregate state is dirty or clean.

Currently, the spec reads: If ecswe[2] is 1, then the value in ecsdata[5:4] is written to mstatus.xs

If you want to solve it in a way that you described, then ecsdata[5:4] may not written to mstatus.xs, but must be aggregated with the other status. Which brings me to the same point: Isn't this way a lot more complex than just wiring those two bits from the coprocessor to the CPU? That would be a lot less signals.

Silabs-ArjanB commented 4 months ago

Now software cleaned the coprocessor extension state, and sets the state to Clean (e.g. via a CSR write). At the end of cleaning, what should the coprocessor report back? It has no way of telling if the aggregate state is dirty or clean.

A coprocessor should only report its own state.

Currently, the spec reads: If ecswe[2] is 1, then the value in ecsdata[5:4] is written to mstatus.xs

Instead of 'written' it should state 'aggregated' indeed.

If you want to solve it in a way that you described, then ecsdata[5:4] may not written to mstatus.xs, but must be aggregated with the other status.

Agreed.

Which brings me to the same point: Isn't this way a lot more complex than just wiring those two bits from the coprocessor to the CPU? That would be a lot less signals.

As you also said, the coprocessor does not know the aggregated state. Also I do not think that the normal way to impact such bits is via CSR instructions. The way I understand this mechanism is that the extension state is supposed to be updated as side effect of other instructions (as e.g. floating point instructions that impact the F register file should update the FS field).

christian-herber-nxp commented 4 months ago

Given the current discussion, I would already argue that the ECS XS bits should not be provided in issue/register transaction, as this implies something is done with those (I assume the attention was aggregating to write back).

As you also said, the coprocessor does not know the aggregated state. Also I do not think that the normal way to impact such bits is via CSR instructions. The way I understand this mechanism is that the extension state is supposed to be updated as side effect of other instructions (as e.g. floating point instructions that impact the F register file should update the FS field).

the transition from initial/clean to dirty should happen if a write occurs that alters the state. The spec is quite detailed about how this can also be imprecise.

The act of cleaning must happen through a CSR write. Also, setting the state to OFF and back is intended to be done via an instruction.

For FS and VS the situation is a bit different than for XS, as unfortunately, the bit has to be implemented in mstatus and thus in the main core, but it is only used in the extension, thus the coprocessor. There you do have to provide the value, and take it back.

christian-herber-nxp commented 4 months ago

To summarize our discussion: XS: this could be improved by removing it from the register interface, and stating that the extension just provides it state in result, and the CPU must aggregate the state. An alternative could be to just provide the state of XS as signals.

VS/FS: A performance issue was observed. As almost every instruction would be writing to VS/FS, this might lead to a lot of stalls. No good solution was yet found.

As the solution to these issues seems too far out, the proposal is to remove the parts of the specification related to ECS from the version proposed for ratification. Having the base ratified would allow users to implement coprocessors, and find an optimal solution as a proof of concept. Then this solution can be integrated into the standard.

pascalgouedo commented 3 months ago

I think that XS should be kept in register interface to be able to react correctly on coprocessor side if it is off. For XS aggregation from coprocessors to CPU, either it is managed outside of the core (the interconnect) to memorize all coprocessors values and generate correct aggregated XS or all signals go from coprocessors to CPU but then interface depends of the number of connected coprocessors (quite bad).

christian-herber-nxp commented 3 months ago

I think that XS should be kept in register interface to be able to react correctly on coprocessor side if it is off. For XS aggregation from coprocessors to CPU, either it is managed outside of the core (the interconnect) to memorize all coprocessors values and generate correct aggregated XS or all signals go from coprocessors to CPU but then interface depends of the number of connected coprocessors (quite bad).

XS is a read only register on the CPU. You cannot set mstatus.XS to Off.

pascalgouedo commented 3 months ago

Ah didn't see this subtilty. So you rely on each coprocessor XS value and no need to have aggregated value seen from a coprocessor point of view.

christian-herber-nxp commented 3 months ago

exactly. the privileged spec expects the coprocessor to implement its own XS bits.

pascalgouedo commented 3 months ago

This aggregation is still needed, best would be outside the core but needs a small aggregator module.