openhwgroup / core-v-xif

RISC-V eXtension interface that provides a generalized framework suitable to implement custom coprocessors and ISA extensions
https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en
Other
60 stars 24 forks source link

Support for instructions that use destination register as 3rd source #191

Closed MPEZZIN closed 6 months ago

MPEZZIN commented 6 months ago

Comment

In current version of specification, only R4-type instructions are considered and instruction bits [31:27] are explicitly mentioned for 3rd operand. I suppose that instructions that read destination register use bits [11:7] bits for third source instead.

Proposed Resolution

Support for instructions that read destination registers, and corresponding bit field [11:7], should be mentioned explicitly in the document, not only the R4-type.

Addition Info

No response

christian-herber-nxp commented 6 months ago

If I oversee things correctly, using [11:7] as source is currently not supported by CV-X-IF. Also, I am not aware of a RISC-V instruction format that would use this bit range as source. Is there a specific instruction that you are looking to implement with this?

MPEZZIN commented 6 months ago

Mainly Multiply-accumulate instructions from Vector extensions, PULP custom extensions and upcoming P extensions. It is considered implementing such extensions using CV-X-IF.

christian-herber-nxp commented 6 months ago

The vector extensions do that on the basis of the vector registers only, so that would not be an issue. The P extension, however, is an issue. It is unfortunate that there are now two versions of three source operand instructions. I also feel that the P extension probably did the right thing, but that puts F in a bad place, but that's not really the topic at hand.

For the issue interface, this is an issue. The issue interface is designed based on the assumption that the CPU decoder can easily identify where the source operands are. Now if that is no longer the case, we should have a deep think here. For 16b instructions, the situation was already messy enough that the compressed interface had to be introduced. A brute force solution would be to let the issue response decode the position and width of all operands (or even just provide the operands). This would of course be a bigger change that put a good delay on a possible ratification.

@Silabs-ArjanB : Do you want to weigh in on this?

MPEZZIN commented 6 months ago

I made a quick review of current vector extensions specification and I didn't find any instruction that uses x[rd] as a source register. But this this may change in the future.

From my point of view this is just a documentation problem, not a technical one. Since the coprocessor will have to decode the instructions anyway, the actual requirement is on the register interface, not the issue one. So we could just relax the specification to mention that we can read up to 3 registers in parallel, and it is up to the coprocessor instruction decoding logic to provide the appropriate inputs on the register interface. To my knowledge there is currently no instruction that require reading more than 3 operands so the current register interface may be sufficient.

christian-herber-nxp commented 6 months ago

I made a quick review of current vector extensions specification and I didn't find any instruction that uses x[rd] as a source register. But this this may change in the future.

Correct. They went a different route for multiply add. It is strange to me that different decisions were made for F and for V.

From my point of view this is just a documentation problem, not a technical one. Since the coprocessor will have to decode the instructions anyway, the actual requirement is on the register interface, not the issue one. So we could just relax the specification to mention that we can read up to 3 registers in parallel, and it is up to the coprocessor instruction decoding logic to provide the appropriate inputs on the register interface. To my knowledge there is currently no instruction that require reading more than 3 operands so the current register interface may be sufficient.

It is not that simple. Indeed, you cannot expect a CPU to do more than three parallel reads from the register file. So before reading, it has to know which one to read. While issue and register interface are separate interfaces, short pipelines like CV32E40X will want to start the issue and register transaction in the same cycle. I.e. the register transaction cannot wait for information it gets from issue_resp. E.g. adding a signal in issue_resp to signal the placement of the third source register doesn't really work, as it is too late.

This is why I am thinking if maybe the issue interface is too restrictive right now. Any new instruction format introduced in the future will not be supported. If theissue_resp would signal the actual operand numbers to be provided, that would solve it and make the compressed interface unnecessary, but it might be difficult to deal with for short pipelines.

christian-herber-nxp commented 6 months ago

another bandaid solution would be to require for the CPU to assume rs/rd multiplexing if using the MADD, MSUB, NMSUB, or NMADD opcode. I.e. it is very easy to decode that.

Silabs-ArjanB commented 6 months ago

another bandaid solution would be to require for the CPU to assume rs/rd multiplexing if using the MADD, MSUB, NMSUB, or NMADD opcode. I.e. it is very easy to decode that.

Are you talking about PULP opcodes here? If so, then I don't think we can put anything specific for that in the XIF spec. The XIF spec should be based on RISC-V International conventions in my opinion. For now that means R4 support only. If there are other 3 operand formats defined by RISC-V then we would simply need to add the minimal decoding needed in the CPU to support them (and this then implies for some processors that the register source identifier might no longer come from flops which will not be very nice).

The source identifier should not come from the coprocessor (for timing reasons).

christian-herber-nxp commented 6 months ago

Are you talking about PULP opcodes here? If so, then I don't think we can put anything specific for that in the XIF spec.

No. These are RV defined major opcodes, each containing one R4 instruction.

Base on your other comments, I think my solution is in line with you.

Silabs-ArjanB commented 6 months ago

@christian-herber-nxp Okay, I see. We need a way to define then what predecoding a coprocessor can assume from a given CPU if a CPU claims to support 3 operand instructions.

christian-herber-nxp commented 6 months ago

@christian-herber-nxp Okay, I see. We need a way to define then what predecoding a coprocessor can assume from a given CPU if a CPU claims to support 3 operand instructions.

@Silabs-ArjanB : For standard instructions, we are pretty safe, as there is not really encoding space for additional R4 instructions. P uses rd/rs3 multiplexing. You cannot predict if somebody using a custom extension would use R4 instructions. But I think the current definition is the safest that will fit most use cases.

MPEZZIN commented 6 months ago

The source identifier should not come from the coprocessor (for timing reasons).

Unfortunately, I don't see how to deal with such a requirement in a clean way. Instructions that use rd as source register (either from custom extensions like XPULP or from upcoming standard extensions like P) may use any regular instruction formats (mostly R from what I have seen up to now). Since the processor cannot make any assumption on the behavior of the instructions that will be forwarded to the coprocessor, would this mean that the third read port should always provide rd content (except for R4 format instructions) ?

Silabs-ArjanB commented 6 months ago

@christian-herber-nxp I think it is a must that we support "P uses rd/rs3 multiplexing" as well (although of course P has not been ratified yet)

christian-herber-nxp commented 6 months ago

@christian-herber-nxp I think it is a must that we support "P uses rd/rs3 multiplexing" as well (although of course P has not been ratified yet)

Exactly. This is why I made the change to the specification. I just updated the PR as their was a conflict after fixing the typos. It seems that this is the most practical approach at this time. If you are ok, this seems ready for merge.