Proposal: allow the FMA addend to be in either `src_fmt_i` or `dst_fmt_i`

michael-platzer commented 5 months ago

Currently, the addend operand of the FMA shares its FP format with the result:

https://github.com/openhwgroup/cvfpu/blob/4aac6b3e87a30c8567dbe7401eba3274eea18afc/src/fpnew_fma_multi.sv#L37-L38

This makes sense when considering the various fused multiply-accumulate operations of the RISC-V spec, which add the product of two multiplicands given in one FP format to an accumulator in another FP format. However, the RISC-V vector spec defines widening FP add and subtract operations that come in two flavors:

one which sums two narrow operands and produces a wider result
one which sums one narrow and one wide operand to produce a wide result

The latter is directly supported by CVFPU but the former is not, as that would require the addend operand of the FMA to be in src_fmt_i as well.

Looking at the source code of the FMA, nothing is fundamentally preventing the FMA addend to be in a different format than the result. It would suffice to change the following lines, which unpack the 3 operands, to use a different format for operand_c (and info_c):

https://github.com/openhwgroup/cvfpu/blob/4aac6b3e87a30c8567dbe7401eba3274eea18afc/src/fpnew_fma_multi.sv#L234-L240

Hence, in order to improve support for these type of widening FP add/sub and potentially other operations and to generally increase the flexibility of CVFPU, I propose to allow the FMA addend to be in either src_fmt_i or dst_fmt_i (at least for the add operation).

This change could be implemented, for instance, by:

Adding a new port add_fmt_i to the FMA, all opgroup modules and the fpnew_top module that specifies the format of the addend. This would be the most flexible option but changes the ports of the top module.
Adding a new operation ADDW to the operation_e enum, which behaves the same as the ADD operation except that src_fmt_i is used for the addend instead of dst_fmt_i. This would avoid changes of the top module.

@davideschiavone @pascalgouedo @stmach @lucabertaccini Please let me know if you would support such a change and your preferred way of implementing it. If favorable, I would then prepare a PR.

pascalgouedo commented 5 months ago

Hi @michael-platzer

I prefer solution 2 as it doesn't change top level as well as T-Head FDIV/FSQRT interfaces.

Regards, Pascal.

stmach commented 5 months ago

Hi,

as you say, nothing fundamentally prevents using a separate format for each of the operands, and you nicely point out why we didn't need to do it so far when designing the unit. While less general, I also like option 2 as it keeps the changes internal to the FMA module (which itself becomes more general) and doesn't introduce [even more] configurability to the whole unit where it's not needed in most cases. Specialized architectures can still use direct instances of the FMA block if they require a more specific configuration.

Cheers, Steve

michael-platzer commented 5 months ago

Thanks for the feedback @pascalgouedo and @stmach! In that case I will move forward with option 2 and add a new operation for the add with operands of the same FP type.

Specialized architectures can still use direct instances of the FMA block if they require a more specific configuration.

Should I then add a new port add_fmt_i for the addend format to the FMA block, and assign as follows in all instantiations of the FMA?

    .add_fmt_i ( op_i == fpnew_pkg::ADDW ? src_fmt_i : dst_fmt_i )

That would allow specialized architectures to have even more freedom when directly instantiating the FMA block. Or should the format of the addend rather be decided within the FMA block?

stmach commented 5 months ago

I think an extra port on the multiformat FMA and the necessary changes for assigning it in the instantiating block is worth the effort to keep some generality on one hand, and conceptually not mingle formats and operations too much in the execution unit (basically the format used in the execution unit itself should just be decided by the format port itself).

Secondly, just a personal nit on naming: I would probably name the new port src2_fmt or similar (since it's just the format of the second source operand port), as with src_fmt and add_fmt there are kind of two separate nomenclatures (the addend is also a source after all) - I see that it was not very clean to begin with, as one of the source operands used the destination format in the past. Consider renaming the format ports either

src01_fmt & src2_fmt (or just src and src2 without the 01); or
mul_fmt & add_fmt, but that clashes with the use of src and dst in the rest of the project.

(in case you want to go fully generic crazy, you could make each operand format independent and have src_fmt be an array of length 3 with one format for each operand - but that's not really something that's necessary at this point)

michael-platzer commented 5 months ago

I think an extra port on the multiformat FMA and the necessary changes for assigning it in the instantiating block is worth the effort to keep some generality on one hand, and conceptually not mingle formats and operations too much in the execution unit (basically the format used in the execution unit itself should just be decided by the format port itself).

Alright, I agree that this sounds like a reasonable tradeoff.

Secondly, just a personal nit on naming: I would probably name the new port src2_fmt or similar (since it's just the format of the second source operand port), as with src_fmt and add_fmt there are kind of two separate nomenclatures

Good point!

src01_fmt & src2_fmt (or just src and src2 without the 01);

I went with this approach and renamed the new port to src2_fmt_i. Updated my PR (#114) accordingly.

openhwgroup / cvfpu

Proposal: allow the FMA addend to be in either `src_fmt_i` or `dst_fmt_i` #113