Open gussmith23 opened 7 months ago
Thanks @gussmith23 !
The specific cases that would be super helpful for processor design are:
1) +-(32bx32b)+-32b->32b
2) 32bx32b->32b
3) 32bx32b->64b
4) +-(64bx64b)+-64b->64b
5) 64bx64b->64b
6) 64bx64b->128b
I'm not sure how the number of output bits affects DSP inference! In ASIC, dropping the upper bits is a substantial savings (tens of percent). Could it be free on FPGA? Interesting either way.
Any realistic number of pipeline stages is fine; in ASIC we typically see 3+-1.
I have more advanced use cases I'd love support for, but this is a great place to start!
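To make the "drop the upper bits" point concrete, here is a minimal sketch (not taken from Lakeroad or any vendor tool) of why a truncated 32bx32b->32b multiply needs less hardware than the full 64b result. The 16-bit split and the name `mul32_low` are assumptions chosen for illustration; real DSP multiplier widths differ.

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1


def mul32_low(a: int, b: int) -> int:
    """Low 32 bits of an unsigned 32x32 product, built from 16-bit halves.

    Hypothetical helper: the 16-bit split is illustrative only.
    """
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    # a_hi * b_hi would be shifted left by 32 bits, so it never reaches
    # the low 32 bits of the result and is not computed at all.
    # Only the low 16 bits of the cross terms survive the 16-bit shift.
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK16
    return (a_lo * b_lo + (cross << 16)) & MASK32
```

In this sketch the truncated result uses three of the four 16x16 partial products and narrower adders. Whether that turns into a real saving on an FPGA depends on how the multiplies are mapped to DSPs.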
This was mentioned by @dpetrisko.
Apparently Vivado is failing to map wide FMAs to DSPs efficiently.
Lakeroad alone probably can't do this: once a solver query needs to figure out that some combination of bvmuls equals one wide bvmul, the solvers all seem to choke. There may be solver tricks for this (reasoning about multiplies is a known hard problem; I would think the developers of solvers like cvc5 have done research on it). However, there's an even more obvious way around this: use equality saturation (i.e. Churchroad) to block up the FMA via rewrites, and then run Lakeroad synthesis on the smaller FMAs that result. Assuming the smaller FMAs are sized to fit on a single DSP, this should work great.
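As a rough illustration of what "blocking up" a wide FMA means arithmetically, the sketch below rebuilds `(a * b + c) mod 2**width` from small partial products. It is not Churchroad's rewrite set or Lakeroad's output. The names `split` and `blocked_fma` are hypothetical, the operands are treated as unsigned, and the 16-bit chunk size is an arbitrary assumption rather than a real DSP multiplier width.

```python
def split(x: int, width: int, chunk: int) -> list[int]:
    """Split an unsigned `width`-bit value into little-endian `chunk`-bit limbs."""
    mask = (1 << chunk) - 1
    return [(x >> i) & mask for i in range(0, width, chunk)]


def blocked_fma(a: int, b: int, c: int, width: int = 64, chunk: int = 16) -> int:
    """Compute (a * b + c) mod 2**width using only chunk x chunk multiplies.

    Each inner product a_i * b_j stands in for one small multiply that
    could, under the sizing assumption above, fit on a single DSP.
    """
    mask = (1 << width) - 1
    acc = c & mask
    b_limbs = split(b, width, chunk)
    for i, a_i in enumerate(split(a, width, chunk)):
        for j, b_j in enumerate(b_limbs):
            shift = (i + j) * chunk
            if shift >= width:
                continue  # lands entirely in the truncated upper bits
            acc += (a_i * b_j) << shift
    return acc & mask
```

For example, `blocked_fma(a, b, c, width=64, chunk=16)` agrees with `(a * b + c) & (2**64 - 1)` while only ever multiplying 16-bit values. The rewrite-based approach would discover a decomposition of this shape, and each small multiply-add would then go to synthesis separately.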
Subtasks: