gussmith23 / churchroad

MIT License

Good test case: Mapping wider FMAs #52

Open: gussmith23 opened this issue 7 months ago

gussmith23 commented 7 months ago

This was mentioned by @dpetrisko.

Apparently, Vivado is failing to map wide FMAs (fused multiply-adds) to DSPs efficiently.

Lakeroad alone probably can't do this: once a solver query needs to figure out that some combination of bvmuls equals one wide bvmul, the solvers all seem to choke. There may be solver tricks for this (reasoning about multiplication is a known hard problem, and I would expect the developers of solvers like cvc5 to have researched it). However, there's a more obvious way around it: use equality saturation (i.e., Churchroad) to decompose the wide FMA into smaller FMAs via rewrites, and then run Lakeroad synthesis on each of the smaller FMAs. Assuming the smaller FMAs are sized to fit on a single DSP, this should work well.
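A minimal sketch, assuming unsigned operands, of the arithmetic identity such a decomposition rewrite would rely on: a wide multiply-add is split into four narrow partial products recombined with shifts and adds. The function names `split` and `wide_fma_via_partials` and the split width `k` are hypothetical illustrations, not Churchroad or Lakeroad APIs; signed operands and asymmetric DSP port widths would need extra handling (for example, applying the split recursively to each partial product).

```python
import random


def split(x, k):
    """Split a non-negative integer x into (high, low) parts at bit k."""
    return x >> k, x & ((1 << k) - 1)


def wide_fma_via_partials(a, b, c, k):
    """Compute a * b + c for non-negative a, b, c using narrow partial products.

    Relies on the identity
        a * b = (a_hi*2^k + a_lo) * (b_hi*2^k + b_lo)
              = a_hi*b_hi*2^(2k) + (a_hi*b_lo + a_lo*b_hi)*2^k + a_lo*b_lo
    """
    a_hi, a_lo = split(a, k)
    b_hi, b_lo = split(b, k)
    # Four narrow multiplies. k is a hypothetical width, assumed to be chosen
    # so that each partial product fits one DSP multiplier.
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi
    ll = a_lo * b_lo
    # Recombine with shifts and adds, then fold in the addend.
    return (hh << (2 * k)) + ((hl + lh) << k) + ll + c


if __name__ == "__main__":
    # Randomized check for 64-bit operands split into 32-bit halves.
    for _ in range(1000):
        a, b, c = (random.getrandbits(64) for _ in range(3))
        assert wide_fma_via_partials(a, b, c, 32) == a * b + c
    print("ok")
```

In an equality-saturation setting, this identity would presumably be expressed as a rewrite rule, so the e-graph holds both the wide form and the decomposed form, and extraction can pick the pieces that fit the target DSP.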

Subtasks:

dpetrisko commented 7 months ago

Thanks @gussmith23 !

The specific cases that would be super helpful for processor design are:

1) ±(32b × 32b) ± 32b → 32b
2) 32b × 32b → 32b
3) 32b × 32b → 64b
4) ±(64b × 64b) ± 64b → 64b
5) 64b × 64b → 64b
6) 64b × 64b → 128b
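To pin down what these operation shapes compute, here is a small Python reference model that could serve as a golden spec when testing a mapping. It assumes "±" means a selectable negation of the product and of the addend, and that results wrap to the output width in two's complement; those interpretations, along with the names `mask`, `to_signed`, `mul`, and `fma`, are hypothetical.

```python
def mask(width):
    """All-ones bit mask of the given width."""
    return (1 << width) - 1


def to_signed(x, width):
    """Reinterpret the low `width` bits of x as a two's-complement integer."""
    x &= mask(width)
    return x - (1 << width) if x >> (width - 1) else x


def mul(a, b, in_width, out_width, signed=False):
    """in_width x in_width -> out_width multiply, keeping the low out_width bits."""
    if signed:
        a, b = to_signed(a, in_width), to_signed(b, in_width)
    else:
        a, b = a & mask(in_width), b & mask(in_width)
    return (a * b) & mask(out_width)


def fma(a, b, c, width, neg_prod=False, neg_addend=False):
    """+-(a * b) +- c, wrapped to `width` bits.

    The output is no wider than the inputs, so its bits are the same for
    signed and unsigned operands; no signedness flag is needed here.
    """
    a, b, c = a & mask(width), b & mask(width), c & mask(width)
    prod = -(a * b) if neg_prod else a * b
    addend = -c if neg_addend else c
    return (prod + addend) & mask(width)


if __name__ == "__main__":
    ones32 = mask(32)
    print(hex(mul(ones32, ones32, 32, 64)))          # unsigned: 0xfffffffe00000001
    print(mul(ones32, ones32, 32, 64, signed=True))  # signed (-1 * -1): 1
    print(hex(fma(2, 3, 1, 32, neg_prod=True)))      # -(2*3) + 1 wrapped: 0xfffffffb
```

Signedness only changes the result for the widening shapes (32b → 64b, 64b → 128b), which is why only `mul` takes a `signed` flag.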

I'm not sure how the output width affects DSP inference! In ASICs, dropping the upper bits gives substantial savings (tens of percent). Could it be free on FPGAs? Interesting either way!
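One arithmetic reason dropping the upper bits can help, sketched in Python under the same unsigned split as a wide-multiply decomposition: for a 2k-bit × 2k-bit → 2k-bit multiply, the high × high partial product is scaled by 2^(2k) and vanishes, and the two cross products only contribute their low k bits. So three partial products suffice instead of four. The function `low_half_mul` and width `k` are hypothetical; whether Vivado's DSP inference actually exploits this is the open question above, and the sketch only shows the arithmetic.

```python
import random


def low_half_mul(a, b, k):
    """Low 2k bits of a * b (a, b non-negative) using three partial products.

    In a*b = hh*2^(2k) + (hl + lh)*2^k + ll, the hh term is a multiple of
    2^(2k) and vanishes mod 2^(2k), so it is never computed. The cross terms
    are shifted left by k, so only their low k bits survive.
    """
    mask_k = (1 << k) - 1
    a_hi, a_lo = a >> k, a & mask_k
    b_hi, b_lo = b >> k, b & mask_k
    cross = (a_hi * b_lo + a_lo * b_hi) & mask_k  # only the low k bits matter
    return ((cross << k) + a_lo * b_lo) & ((1 << (2 * k)) - 1)


if __name__ == "__main__":
    # Check against a full multiply truncated to 64 bits (k = 32).
    for _ in range(1000):
        a, b = random.getrandbits(64), random.getrandbits(64)
        assert low_half_mul(a, b, 32) == (a * b) & ((1 << 64) - 1)
    print("ok")
```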

Any realistic number of pipeline stages is fine; in ASICs we typically see 3 ± 1.

I have more advanced usages I'd love support for, but this is a great place to start!