jbush001 / NyuziProcessor

GPGPU microprocessor architecture
Apache License 2.0

Optimize hardware multiplier #73

Open jbush001 opened 7 years ago

jbush001 commented 7 years ago

Currently, integer and floating-point multiplication both occur in a single stage (fp_execute_stage2) using the Verilog '*' operator. This is the critical path when synthesizing for silicon, even though three pipeline stages are reserved for multiplication. Create a proper multi-stage multiplier that uses modified Booth encoding to generate the partial products and a Wallace tree to accumulate them.

https://web.stanford.edu/class/archive/ee/ee371/ee371.1066/lectures/lect_05.2up.pdf
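
For reference, here is a rough behavioral model of the idea in Python: radix-4 modified Booth recoding to generate partial products, followed by Wallace-style reduction with 3:2 carry-save adders. It is only a sketch for checking the arithmetic, not RTL and not the design from the linked lecture. All function names, the 32-bit operand width, and the way rows are grouped into adders are assumptions made for illustration.

```python
def booth_radix4_partial_products(a, b, width=32):
    """Partial products of signed a * b, recoding b two bits at a time.

    Assumes an even width and two's complement operands (illustrative choice).
    """
    word_mask = (1 << width) - 1
    a &= word_mask
    a_signed = a - (1 << width) if a >> (width - 1) else a
    b &= word_mask
    products = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, width, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digit = b0 + prev - 2 * b1  # Booth digit in {-2, -1, 0, 1, 2}
        products.append((digit * a_signed) << i)
        prev = b1
    return products


def carry_save_add(x, y, z, mask):
    """3:2 compressor on whole words: x + y + z == sum + carry (mod mask + 1)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & mask, c & mask


def wallace_reduce(rows, mask):
    """Reduce rows to at most two using layers of carry-save adders."""
    rows = [r & mask for r in rows]  # two's complement view of negative rows
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2], mask))
        nxt.extend(rows[full:])  # leftover rows pass through to the next layer
        rows = nxt
    return rows


def booth_wallace_multiply(a, b, width=32):
    """Low 2*width bits of the signed product a * b."""
    mask = (1 << (2 * width)) - 1
    rows = wallace_reduce(booth_radix4_partial_products(a, b, width), mask)
    return sum(rows) & mask  # the final carry-propagate add


print(booth_wallace_multiply(7, 6))  # 42
```

With 32-bit operands the recoding produces 16 partial products instead of 32. In hardware, the layers of `wallace_reduce` and the final carry-propagate add are the natural places to put pipeline registers, though where to split them across the three reserved stages would have to be decided from timing results.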

Or eliminate Booth encoding and use a single row of 4:2 compressors:

http://www.acsel-lab.com/Publications/Papers/38-booth-para-multi-EL93.pdf
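
A similar behavioral sketch for the non-Booth alternative, again in Python rather than RTL. It models a 4:2 compressor at word level (each bit slice's carry-out feeds the next slice's carry-in) and reduces plain unsigned partial products four rows at a time. The function names, the 32-bit width, the unsigned operands, and in particular the way the compressor is applied repeatedly to groups of four rows are assumptions for illustration; the arrangement in the linked paper may differ.

```python
def majority(x, y, z):
    """Bitwise majority of three words."""
    return (x & y) | (x & z) | (y & z)


def compress_4_2(x1, x2, x3, x4, mask):
    """Word-level 4:2 compressor: x1 + x2 + x3 + x4 == total + carry (mod mask + 1)."""
    s1 = x1 ^ x2 ^ x3
    cin = (majority(x1, x2, x3) << 1) & mask  # cout of slice i is cin of slice i+1
    total = (s1 ^ x4 ^ cin) & mask
    carry = (majority(s1, x4, cin) << 1) & mask
    return total, carry


def unsigned_partial_products(a, b, width=32):
    """One shifted copy of a for every set bit of b, zero otherwise."""
    word_mask = (1 << width) - 1
    a &= word_mask
    b &= word_mask
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def compressor_multiply(a, b, width=32):
    """Unsigned product a * b, reduced with 4:2 compressors."""
    mask = (1 << (2 * width)) - 1
    rows = unsigned_partial_products(a, b, width)
    while len(rows) > 2:
        while len(rows) % 4:
            rows.append(0)  # pad so every compressor gets four inputs
        nxt = []
        for i in range(0, len(rows), 4):
            x1, x2, x3, x4 = rows[i:i + 4]
            nxt.extend(compress_4_2(x1, x2, x3, x4, mask))
        rows = nxt
    return sum(rows) & mask  # the final carry-propagate add


print(compressor_multiply(7, 6))  # 42
```

Without Booth recoding there are 32 partial products for 32-bit operands, and each pass through the compressors halves the row count (32, 16, 8, 4, 2). Signed operands would need sign handling that this sketch leaves out.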

jbush001 commented 6 years ago

Some experimental WIP on the first part, not quite working yet:

https://gist.github.com/jbush001/ce5099eb65e76f269d237d24f091026e
https://gist.github.com/jbush001/de490ff53d31f9a8f5dc17513dba58c6
https://gist.github.com/jbush001/59a82882a0b3b60dc3dcdf8f1088a138