harvard-acc / gem5-aladdin

End-to-end SoC simulation: integrating the gem5 system simulator with the Aladdin accelerator simulator.
BSD 3-Clause "New" or "Revised" License

systolic-array: Fix bugs for small PE configurations. #45

Closed: yaoyuannnn closed this 2 years ago

yaoyuannnn commented 2 years ago

To compute a large convolution, we tile the operation into two levels of "folds". First, the kernels are tiled into ceil(numKerns / peArrayCols) weight folds, since each PE column computes one kernel. Within each weight fold, each output feature map is further tiled into ceil(outputRows * outputCols / peArrayRows) output folds, since each PE row is responsible for producing one output pixel (across the output channels of the current weight fold).

Each PE row has a commit unit that collects the finished results from that row and writes them to the output scratchpad. Currently, the write size is always set to the scratchpad's line size, which is incorrect for small PE column counts, where a row's results do not fill an entire line. This commit fixes this by accounting for the case where peArrayCols is smaller than elemsInLine.
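The fold counts and the write-size issue above can be illustrated with a small sketch. This is not the gem5-aladdin code: the function names are hypothetical, sizes are counted in elements rather than bytes, and it assumes a row's results for one weight fold are contiguous in the output scratchpad (channels innermost).

```python
def fold_counts(num_kerns, output_rows, output_cols, pe_array_rows, pe_array_cols):
    """Hypothetical helper: return (weight_folds, output_folds).

    Each PE column computes one kernel, so kernels are grouped into weight
    folds of pe_array_cols. Each PE row computes one output pixel, so the
    output pixels are grouped into output folds of pe_array_rows. Integer
    ceiling division counts a partial last fold.
    """
    weight_folds = (num_kerns + pe_array_cols - 1) // pe_array_cols
    num_pixels = output_rows * output_cols
    output_folds = (num_pixels + pe_array_rows - 1) // pe_array_rows
    return weight_folds, output_folds


def commit_write_elems(pe_array_cols, elems_in_line):
    """Hypothetical helper: elements a row's commit unit writes at once.

    A row yields at most pe_array_cols results per fold, so when that is
    smaller than a scratchpad line, writing a full line would be wrong.
    """
    return min(pe_array_cols, elems_in_line)


if __name__ == "__main__":
    # 16 kernels, 8x8 output feature map, 4x4 PE array.
    print(fold_counts(16, 8, 8, 4, 4))  # (4, 16)
    # 16-element scratchpad lines, but each row produces only 4 results.
    print(commit_write_elems(4, 16))    # 4
```

With a 4x4 PE array and 16-element lines, each commit in this sketch writes 4 elements instead of a full line. When peArrayCols exceeds elemsInLine, the row spans several lines and the write stays line-sized.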

Also, this fixes the incorrect initial position of the tensor iterators used by the commit units.
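To make "initial iterator position" concrete, here is one plausible way a commit unit's starting output index could be derived from the fold indices. This is an illustrative sketch only, not the actual fix: the function name and the (h, w, c) index layout are hypothetical, and it assumes PE row r in output fold o handles pixel o * peArrayRows + r, with weight fold k starting at channel k * peArrayCols.

```python
def commit_unit_start(row, output_fold, weight_fold, output_rows, output_cols,
                      pe_array_rows, pe_array_cols):
    """Hypothetical start index (h, w, c) for a PE row's commit unit.

    Assumes a channels-innermost output layout. Returns None when the row
    has no pixel to produce in a partial last output fold.
    """
    pixel = output_fold * pe_array_rows + row
    if pixel >= output_rows * output_cols:
        return None  # This row is idle in the last, partially filled fold.
    h, w = divmod(pixel, output_cols)  # Linear pixel index -> (row, col).
    c = weight_fold * pe_array_cols    # First channel of this weight fold.
    return (h, w, c)


if __name__ == "__main__":
    # 4x4 output, 2x2 PE array: row 1 of output fold 3, weight fold 2.
    print(commit_unit_start(1, 3, 2, 4, 4, 2, 2))  # (1, 3, 4)
```

Starting each iterator from the row's own pixel and the fold's first channel, rather than from the tensor origin, is what keeps the commit units from overwriting each other's results in this sketch.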

yaoyuannnn commented 2 years ago

This fixes the hang in small PE configurations (#43). Unit tests will be added in follow-up PRs.