Closed · mattfel1 closed this issue 6 years ago
This would be helpful for me too
Implemented. See the BlockReduce1D and BlockReduce2D unit tests. We can figure out how to make it prettier later if we want.
For now, whatever range of the accumulator you accumulate into is also the range that is read back from the result tile:
If the partial tile has different dimensions, the read addresses will wrap around in hardware, and you will probably get garbage data in the Scala simulation. I think in most cases we mainly just need to accumulate starting at the origin, so this should be fine.
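To make the semantics concrete, here is a plain-Scala sketch (not the actual Spatial API; all names are illustrative) of accumulating partial tiles into a fixed range of a larger accumulator. The key property is that the same address range is both written and re-read each iteration; a partial tile with mismatched dimensions would fall outside that range.

```scala
// Plain-Scala sketch of the accumulate-into-a-range semantics.
// `accum`, `partials`, and `offset` are illustrative names, not Spatial API.
object BlockReduceSketch {
  // Accumulate each partial tile into accum over the range
  // [offset, offset + tile length). The same range is read back
  // each iteration to produce the running sum.
  def blockReduce(partials: Seq[Array[Int]], accum: Array[Int], offset: Int): Unit = {
    for (p <- partials; i <- p.indices) {
      accum(offset + i) += p(i) // read-modify-write within the same range
    }
  }

  def main(args: Array[String]): Unit = {
    val accum = Array.fill(8)(0)
    val partials = Seq(Array(1, 2, 3, 4), Array(10, 20, 30, 40))
    blockReduce(partials, accum, 0) // accumulate starting at the origin
    println(accum.mkString(",")) // 11,22,33,44,0,0,0,0
  }
}
```

Accumulating at the origin (offset 0), as suggested above, sidesteps any question of how out-of-range reads behave.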
Is there already an issue for this? It would be nice to be able to do something like
In cases where we try to save area by reusing hardware across convolutions whose layer dimensions differ, we could use the same large memory but reduce over only a portion of it, since we know that portion is all we will care to store to DRAM.
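A hypothetical plain-Scala sketch of that idea (again, not the Spatial API): one physical buffer sized for the largest layer, with the reduce applied only to the active sub-region for the current layer, since only that region is later stored to DRAM.

```scala
// Illustrative sketch: reuse one large buffer across layers of different
// sizes, reducing only over the active region. Names are hypothetical.
object PartialRegionReduce {
  // Reduce `partial` into the top-left activeRows x activeCols corner of `buf`,
  // leaving the rest of the (larger) buffer untouched.
  def reduceRegion(buf: Array[Array[Int]],
                   partial: Array[Array[Int]],
                   activeRows: Int,
                   activeCols: Int): Unit = {
    for (r <- 0 until activeRows; c <- 0 until activeCols)
      buf(r)(c) += partial(r)(c)
  }

  def main(args: Array[String]): Unit = {
    val buf = Array.fill(4, 4)(0)     // sized for the largest layer
    val partial = Array.fill(2, 2)(1) // current layer only needs 2x2
    reduceRegion(buf, partial, 2, 2)
    reduceRegion(buf, partial, 2, 2)
    println(buf.map(_.mkString(",")).mkString(";")) // 2,2,0,0;2,2,0,0;0,0,0,0;0,0,0,0
  }
}
```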