@AnandS09 @jmjos @boukhary123 @ritikraj7 Okay, I am sorry to trouble you! When I use the following memory options, I see no change in `compute cycles`!

In `DETAILED_ACCESS_REPORT.csv`, I found that `compute cycles` can be regarded as the runtime of each layer.

So I guessed that the problem might come from the size of the systolic array. With a 32 x 32 systolic array, a 16/32/64/128 KiB SRAM is enough to keep the array fed with no stalls. Following this line of thought, if I use a larger systolic array, the computation may run into stalls, and …
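To make that guess a bit more concrete, here is a rough back-of-envelope check. It is my own sketch, not how SCALE-Sim models memory. It assumes, hypothetically, that each PE holds one operand word per fold (as in a weight-stationary mapping) and that one word is 1 byte. `fold_footprint_bytes` and `fits_in_sram` are made-up helper names.

```python
def fold_footprint_bytes(rows, cols, word_bytes=1):
    """Bytes needed for one operand tile that fills a rows x cols array once.

    Hypothetical model: one word per PE (e.g. weight-stationary) and
    `word_bytes` bytes per word. This is NOT SCALE-Sim's own accounting.
    """
    return rows * cols * word_bytes


def fits_in_sram(rows, cols, sram_kib, word_bytes=1):
    """True if one such tile fits in an SRAM of `sram_kib` KiB (1 KiB = 1024 B)."""
    return fold_footprint_bytes(rows, cols, word_bytes) <= sram_kib * 1024


# Compare the small and large arrays against the SRAM sizes mentioned above.
for rows, cols in [(32, 32), (4096, 4096)]:
    for kib in (16, 32, 64, 128):
        print(f"{rows}x{cols}, {kib} KiB SRAM: fits={fits_in_sram(rows, cols, kib)}")
# 32x32 (1 KiB per tile) fits in all four sizes;
# 4096x4096 (16 MiB per tile) fits in none of them.
```

Under these assumptions a 4096 x 4096 tile needs 16 MiB, far more than 128 KiB, so if the simulator tied stalls to SRAM capacity in this way, I would expect the large array to stall.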
So I tried a large systolic array (4096 x 4096) with the following SRAM schemes:
Unfortunately, the `compute cycles` of all three schemes are identical; they share exactly the same values. I would really appreciate it if you could spare some of your time on this. Thanks very much!
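To double-check that the three runs really are identical, a small script like the one below can compare one column across the reports. It is only a sketch: the column header is passed in as a parameter because I am not certain of the exact name used in `DETAILED_ACCESS_REPORT.csv`, and `column_values`, `read_column`, and `all_identical` are hypothetical helpers.

```python
import csv


def column_values(lines, column):
    """Return the values of `column` from an iterable of CSV text lines.

    Headers and cells are stripped, so padding such as a space after each
    comma does not affect matching. Blank lines are skipped.
    """
    rows = [row for row in csv.reader(lines) if row]
    header = [name.strip() for name in rows[0]]
    idx = header.index(column)  # raises ValueError if the header is absent
    return [row[idx].strip() for row in rows[1:] if len(row) > idx]


def read_column(csv_path, column):
    """Read one column from a report file on disk."""
    with open(csv_path, newline="") as f:
        return column_values(f, column)


def all_identical(series_list):
    """True if every per-run list of values equals the first one."""
    return all(series == series_list[0] for series in series_list[1:])
```

Usage: collect the report path from each SRAM scheme's run into `paths`, then call `all_identical([read_column(p, header) for p in paths])`, where `header` is whatever column in the report holds the compute cycles.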