I'm wondering how tiled matrix multiplication works in Gemmini.
I think there are two ways to perform tiled matrix multiplication: at the software level or at the hardware level.
But I'm not sure whether my understanding is correct.
1. Software Level
In the gemmini.h file located at software/gemmini-rocc-tests/include,
I found that sp_tiled_matmul_ws is called multiple times.
As far as I know, sp_tiled_matmul_ws issues a single batch of RoCC instructions that performs one matrix multiplication on Gemmini.
So I concluded that the tiled_matmul_auto function performs tiled matrix multiplication by tiling the matrices at the software level.
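To make clear what I mean by software-level tiling, here is a minimal sketch of the pattern I believe I am seeing. It is not Gemmini's code: DIM, tile_matmul, and sw_tiled_matmul are hypothetical names I made up, and tile_matmul is a plain C stand-in for whatever a single accelerator call would do on one tile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size; NOT taken from Gemmini's configuration. */
#define DIM 4

/* Stand-in for one accelerator call: C_tile += A_tile * B_tile.
 * lda/ldb/ldc are the row strides of the full matrices.
 * On real hardware this would be a short sequence of RoCC instructions. */
static void tile_matmul(const int *A, const int *B, int *C,
                        size_t lda, size_t ldb, size_t ldc,
                        size_t rows, size_t inner, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            int acc = C[i * ldc + j];
            for (size_t k = 0; k < inner; k++) {
                acc += A[i * lda + k] * B[k * ldb + j];
            }
            C[i * ldc + j] = acc;
        }
    }
}

/* Software-level tiling: the CPU walks over the tiles of C (M x N),
 * A (M x K), and B (K x N), and issues one kernel call per tile. */
static void sw_tiled_matmul(const int *A, const int *B, int *C,
                            size_t M, size_t K, size_t N) {
    memset(C, 0, M * N * sizeof(int));
    for (size_t i0 = 0; i0 < M; i0 += DIM) {
        for (size_t j0 = 0; j0 < N; j0 += DIM) {
            for (size_t k0 = 0; k0 < K; k0 += DIM) {
                /* Clamp the tile at the matrix edges. */
                size_t rows  = (M - i0 < DIM) ? (M - i0) : DIM;
                size_t cols  = (N - j0 < DIM) ? (N - j0) : DIM;
                size_t inner = (K - k0 < DIM) ? (K - k0) : DIM;
                tile_matmul(&A[i0 * K + k0], &B[k0 * N + j0], &C[i0 * N + j0],
                            K, N, N, rows, inner, cols);
            }
        }
    }
}
```

In this picture the three outer loops run on the CPU, and only the body of tile_matmul would be offloaded to the accelerator.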
2. Hardware Level
According to the description in the Gemmini repo, there is a hardware module named LoopMatmul that automatically tiles and unrolls large matrix multiplications.
So I thought that Gemmini natively supports hardware-level tiled matrix multiplication.
First of all, is my understanding correct?
Secondly, if it is, why does tiled_matmul_auto perform tiled matrix multiplication at the software level rather than at the hardware level?