One of our main goals is to find out where our efficiency ends compared with other BLAS libraries.
To do that, we need to increase the matrix size and measure performance until we hit that turning point.
My suggested approach is to increase the matrix size by powers of two so that we can implement a divide-and-conquer approach, as in this video: https://www.youtube.com/watch?v=auw2Nm6ZOqI
Basically, you treat each matrix as a matrix of blocks and recurse until you hit the base case: a 2x2 matrix.
Once this is done, we need performance results for sizes from 2x2 up to 1024x1024.
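
To make the recursion concrete, here is a minimal sketch of block-wise divide-and-conquer multiplication with a 2x2 base case. It is not libfsmc code: the helper names (`split`, `add`, `join`, `dc_matmul`) are hypothetical, matrices are plain lists of lists, and it assumes square inputs whose size is a power of two (at least 2). It uses the plain eight-multiplication recursion, which may differ from the variant shown in the video.

```python
def split(m):
    """Split a square matrix into its four equal quadrants."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([row[:h] for row in top], [row[h:] for row in top],
            [row[:h] for row in bottom], [row[h:] for row in bottom])


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    upper = [left + right for left, right in zip(c11, c12)]
    lower = [left + right for left, right in zip(c21, c22)]
    return upper + lower


def dc_matmul(a, b):
    """Multiply two n x n matrices, n a power of two and n >= 2 (assumed)."""
    if len(a) == 2:
        # Base case: multiply the 2x2 blocks directly.
        return [
            [a[0][0] * b[0][0] + a[0][1] * b[1][0],
             a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0],
             a[1][0] * b[0][1] + a[1][1] * b[1][1]],
        ]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Each output quadrant is the sum of two half-size block products.
    c11 = add(dc_matmul(a11, b11), dc_matmul(a12, b21))
    c12 = add(dc_matmul(a11, b12), dc_matmul(a12, b22))
    c21 = add(dc_matmul(a21, b11), dc_matmul(a22, b21))
    c22 = add(dc_matmul(a21, b12), dc_matmul(a22, b22))
    return join(c11, c12, c21, c22)
```

For example, `dc_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.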
So, in short:
[x] Implement divide and conquer
[ ] Create benchmark (libfsmc and contestant library)
[ ] Generate performance results from 2x2 to 1024x1024
[ ] Compare performance results with another library
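
The benchmark and size sweep from the checklist above could be sketched roughly as follows. Everything here is an assumption for illustration: `power_of_two_sizes`, `random_matrix` and `benchmark` are hypothetical names, and `naive_matmul` is only a stand-in for whatever entry point libfsmc or the contestant library actually exposes. The harness keeps the best of several runs per size to reduce timing noise.

```python
import random
import time


def power_of_two_sizes(smallest=2, largest=1024):
    """Return smallest, 2*smallest, 4*smallest, ... up to and including largest."""
    sizes = []
    n = smallest
    while n <= largest:
        sizes.append(n)
        n *= 2
    return sizes


def random_matrix(n, rng):
    """Build an n x n matrix of random floats as a list of lists."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def naive_matmul(a, b):
    """Stand-in multiply; replace with the library call under test."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def benchmark(multiply, sizes, repeats=3, seed=0):
    """Map each size to the best wall-clock time (seconds) over `repeats` runs."""
    rng = random.Random(seed)  # fixed seed so every library sees the same inputs
    results = {}
    for n in sizes:
        a, b = random_matrix(n, rng), random_matrix(n, rng)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            multiply(a, b)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results
```

Calling `benchmark(naive_matmul, power_of_two_sizes())` once per library with the same seed would give two tables to compare side by side. The pure-Python stand-in is far too slow for the larger sizes, so pass a shorter list such as `[2, 4, 8, 16]` when trying the harness itself.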