Unrolling a stencil in 2 or even 3 dimensions (+ inlining) might have performance benefits over unrolling in only one dimension (+ inlining).
Example in 2D:
Assume that for every grid point (i,j) the Laplacians at the 9 surrounding grid points, (i-1) to (i+1) and (j-1) to (j+1), have to be calculated.
We now compare three variants: no unrolling, unrolling by 4 in one dimension, and unrolling by 2 in each of two dimensions. In each case we count how many Laplacians have to be calculated for 4 grid points, assuming that Laplacians shared between the points of an unrolled block are calculated only once.
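No unrolling: 4*9 = 36 (each grid point calculates its own 9 Laplacians)
Unrolling in one dimension by 4: 9 + 3*3 = 18 (each additional grid point adds one new line of 3 Laplacians; the 4 points cover a 6x3 region)
Unrolling in two dimensions by 2: 9 + 3 + 3 + 1 = 16 (the 2x2 block of grid points covers a 4x4 region)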
Here unrolling in two dimensions needs the fewest Laplacian calculations (16, compared to 18 and 36), so it seems to be the best choice for performance.
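
The counts above can be checked with a small script. This is only a sketch under the assumption that, within one unrolled block, every Laplacian that is needed by several grid points is calculated exactly once. The function name laplacians_needed and its parameters are made up for this illustration and are not part of any existing code.

```python
def laplacians_needed(block_i, block_j):
    """Count the distinct Laplacians needed by a block_i x block_j block of
    grid points, where each grid point needs the Laplacians at its 3x3
    neighbourhood (i-1..i+1, j-1..j+1)."""
    needed = set()
    for i in range(block_i):
        for j in range(block_j):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # A set keeps each position once, which models the reuse
                    # of a Laplacian inside the unrolled block.
                    needed.add((i + di, j + dj))
    return len(needed)


# Always compare the work for 4 grid points.
no_unrolling = 4 * laplacians_needed(1, 1)  # 4 independent points
unroll_1d = laplacians_needed(4, 1)         # one block of 4 points in a line
unroll_2d = laplacians_needed(2, 2)         # one block of 2x2 points

print(no_unrolling, unroll_1d, unroll_2d)   # prints: 36 18 16
```

The count only measures arithmetic that can be shared. Whether the 2D variant is really faster also depends on things this sketch does not model, such as register pressure and memory access patterns.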