periscop / clay

Clay, the Chunky Loop Alteration wizardrY

Second tile uses wrong dimension. #13

Closed by ftynse 9 years ago

ftynse commented 9 years ago

Consider a loop nest with at least two loops, e.g. polynomial multiplication:

for (i=0;i<=N-1;i++) {
  for (j=0;j<=N-1;j++) {
    Z[i + j] += 1;
  }
}

Tiling the outer loop with tile([0,0,0],1,1,4,0); works as expected and produces:

for (__ii0=0;__ii0<=floord(N-1,4);__ii0++) {
  for (i=4*__ii0;i<=min(N-1,4*__ii0+3);i++) {
    for (j=0;j<=N-1;j++) {
      Z[i + j] += 1;
    }
  }
}

However, tiling the inner loop afterwards with tile([0,0,0,0],3,3,4,0); generates incorrect code: there is an extra loop, the iterator i is reused by the innermost loop, and the outermost loop has lost its initializer:

for (;__ii0<=floord(N+3,4);__ii0++) {
  for (i=4*__ii0;i<=min(N+3,4*__ii0+3);i++) {
    for (__jj0=ceild(i-N-3,5);__jj0<=floord(N-1,5);__jj0++) {
      for (j=max(0,5*__jj0);j<=min(N-1,-i+5*__jj0+N+3);j++) {
        for (i=max(0,i-5*__jj0+j-4);i<=N-1;i++) {
          Z[i + j] += 1;
        }
      }
    }
  }
}
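
For reference, here is a hand-written sketch of what strip-mining both loops in place should compute. It is not Clay output: `original`, `tiled_both` and `tiled_both_matches` are hypothetical names, and the loop order simply mirrors the generated code above. The tile sizes are parameters because the intended inner size is unclear (the call passes 4, while the generated bounds use 5).

```c
#include <assert.h>
#include <string.h>

#define MAXN 64  /* arbitrary upper bound on N, chosen for this sketch */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference: the untiled nest from the report. */
static void original(int N, int *Z) {
    for (int i = 0; i <= N - 1; i++)
        for (int j = 0; j <= N - 1; j++)
            Z[i + j] += 1;
}

/* Hand-written strip-mining of both loops; ti and tj are tile sizes (> 0).
 * Each point loop has its own iterator and its own tile iterator. */
static void tiled_both(int N, int *Z, int ti, int tj) {
    for (int ii = 0; ti * ii <= N - 1; ii++)
        for (int i = ti * ii; i <= min_int(N - 1, ti * ii + ti - 1); i++)
            for (int jj = 0; tj * jj <= N - 1; jj++)
                for (int j = tj * jj; j <= min_int(N - 1, tj * jj + tj - 1); j++)
                    Z[i + j] += 1;
}

/* Returns 1 if the tiled nest leaves Z identical to the untiled one. */
int tiled_both_matches(int N, int ti, int tj) {
    int a[2 * MAXN] = {0}, b[2 * MAXN] = {0};
    if (N < 0 || N > MAXN || ti <= 0 || tj <= 0)
        return 0;
    original(N, a);
    tiled_both(N, b, ti, tj);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `tiled_both_matches(N, 4, 5)` for a few values of `N` gives a baseline against which the output of the second tile call can be compared.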

ftynse commented 9 years ago

The second transformation somehow removes the c4 dimension: the row that should define it is reduced to 0 == 0 in the resulting scattering.

SCATTERING
11 14 9 2 0 1
# e/i| c1   c2   c3   c4   c5   c6   c7   c8   c9 |  i    j |  N |  1  
   0   -1    0    0    0    0    0    0    0    0    0    0    0    0    ## c1 == 0
   1    0   -4    0    1    0    0    0    0    0    0    0    0    0    ## -4*c2+c4 >= 0
   1    0    4    0   -1    0    0    0    0    0    0    0    0    3    ## 4*c2-c4+3 >= 0
   0    0    0   -1    0    0    0    0    0    0    0    0    0    0    ## c3 == 0
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    ## 0 == 0
   0    0    0    0    0   -1    0    0    0    0    0    0    0    0    ## c5 == 0
   1    0    0    0    0    0   -5    0    1    0    0    0    0    0    ## -5*c6+c8 >= 0
   1    0    0    0   -1    0    5    0   -1    0    1    0    0    4    ## -c4+5*c6-c8+i+4 >= 0
   0    0    0    0    0    0    0   -1    0    0    0    0    0    0    ## c7 == 0
   0    0    0    0    0    0    0    0   -1    0    0    1    0    0    ## c8 == j
   0    0    0    0    0    0    0    0    0   -1    0    0    0    0    ## c9 == 0
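
A degenerate row like this can be found mechanically. The sketch below is not part of Clay: `first_trivial_row` and `scattering` are hypothetical names, and the array is a transcription of the matrix above (11 rows, 14 columns, with column 0 holding the equality/inequality flag).

```c
#include <assert.h>
#include <stddef.h>

/* Transcription of the scattering relation from the report.
 * Columns: e/i | c1..c9 | i j | N | 1 */
static const int scattering[11][14] = {
    {0, -1,  0,  0,  0,  0,  0,  0,  0,  0, 0, 0, 0, 0}, /* c1 == 0 */
    {1,  0, -4,  0,  1,  0,  0,  0,  0,  0, 0, 0, 0, 0}, /* -4*c2+c4 >= 0 */
    {1,  0,  4,  0, -1,  0,  0,  0,  0,  0, 0, 0, 0, 3}, /* 4*c2-c4+3 >= 0 */
    {0,  0,  0, -1,  0,  0,  0,  0,  0,  0, 0, 0, 0, 0}, /* c3 == 0 */
    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 0, 0, 0, 0}, /* 0 == 0 */
    {0,  0,  0,  0,  0, -1,  0,  0,  0,  0, 0, 0, 0, 0}, /* c5 == 0 */
    {1,  0,  0,  0,  0,  0, -5,  0,  1,  0, 0, 0, 0, 0}, /* -5*c6+c8 >= 0 */
    {1,  0,  0,  0, -1,  0,  5,  0, -1,  0, 1, 0, 0, 4}, /* -c4+5*c6-c8+i+4 >= 0 */
    {0,  0,  0,  0,  0,  0,  0, -1,  0,  0, 0, 0, 0, 0}, /* c7 == 0 */
    {0,  0,  0,  0,  0,  0,  0,  0, -1,  0, 0, 1, 0, 0}, /* c8 == j */
    {0,  0,  0,  0,  0,  0,  0,  0,  0, -1, 0, 0, 0, 0}, /* c9 == 0 */
};

/* Returns the 0-based index of the first row whose coefficients
 * (every column except the e/i flag in column 0) are all zero,
 * or -1 if there is no such row. m is a row-major rows x cols matrix. */
int first_trivial_row(const int *m, size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) {
        int all_zero = 1;
        for (size_t c = 1; c < cols; c++) {
            if (m[r * cols + c] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (all_zero)
            return (int)r;
    }
    return -1;
}
```

For the matrix above, `first_trivial_row(&scattering[0][0], 11, 14)` returns 4, i.e. the fifth row. Running such a check after each transformation would help pinpoint which step drops the constraint.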