microsoft / antares

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

How can Antares support loops whose index doesn't start at 0? #359

Open lethean1 opened 1 year ago

lethean1 commented 1 year ago

I'm trying to use Antares to express a loop like this:

for(int64_t i=1; i < domain_size; ++i) {
    for(int64_t j=1; j < domain_size; ++j) {
        out(i,j) = flx(i-1,j) - flx(i,j) + fly(i,j-1) - fly(i,j);
    }
}

Here the loop indices i and j don't start at 0. flx[N-1,M].when([-1 + N >= 0], 0.0) is not a possible solution, as it violates the semantics of the loop. Any help expressing this loop is appreciated!

sebastienwood commented 1 year ago

Decrease it by 1, maybe?

for(int64_t i=0; i <= domain_size; ++i) {
    for(int64_t j=0; j <= domain_size; ++j) {
        out(i-1,j-1) = flx(i-2,j-1) - flx(i-1,j-1) + fly(i-1,j-2) - fly(i-1,j-1);
    }
}

ghostplant commented 1 year ago

Hi, does this work for you?

out[N, M] = (flx[N - 1, M] - flx[N, M] + fly[N, M - 1] - fly[N, M]).when([N >= 1, M >= 1], const(0, dtype=flx.dtype()), merge_op=`any`) where N in {domain_size}, M in {domain_size}
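
For reference, the masking idea behind this suggestion can be sketched in plain Python. This is not Antares code: `masked_out` and `reference_loop` are hypothetical helper names, `flx` and `fly` are assumed to be square nested lists, and the guard is assumed to mean that both N >= 1 and M >= 1 must hold, matching the original loop bounds. It does not model how Antares itself evaluates `merge_op`.

```python
def reference_loop(flx, fly, domain_size):
    """The original loop: indices start at 1, so row 0 and column 0 stay untouched (0 here)."""
    out = [[0.0] * domain_size for _ in range(domain_size)]
    for i in range(1, domain_size):
        for j in range(1, domain_size):
            out[i][j] = flx[i - 1][j] - flx[i][j] + fly[i][j - 1] - fly[i][j]
    return out


def masked_out(flx, fly, domain_size):
    """Full-domain iteration: stencil where N >= 1 and M >= 1, fill value 0 elsewhere."""
    out = [[0.0] * domain_size for _ in range(domain_size)]
    for n in range(domain_size):
        for m in range(domain_size):
            if n >= 1 and m >= 1:  # assumed guard: both conditions must hold
                out[n][m] = flx[n - 1][m] - flx[n][m] + fly[n][m - 1] - fly[n][m]
    return out
```

Under these assumptions both functions produce the same array; the masked form simply visits every (N, M) pair, including the border elements that only receive the fill value.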

lethean1 commented 1 year ago

> Decrease it by 1, maybe?
>
> for(int64_t i=0; i <= domain_size; ++i) {
>     for(int64_t j=0; j <= domain_size; ++j) {
>         out(i-1,j-1) = flx(i-2,j-1) - flx(i-1,j-1) + fly(i-1,j-2) - fly(i-1,j-1);
>     }
> }

It seems that out(i-1,j-1) is not legal in Antares; it only allows output indices of the form out(i,j).

lethean1 commented 1 year ago

> Hi, does this work for you?
>
> out[N, M] = (flx[N - 1, M] - flx[N, M] + fly[N, M - 1] - fly[N, M]).when([N >= 1, M >= 1], const(0, dtype=flx.dtype()), merge_op=`any`) where N in {domain_size}, M in {domain_size}

This works, but it results in redundant computation. Is there any other way?

ghostplant commented 1 year ago

Are out[N, 0] and out[0, M] not used by the following operations? If they are used, computing them is not redundant. If they are not used, why not remove that unused space altogether, making it:

out[N, M] = flx[N, M + 1] - flx[N + 1, M + 1] + fly[N + 1, M] - fly[N + 1, M + 1] where N in {domain_size - 1}, M in {domain_size - 1}
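
As a sanity check on the index shift, here is a minimal plain-Python sketch (not Antares code; `compact_out` and `original_interior` are hypothetical helper names and the inputs are assumed to be square nested lists). It evaluates the shifted expression over a (domain_size - 1) x (domain_size - 1) range and compares it with the interior of the original loop, where entry [N][M] of the compact result corresponds to out(N + 1, M + 1) in the original indexing.

```python
def compact_out(flx, fly, domain_size):
    """Shifted form: iterate N, M over domain_size - 1 and read inputs at +1 offsets."""
    size = domain_size - 1
    return [
        [flx[n][m + 1] - flx[n + 1][m + 1] + fly[n + 1][m] - fly[n + 1][m + 1]
         for m in range(size)]
        for n in range(size)
    ]


def original_interior(flx, fly, domain_size):
    """Original loop body for i, j in 1..domain_size-1, stored without the unused border."""
    return [
        [flx[i - 1][j] - flx[i][j] + fly[i][j - 1] - fly[i][j]
         for j in range(1, domain_size)]
        for i in range(1, domain_size)
    ]
```

The two results match element for element, so the shifted form computes only the (domain_size - 1)^2 values the original loop writes. The trade-off is that any consumer of `out` has to index it with the same offset of one.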