tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
http://tensor-compiler.org

Compilation error for TTM with CSF,CSC as input formats and CSF output #522

Open amirsh opened 2 years ago

amirsh commented 2 years ago

On the master branch (and in the web interface), the TTM example fails to compile when the inputs are stored as CSF and CSC and the output as CSF. The generated code produces the following compilation error:

/var/folders/43/fyr6y1h116q7yw4m8vczbq440000gq/T/taco_tmp_8eTr18/wz1tmu3amdu1.c:195:11: error: use of undeclared identifier 'pA853_begin'; did you mean 'pA852_begin'?
      if (pA853_begin < i154A85) {
          ^~~~~~~~~~~
          pA852_begin
/var/folders/43/fyr6y1h116q7yw4m8vczbq440000gq/T/taco_tmp_8eTr18/wz1tmu3amdu1.c:157:13: note: 'pA852_begin' declared here
    int32_t pA852_begin = i153A85;
            ^
1 error generated.

rohany commented 2 years ago

Can you share the link from the web interface that led to the error? (It includes the schedule and the formats.) When I try it myself, this particular case appears to work.

amirsh commented 2 years ago

@rohany I believe this is what you are looking for? http://tensor-compiler.org/codegen.html?expr=A(i,j,k)%20=%20B(i,j,l)%20*%20C(k,%20l)&format=A:sss:0,1,2;B:sss:0,1,2;C:ds:1,0

If you take a look at the generated assembly code (the C code that assembles the output tensor's indices):

      if (pA3_begin < kA) {
        if (A2_crd_size <= jA) {
          A2_crd = (int32_t*)realloc(A2_crd, sizeof(int32_t) * (A2_crd_size * 2));
          A2_crd_size *= 2;
        }
        A2_crd[jA] = j;
        jA++;
      }

The variable pA3_begin is used here but never declared.

rohany commented 2 years ago

Thanks, I see the problem now.

stephenchouca commented 2 years ago

This seems to happen because the expression scatters into the result, which is not directly supported when the output format is sparse. By (manually) introducing a workspace, you can avoid scattering into the sparse output, which I believe should resolve the issue: http://tensor-compiler.org/codegen.html?expr=A(i,j,k)%20=%20B(i,j,l)%20*%20C(k,%20l)&format=A:sss:0,1,2;B:sss:0,1,2;C:ds:1,0&sched=precompute:B(i,j,l)%20*%20C(k,l):k:k

A longer-term fix for bugs like this, though, would probably require a mechanism that detects scatters into sparse outputs and automatically inserts workspaces to eliminate them.
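
To illustrate the workspace idea, here is a minimal hand-written sketch in C. It is not code generated by taco: the function name `ttm_fiber_with_workspace`, its argument layout, and the `touched` array are all hypothetical. It assumes C is stored column by column (for each l, the list of nonzero k), so the loops run over l and then k, and contributions to the same A(i,j,k) arrive at different times. The sketch accumulates one output fiber A(i,j,:) into a dense array over k and then appends the nonzeros to the sparse result in increasing k order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not taco output): computes one output fiber
 * A(i,j,:) = sum_l B(i,j,l) * C(k,l) for a fixed (i,j).
 *
 * B_crd/B_vals       : the nnzB nonzeros of the fiber B(i,j,:), indexed by l.
 * C_pos/C_crd/C_vals : C stored column by column (for each l, the k
 *                      coordinates and values of its nonzeros).
 * K                  : dimension of mode k.
 * out_crd/out_val    : caller-provided buffers with room for K entries.
 * Returns the number of entries appended to the output fiber. */
int ttm_fiber_with_workspace(int nnzB, const int32_t *B_crd,
                             const double *B_vals, const int32_t *C_pos,
                             const int32_t *C_crd, const double *C_vals,
                             int K, int32_t *out_crd, double *out_val) {
  assert(K > 0);
  /* Dense workspace over k, plus a flag per k recording whether it was written. */
  double *w = (double *)calloc((size_t)K, sizeof(double));
  unsigned char *touched = (unsigned char *)calloc((size_t)K, 1);
  assert(w != NULL && touched != NULL);

  /* Loop order is l, then k: the same k can be hit for several l, so these
   * writes arrive out of order. A dense array absorbs that random access. */
  for (int p = 0; p < nnzB; p++) {
    int32_t l = B_crd[p];
    for (int32_t q = C_pos[l]; q < C_pos[l + 1]; q++) {
      int32_t k = C_crd[q];
      w[k] += B_vals[p] * C_vals[q];
      touched[k] = 1;
    }
  }

  /* Copy the workspace into the sparse result in increasing k order, so the
   * output is only ever appended to and never scattered into. */
  int n = 0;
  for (int32_t k = 0; k < K; k++) {
    if (touched[k]) {
      out_crd[n] = k;
      out_val[n] = w[k];
      n++;
    }
  }

  free(w);
  free(touched);
  return n;
}
```

For example, with a B fiber holding 2.0 at l = 0 and 3.0 at l = 2, and C holding C(1,0) = 5.0, C(0,2) = 1.0, and C(1,2) = 4.0, the function appends two entries: k = 0 with value 3.0 and k = 1 with value 22.0. The entry at k = 1 receives contributions from both l = 0 and l = 2, which is the scatter that the dense workspace hides from the sparse output.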