pnnl / COMET

Other
37 stars 7 forks source link

Elementwise Multiplication Code Generation Redundancy and Performance Issue #44

Open johnpzh opened 1 year ago

johnpzh commented 1 year ago

When generating code from Index Tree dialect to SCF dialect (in IndexTreeToSCF.cpp) for the numeric phase of Elementwise Multiplication (Eltwise), some redundant code is generated.

Current generated numeric Eltwise is as follows.

void numeric_eltwise(Matrix A[M,N],
                     Matrix B[M,N],
                     Matrix C[M,N] /* output */) {

  bool is_visited[N] = {false}; /// is_visited is not from Index Tree.

  for (int i_idx = 0; i_idx < M; ++i_idx) {
    int rowptr = C.rowptr[i_idx];

    double V[N] = {0}; /// dense vector V;

    /// Copy A[i,] to V
    for (int j_loc = A.rowptr[i_idx]; j_loc < A.rowptr[i_idx + 1]; ++j_loc) {
      int jA_idx = A.col[j_loc];
      V[jA_idx] = A.val[j_loc];
    }

    for (int j_loc = B.rowptr[i_idx]; j_loc < B.rowptr[i_idx + 1]; ++j_loc) {
      int j_idx = B.col[j_loc];
      if (!is_visited[j_idx]) {
        A_val = V[j_idx];
        B_val = B.val[j_loc];
        val = A_val * B_val;
        W_data[j_idx] = val;
        is_visited[j_idx] = true;
        C.col[rowptr] = j_idx;
        rowptr += 1;
      }
      else {
        A_val = A[j_idx];
        B_val = B.val[j_loc];
        val = A_val * B_val;
        W_data[j_idx] = val;
      }
    }

    sort(C.col, C.rowptr[i_idx], C.rowptr[i_idx + 1]);

    for (int j_loc = C.rowptr[i_idx]; j_loc < C.rowptr[i_idx + 1]; ++j_loc) {
      int j_idx = C.col[j_loc];
      C.val[j_idx] = W_data[j_idx];
      is_visited[j_idx] = false;
    }
  }
}

Some known issues are as follows, and there might be others.

  1. The is_visited array is unnecessary because Eltwise only goes through every row once.
  2. The if-else checking condition is unnecessary, but the if body should remain.
  3. The dense array V seems redundant because there is already a dense array W_data (i.e., the workspace).