Open nitish2112 opened 7 years ago
Dear @nitish2112,
I'm so sorry, the current implementation does produce incorrect code for CSR matrix multiplication. We've worked out how to fix it in theory and I'll implement it first thing after thanksgiving break. (I look forward to code again!) I thought we had an issue on this, but I cannot find it so I'll update the name of this issue.
This is roughly the code we want for compute:
double* restrict w = malloc(A2_size*sizeof(double));
memset(w, 0, A2_size*sizeof(double));
for (int i = 0; i < B0_size; i++) {
for (int pB = B_pos[i]; pB < B_pos[i+1]; pB++) {
int k = B_idx[pB];
for (int pC = C_pos[k]; pC < C_pos[k+1]; pC++) {
int j = C_idx[pC];
w[j] += B_vals[pB] * C_vals[pC];
}
}
for (int pA = A_pos[i]; pA < A_pos[i+1]; pA++) {
int j = A_idx[pA];
A_vals[pA] = w[j];
w[j] = 0.0;
}
}
free(w);
For assembly:
bool* restrict w = malloc(A1_size*sizeof(bool));
memset(w, 0, A1_size*sizeof(bool));
int* restrict wlist = malloc(A1_size*sizeof(int));
int w_size = 0;
int A_idx_size = 1048576;
A_pos = malloc((m+1)*sizeof(int));
A_idx = malloc(A_idx_size*sizeof(int));
A_pos[0] = 0;
for (int i = 0; i < B0_size; i++) {
for (int pB = B_pos[i]; pB < B_pos[i+1]; pB++) {
int k = B_idx[pB];
for (int pC = C_pos[k]; pC < C_pos[k+1]; pC++) {
int j = C_idx[pC];
if (!w[j]) {
wlist[w_size++] = j;
w[j] = true;
}
}
}
// Make sure A_idx is large enough
if (A_idx_size < (A_pos[i]+w_size)) {
A_idx_size *= 2;
A_idx = realloc(A_idx, A_idx_size*sizeof(int));
}
// Sort row
qsort(wlist, w_size, sizeof(int), cmp);
for (int pwlist = 0; pwlist < w_size; pwlist++) {
int j = wlist[pwlist];
w[j] = false;
A_idx[A_pos[i] + pwlist] = j;
}
A_pos[i+1] = A_pos[i] + w_size;
w_size = 0;
}
A_vals = malloc(A_pos[m] * sizeof(double));
free(w);
free(wlist);
Hi Fred, Has this bug been resolved?
Hi, thanks for checking in. I'm sorry, but it hasn't been implemented yet since I diverted by small side-tasks, but I'm working on it and there's nothing blocking it so it should be in by the end of the month/early next month. I hope this doesn't negatively impact you too much.
Hi Fred,
Thanks for the prompt reply. Yes, that is perfectly fine.
Best, Nitish
Dear @nitish2112,
I'm so sorry, the current implementation does produce incorrect code for CSR matrix multiplication. We've worked out how to fix it in theory and I'll implement it first thing after thanksgiving break. (I look forward to code again!) I thought we had an issue on this, but I cannot find it so I'll update the name of this issue.
This is roughly the code we want for compute:
double* restrict w = malloc(A2_size*sizeof(double)); memset(w, 0, A2_size*sizeof(double)); for (int i = 0; i < B0_size; i++) { for (int pB = B_pos[i]; pB < B_pos[i+1]; pB++) { int k = B_idx[pB]; for (int pC = C_pos[k]; pC < C_pos[k+1]; pC++) { int j = C_idx[pC]; w[j] += B_vals[pB] * C_vals[pC]; } } for (int pA = A_pos[i]; pA < A_pos[i+1]; pA++) { int j = A_idx[pA]; A_vals[pA] = w[j]; w[j] = 0.0; } } free(w);
For assembly:
bool* restrict w = malloc(A1_size*sizeof(bool)); memset(w, 0, A1_size*sizeof(bool)); int* restrict wlist = malloc(A1_size*sizeof(int)); int w_size = 0; int A_idx_size = 1048576; A_pos = malloc((m+1)*sizeof(int)); A_idx = malloc(A_idx_size*sizeof(int)); A_pos[0] = 0; for (int i = 0; i < B0_size; i++) { for (int pB = B_pos[i]; pB < B_pos[i+1]; pB++) { int k = B_idx[pB]; for (int pC = C_pos[k]; pC < C_pos[k+1]; pC++) { int j = C_idx[pC]; if (!w[j]) { wlist[w_size++] = j; w[j] = true; } } } // Make sure A_idx is large enough if (A_idx_size < (A_pos[i]+w_size)) { A_idx_size *= 2; A_idx = realloc(A_idx, A_idx_size*sizeof(int)); } // Sort row qsort(wlist, w_size, sizeof(int), cmp); for (int pwlist = 0; pwlist < w_size; pwlist++) { int j = wlist[pwlist]; w[j] = false; A_idx[A_pos[i] + pwlist] = j; } A_pos[i+1] = A_pos[i] + w_size; w_size = 0; } A_vals = malloc(A_pos[m] * sizeof(double)); free(w); free(wlist);
This hash-based approach may have compatibility issue with the existing merge-based approach for sparse vector add. I suggest to use the k-way merging, where k is the nnz of B(i,:). For CSR, You can use the sparse vector add code of A(i,:) = B(i,k_0) C(k_0,:) + B(i,k_1) C(k_1,:) + ... + B(i,k_nz_i) * C(k_nz_i,:)
Hi,
When I try to compile a matrix multiplication A = B*C where A, B and C all three are CSR, taco doesn't seem to produce the correct result.
When I compile it using taco and run:
which is the wrong result.
Also, when I try to set A as CSR, B as CSR and C as CSC, then I get the following result
Here as we can see A is definitely not is CSR format as there are still zeros in the vals array.
Is there something that I am doing wrong or these are bugs?
Thanks,
Best Regards, Nitish