Closed aprokop closed 6 years ago
Thanks @aprokop for catching and suggesting that. I will work on this.
Thanks @mndevec! I guess there should be two options: one where the graph of the product is the product of the graphs, and second where the graph of the product is reduced to ignore known zero values (there may still be unknown zeros as a result of calculations). MueLu is actually interested in both for different scenarios.
@aprokop @srajama1
Going deeper in this, I realize that this might be a bit tricky to do.
Basically, the issue is as follows: spgemm symbolic functions calculates the sizes of the rows. However, it does not really store the structure.
In the numeric, algorithm works with column-id, and value pairs. Numeric phase returns both column ids and values. This can be avoided, but it does not really optimize the algorithm much, because at any certain point it still has to store id and value pairs for accumulation.
If a value is 0 on the left hand side matrix, we can avoid all multiplications. However, the structure of those rows still need to be inserted. If the algorithm avoids the traversal of those rows on the right hand side matrix, it might produce rows with as follows:
Row: col1 col2 col3 0 0 0 val1 val2 val3 0 0 0
Would having such a matrix be okay? Basically, the current method cannot avoid this traversals so that it can return the ids of the columns with 0 values as below. col1 col2 col3 col4 col5 col6 val1 val2 val3 0 0 0
I assume returning rows with missing column ids is not okay. But if it is, we can easily return these rows with missing values. Otherwise, to skip 0 values, we might need to change the algorithm and structure a bit more.
I've been running some experiments in MueLu, and noticed that matrix-matrix multiplication time does not change when we use a filtered matrix with OpenMp node.
My assumption is that the currently implemented spgemm in kokkos-kernels does not have any shortcuts or optimization when the left matrix may have multiple zero elements. In the serial Tpetra version, the code checks for zeros in A and then skips fetching corresponding rows of B, thus significantly improving performance.
My question is: would something like that be possible in the threaded spgemm? This could significantly help with some applications that use multigrid, particularly Nalu when used with high geometric anisotropy.