Closed: alexperrone closed this issue 5 years ago.
You'll find this paper interesting: http://escholarship.org/uc/item/9hf0m6w3. We have at least the spirit of the functions you cite in this work, though that doesn't mean we have them all in the Gunrock core. In general one of our pieces of future work is to allow matrix operations and Gunrock operations to be integrated into the same dataflow (right now it's pretty much all Gunrock operations).
Efficient SpM*SpM is in many ways still an interesting research problem on GPUs. This recent paper is nicely done (but on CPUs): http://epubs.siam.org/doi/10.1137/15M104253X
@Laurawly any comments?
That's funny. Re: speedups on triangles using matrix multiplication, we've tested avoiding multiplication altogether. Since the (unweighted) adjacency matrix is binary, @LeonHeTheFirst realized that for each entry in the dot product of a row and column of A, we can perform a logical check of whether the corresponding entries are non-zero and, if so, increment by 1 (i.e., no multiplication). That led to a speedup of ~10x and made it comparable to set-intersection methods. We weren't using sparse matrices then, but it seems to be a good idea to leverage the fact that the adjacency matrix is binary. This relates to the point
(2) avoid multiplications where A is known to be zero a priori
in the first paper you cited.
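A minimal sketch (not Gunrock code; the function name and the dict-of-sets graph representation are hypothetical) of the idea above: for a binary adjacency matrix, the dot product of row i and column j reduces to counting common neighbors with a logical check, so no multiplications are needed.

```python
def triangles_per_edge(adj):
    """adj: dict mapping vertex -> set of neighbors (undirected, no self-loops).
    Returns, for each edge (u, v), the number of triangles it lies in."""
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # dot product of binary row u and column v of A reduces to
                # counting common neighbors: AND + increment, no multiply
                counts[(u, v)] = len(adj[u] & adj[v])
    return counts

# Triangle {0, 1, 2} plus a pendant edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_per_edge(adj))  # {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 0}
```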
For Gunrock, we currently take CSR as our only input format, so we'd need special handling to deal with a binary bit array/matrix. But it would make sense! The EmptyHeaded work, and I believe Enterprise as well, both used bit arrays.
If I remember correctly, @ctcyang used csrgemm from the cuSPARSE library to implement our matrix-multiplication-based TC instead of his own optimized matrix multiplication (please refer to the implementation section of the first paper John cited). In the experiment section, we also mentioned that the profiler shows SpGEMM from cuSPARSE to be the bottleneck of our implementation, since it does unnecessary work during the matrix multiplication. So it's possible that with an optimized operator, such as a bitmap-based method, TC could be faster using matrix multiplication. And like John mentioned, because Gunrock takes CSR inputs, some preprocessing would be needed to convert them to binary bit arrays.
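To illustrate the "unnecessary work" point: a general-purpose SpGEMM computes every nonzero of A*A, but triangle counting only needs the entries where A itself is nonzero. A toy Python sketch of such an output-masked computation on CSR input (this is an illustration of the idea, not the cuSPARSE csrgemm API; the function name is hypothetical):

```python
def masked_spgemm_edge_sum(indptr, indices):
    """Given CSR arrays of a binary symmetric adjacency matrix A, return
    sum over nonzeros (i, j) of A of (A*A)[i, j], computing (A*A) only at
    positions masked by A. For an undirected graph this equals 6 * #triangles
    (each triangle is counted once per directed edge)."""
    n = len(indptr) - 1
    rows = [set(indices[indptr[i]:indptr[i + 1]]) for i in range(n)]
    total = 0
    for i in range(n):
        for j in rows[i]:                     # mask: only where A[i, j] != 0
            total += len(rows[i] & rows[j])   # (A*A)[i, j] for binary A
    return total

# Triangle on vertices 0, 1, 2 stored in CSR
indptr = [0, 2, 4, 6]
indices = [1, 2, 0, 2, 0, 1]
print(masked_spgemm_edge_sum(indptr, indices) // 6)  # 1 triangle
```

A general csrgemm would instead materialize all of A*A (including entries at non-edges) before the mask is applied, which is exactly the wasted work described above.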
This would be a better fit for GraphBLAS: https://github.com/gunrock/graphblast/issues/2. Triangle counting is one of the possible future apps. I don't believe matrix-based computations and operations fit Gunrock's model; they can, however, be added as a utility if need be -- but that's why we have GraphBLAS.
I couldn't find a function or example of sparse-matrix sparse-matrix multiplication in Gunrock or its external dependencies (ModernGPU and CUB). Does an example exist for Gunrock? The closest I could find in the codebase was sparse-matrix vector multiplication.
There are two applications for this, the second one being a more specialized function: (1) relational algebra on the adjacency matrix to compute various projections of the graph; (2) computing the number of triangles each edge occurs in by computing `A %*% t(A)`, where A is the adjacency matrix, since an edge i-j occurs in a number of triangles equal to the dot product of row i and column j of the adjacency matrix. This has an efficient CPU implementation in the function `tcrossprod` in the CSparse library, for example (in their documentation's notation, `%*%` indicates matrix multiplication and `t(.)` is the transpose). Search for `tcrossprod` here: https://cran.r-project.org/web/packages/Matrix/Matrix.pdf
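A small worked example of application (2), sketched with NumPy rather than R's `tcrossprod` (the variable names and example graph are mine): for a symmetric binary adjacency matrix A, entry (i, j) of `A @ A.T` is the dot product of rows i and j, so for an edge i-j it is the number of triangles containing that edge.

```python
import numpy as np

# Triangle 0-1-2 plus a pendant edge 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

T = A @ A.T          # dense analogue of tcrossprod(A)
per_edge = T * A     # keep counts only where an edge actually exists

print(per_edge[0, 1], per_edge[2, 3])  # 1 0
print(per_edge.sum() // 6)             # 1 triangle in total
```

Each triangle contributes to three edges in both directions, hence the division by 6 for the global count.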