gunrock / gunrock

Programmable CUDA/C++ GPU Graph Analytics
https://gunrock.github.io/gunrock/
Apache License 2.0
972 stars · 200 forks

Sparse-matrix by sparse-matrix multiplication #254

Closed alexperrone closed 5 years ago

alexperrone commented 7 years ago

Couldn't find a function or example of sparse-matrix by sparse-matrix multiplication in Gunrock or its external dependencies (moderngpu and cub). Does an example exist for Gunrock? The closest I could find in the codebase was sparse matrix-vector multiplication.

There are two applications for this, the second being a more specialized function: (1) relational algebra on the adjacency matrix to compute various projections of the graph; (2) computing the number of triangles each edge occurs in by computing A t(A), where A is the adjacency matrix, since an edge i-j occurs in a number of triangles equal to the dot product of row i and column j of the adjacency matrix. This has an efficient CPU implementation in, for example, the function tcrossprod in R's Matrix package (the Csparse classes). From their documentation (where `%*%` indicates matrix multiplication and `t(.)` is the transpose):

The functions crossprod and tcrossprod are matrix products or “cross products”, ideally implemented efficiently without computing t(.)’s unnecessarily. [...] tcrossprod() takes the cross-product of the transpose of a matrix. tcrossprod(x) is formally equivalent to, but faster than, the call x %*% t(x), and so is tcrossprod(x, y) instead of x %*% t(y).

Search for tcrossprod here: https://cran.r-project.org/web/packages/Matrix/Matrix.pdf
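To make the per-edge observation above concrete, here is a minimal pure-Python sketch (not Gunrock or Matrix-package code; the function name is made up for illustration). For a symmetric binary adjacency matrix, the dot product of row i and column j is just the size of the common neighborhood of i and j:

```python
# Hypothetical sketch: per-edge triangle counts via the A * t(A)
# observation, using a dict-of-sets adjacency for an undirected graph.

def edge_triangle_counts(edges):
    """For each undirected edge (i, j), count the triangles containing it:
    the dot product of row i and row j of the binary adjacency matrix,
    i.e. |N(i) & N(j)|."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return {(i, j): len(adj[i] & adj[j]) for i, j in edges}

# A 4-clique: every edge lies in exactly 2 triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
counts = edge_triangle_counts(edges)
print(counts[(0, 1)])  # prints 2
```

An efficient implementation (tcrossprod, or an SpGEMM on the GPU) computes the same quantity without materializing the transpose or the dense product.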

jowens commented 7 years ago

You'll find this paper interesting: http://escholarship.org/uc/item/9hf0m6w3. We have at least the spirit of the functions you cite in this work, though that doesn't mean we have them all in the Gunrock core. In general one of our pieces of future work is to allow matrix operations and Gunrock operations to be integrated into the same dataflow (right now it's pretty much all Gunrock operations).

Efficient SpM*SpM is in many ways still an interesting research problem on GPUs. This recent paper is nicely done (but on CPUs): http://epubs.siam.org/doi/10.1137/15M104253X

@Laurawly any comments?

alexperrone commented 7 years ago

That's funny. Re: speedups on triangles using matrix multiplication, we've tested avoiding multiplication altogether. Since the (unweighted) adjacency matrix is binary, @LeonHeTheFirst realized that for each entry in a dot product of a row and column of A, we can perform a logical check of whether the corresponding entries are non-zero and, if so, increment by 1 (i.e., no multiplication). That led to a speedup of ~10x and made it comparable to set-intersection methods. We weren't using sparse matrices then, but it seems a good idea to leverage the fact that the adjacency matrix is binary. This relates to the point

(2) avoid multiplications where A is known to be zero a priori

in the first paper you cited.

jowens commented 7 years ago

For Gunrock, we currently just take CSR as our input format, so we'd need special handling to deal with a binary bit array/matrix. But it would make sense! The EmptyHeaded work and, I believe, Enterprise both used bit arrays.

Laurawly commented 7 years ago

If I remember correctly, @ctcyang used csrgemm from the cuSPARSE library to implement our matrix-multiplication-based TC instead of his own optimized matrix multiplication (please refer to the implementation section of the first paper John cited). In the experiment section, we also mentioned that the profiler shows the cuSPARSE SpGEMM to be the bottleneck of our implementation, since it does unnecessary work during the matrix multiplication. So it's possible that with an optimized operator, such as one using bitmap methods, TC could be faster using matrix multiplication. And as John mentioned, because Gunrock takes CSR inputs, it would need some preprocessing to convert them to binary bit arrays.
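The preprocessing step mentioned here (CSR to bit arrays) can be sketched as follows. This is hypothetical illustrative Python, not Gunrock or cuSPARSE API; a GPU version would pack rows into 32/64-bit words and use a hardware popcount, but the idea is the same: a binary row-row dot product becomes AND + popcount.

```python
# Hypothetical sketch: convert CSR rows to bitmaps so the binary
# dot product is a bitwise AND followed by a population count.

def csr_rows_to_bitmaps(row_ptr, col_idx):
    """Pack each CSR row into a Python int used as a bitmask."""
    bitmaps = []
    for r in range(len(row_ptr) - 1):
        mask = 0
        for c in col_idx[row_ptr[r]:row_ptr[r + 1]]:
            mask |= 1 << c
        bitmaps.append(mask)
    return bitmaps

def bitmap_dot(mask_i, mask_j):
    """Binary dot product: popcount of the AND of the two row masks."""
    return bin(mask_i & mask_j).count("1")

# Triangle graph 0-1-2: row 0 = {1,2}, row 1 = {0,2}, row 2 = {0,1}.
row_ptr = [0, 2, 4, 6]
col_idx = [1, 2, 0, 2, 0, 1]
masks = csr_rows_to_bitmaps(row_ptr, col_idx)
print(bitmap_dot(masks[0], masks[1]))  # shared neighbour 2 -> prints 1
```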

neoblizz commented 5 years ago

This will be a better fit for GraphBLAS: https://github.com/gunrock/graphblast/issues/2. Triangle Counting is one of the possible future apps. I don't believe matrix-based computations and operations fit Gunrock's model; they could be added as a utility if need be, but that's why we have GraphBLAS.
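For reference, the masked-SpGEMM formulation GraphBLAS commonly uses for triangle counting can be sketched in a few lines of toy Python (this is not the GraphBLAS API, just the math): with L the strictly lower triangle of the adjacency matrix, the triangle count is sum(C) where C⟨L⟩ = L * t(L), and the mask means entries of the product are only evaluated where L is non-zero.

```python
# Toy sketch of masked SpGEMM triangle counting: ntri = sum over
# non-zeros (i, j) of L of the dot product of rows i and j of L.

def count_triangles(edges, n):
    # L as row-wise neighbour sets restricted to smaller indices.
    lower = [set() for _ in range(n)]
    for i, j in edges:
        lower[max(i, j)].add(min(i, j))
    # Masked multiply: only evaluate entries (i, j) where L[i][j] == 1.
    total = 0
    for i in range(n):
        for j in lower[i]:
            total += len(lower[i] & lower[j])  # (L * t(L))[i][j]
    return total

# Triangle 0-1-2 plus a pendant edge 2-3: exactly one triangle.
print(count_triangles([(0, 1), (0, 2), (1, 2), (2, 3)], 4))  # prints 1
```

The mask is exactly the "avoid multiplications where A is known to be zero a priori" point from earlier in the thread, baked into the operator.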