kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels
Other
311 stars 98 forks source link

boolean spgemm: can I avoid numeric phase? #422

Open cjain7 opened 5 years ago

cjain7 commented 5 years ago

I'm using spgemm for an application in biological data analysis. In my application, I've boolean matrices, and I'm only interested in boolean matrix multiplication. From the output, I only need the non-zero pattern, i.e., the column entries and the row-map, but I don't care about non-zero values. I'm using kokkos-kernels as an external library, and calling spgemm routines using the API recommended by you.

I read your spgemm paper this week, where you describe the role of symbolic and numeric phases. My question is following: As I don't care about output values, can I avoid the numeric phase, and just use symbolic phase? I ask this because majority of time (>90%) is being consumed in the numeric phase in my benchmarks. If possible, please let me know a convenient way to go about it.

The current output (using the API) of symbolic phase is row-map, but if I could also get the column indices (i.e., entries), that would be fantastic. (I am using this library with openmp support on Intel CPUs)

srajama1 commented 5 years ago

We do have the column indices info in symbolic phase, but don't store it. We just store the column map. It will be a small change. Will you be interested in changing it and contributing back ?

cjain7 commented 5 years ago

Thanks for responding. I too realized this when I read through the code to have a better understanding of the symbolic phase (both uncompressed and compressed modes).

Storing column indices would make symbolic phase more expensive I assume (as we don't have row-map in the beginning), but unclear to me whether we're sure to get any advantage over the current version that uses symbolic + numeric. It would be good to know your thoughts about this. Putting this in other words, should we really expect boolean spgemm to be much faster than integer spgemm?

If you could give me more advice about what changes need to be done, I would be happy to give it a shot.

srajama1 commented 5 years ago

Yes, it is a chicken and egg thing. Having the rowmap made life simpler, so we did it this way. There could be some advantage in finding this out during symbolic, but it would be small. For example, we do one phase for triangle counting, but we have an additional filtering in triangle counting in addition to multiply. This results in advantage that was worth it do one phase.

Char-Aznable commented 2 years ago

I'm interested in using spmv snd spgemm for binary matrix as well. It seems to me the right thing to do would be to specialize crsmatrix for bool to completely avoid storing the values of one and overload the kernel for it.