Hello @tchaton
Thanks for the link, indeed it's the same kind of kernel!
CuSparse was actually designed for very sparse matrices, not really for NNs, which explains the slowness. For arbitrary sparse matrices, there is sputnik by Google, which I think is very good too (lots of very smart tricks to improve speed, described here). For block sparse, there is the OpenAI kernel library, which is not available for PyTorch, and blocksparse, but I have not tested it yet.
As you probably know, it's much easier to get good performance with block sparsity because of data locality, and it should be possible to approach dense performance, whereas that's almost impossible (on GPU at least) for arbitrary sparsity.
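To make the slowness concrete, here is a minimal sketch (the shapes and the 90% sparsity level are my assumptions) comparing a dense matmul against PyTorch's COO sparse matmul, which as far as I know dispatches to cuSPARSE on GPU:

```python
# Minimal sketch: dense matmul vs. sparse matmul at a moderate,
# NN-typical sparsity level (~90% zeros; chosen for illustration).
import torch

m, k, n, sparsity = 2048, 2048, 2048, 0.9
device = "cuda" if torch.cuda.is_available() else "cpu"

dense = torch.randn(m, k, device=device)
dense = dense * (torch.rand(m, k, device=device) > sparsity)  # zero ~90% of entries
sparse = dense.to_sparse()  # COO; torch.sparse.mm is cuSPARSE-backed on GPU (to my knowledge)
rhs = torch.randn(k, n, device=device)

out_dense = dense @ rhs
out_sparse = torch.sparse.mm(sparse, rhs)
print(torch.allclose(out_dense, out_sparse, atol=1e-3))
# Timing these two calls at this sparsity typically shows the sparse path
# losing to the dense one on GPU, which is the behaviour described above.
```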
I had a try at implementing block sparse linear layers with pytorch_block_sparse, but I am not a specialist, so it only matches dense speed once sparsity > 60%; I suppose it could be made significantly faster.
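For context, here is a minimal usage sketch, assuming the `BlockSparseLinear` drop-in API from the pytorch_block_sparse README (the layer sizes and density below are arbitrary):

```python
# Minimal sketch: BlockSparseLinear as a drop-in replacement for nn.Linear.
# density=0.1 keeps roughly 10% of the weight blocks, i.e. 90% block sparsity.
import torch
from pytorch_block_sparse import BlockSparseLinear

layer = BlockSparseLinear(1024, 256, density=0.1).cuda()  # kernels are CUDA-only
x = torch.randn(8, 1024, device="cuda")
y = layer(x)    # same call signature as nn.Linear
print(y.shape)  # torch.Size([8, 256])
```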
I hope you will find nn_pruning useful! Best,
François
Dear @madlag,
Matthias Fey, PyTorch Geometric main author implemented this: https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cuda/spmm_cuda.cu#L11
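For reference, a minimal sketch of driving this through `torch_sparse`'s Python-level `spmm` entry point (which, as I understand it, wraps the linked CUDA kernel), following the shapes documented in the pytorch_sparse README:

```python
# Minimal sketch: sparse (3x3, COO indices + values) times dense (3x2)
# via torch_sparse.spmm, per the pytorch_sparse README.
import torch
from torch_sparse import spmm

index = torch.tensor([[0, 0, 1, 2, 2],   # row indices of non-zeros
                      [0, 2, 1, 0, 1]])  # column indices of non-zeros
value = torch.tensor([1., 2., 4., 1., 3.])
matrix = torch.tensor([[1., 4.], [2., 5.], [3., 6.]])

out = spmm(index, value, 3, 3, matrix)  # (m=3, n=3) sparse @ dense
print(out)  # tensor([[ 7., 16.], [ 8., 20.], [ 7., 19.]])
```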
After benchmarking, he confirmed it was faster than CuSparse. I think this algorithm could be adapted for block sparse to get some additional speed-up. From Matthias:
Just an idea :)
Best, T.C