huggingface / nn_pruning

Prune a model while finetuning or training.
Apache License 2.0

Possible speed up for Sparse Block Computation #2

Closed: tchaton closed this issue 3 years ago

tchaton commented 3 years ago

Dear @madlag,

Matthias Fey, the main author of PyTorch Geometric, implemented this: https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cuda/spmm_cuda.cu#L11

His benchmarking confirmed it is faster than cuSPARSE. I think this algorithm could be adapted to block-sparse computation to get some additional speed-up.

From Matthias,

basically the same, just that you additionally parallelize over the block dimension
depends on the density of the sparse matrix I guess
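To make the quoted idea concrete, here is a minimal PyTorch sketch (not CUDA, purely illustrative, with made-up names and shapes) of a block-sparse matmul in which the work is batched over the non-zero-block dimension; an actual kernel would parallelize over blocks the same way, but inside CUDA:

```python
import torch

def block_sparse_mm(blocks, block_rows, block_cols, x, n_block_rows, block_size):
    """y = A @ x, where A is block-sparse and stored as its dense non-zero blocks.

    blocks:     (nnz, B, B) non-zero blocks of A
    block_rows: (nnz,) block-row index of each block
    block_cols: (nnz,) block-column index of each block
    x:          (n_block_cols * B, F) dense right-hand side
    """
    B, F = block_size, x.size(1)
    # Gather the slice of x touched by each non-zero block: (nnz, B, F).
    x_blocks = x.view(-1, B, F)[block_cols]
    # "Parallelize over the block dimension": one small dense matmul per block,
    # launched all at once as a batched matmul.
    partial = torch.bmm(blocks, x_blocks)               # (nnz, B, F)
    # Scatter-add each partial product into its block row of the output.
    y = x.new_zeros(n_block_rows, B, F)
    y.index_add_(0, block_rows, partial)
    return y.reshape(n_block_rows * B, F)

# Toy check against a dense reference.
B, nbr, nbc, F, nnz = 32, 4, 4, 64, 6
block_rows = torch.randint(0, nbr, (nnz,))
block_cols = torch.randint(0, nbc, (nnz,))
blocks = torch.randn(nnz, B, B)
x = torch.randn(nbc * B, F)

dense = torch.zeros(nbr * B, nbc * B)
for k in range(nnz):
    r, c = int(block_rows[k]), int(block_cols[k])
    dense[r * B:(r + 1) * B, c * B:(c + 1) * B] += blocks[k]

y = block_sparse_mm(blocks, block_rows, block_cols, x, nbr, B)
assert torch.allclose(y, dense @ x, atol=1e-4)
```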

Just an idea :)

Best, T.C

madlag commented 3 years ago

Hello @tchaton

Thanks for the link; indeed, it's the same kind of kernel!

cuSPARSE was actually designed for very sparse matrices, not really for NNs, which explains the slowness. For arbitrary sparse matrices, there is sputnik by Google, which is very good too I think (lots of very smart tricks to improve speed, described here). For block sparsity, there is the OpenAI kernel library, which is not available for PyTorch, and blocksparse, which I have not tested yet.

As you probably know, it's much easier to get good performance with block sparsity because of data locality, and it should be possible to approach dense performance, whereas that is almost impossible (on GPU at least) with arbitrary sparsity.

I took a stab at implementing block-sparse linear layers with pytorch_block_sparse, but I am not a specialist, so it only matches dense speed once sparsity exceeds 60%; I suppose it could be made significantly faster.
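For reference, a minimal usage sketch of pytorch_block_sparse as I recall its README; the `BlockSparseLinear` name and the `density` argument are assumptions about that API rather than something verified here:

```python
import torch
from pytorch_block_sparse import BlockSparseLinear  # assumes the package is installed

# Intended as a drop-in replacement for nn.Linear: 1024 -> 1024, keeping only
# ~25% of the weight blocks (density=0.25, i.e. 75% block sparsity).
# The kernel is CUDA-only, hence the .cuda() call.
layer = BlockSparseLinear(1024, 1024, density=0.25).cuda()

x = torch.randn(16, 1024, device="cuda")
y = layer(x)    # forward pass runs the block-sparse CUDA kernel
print(y.shape)  # torch.Size([16, 1024])
```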

I hope you will find nn_pruning useful! Best,

François