huggingface / nn_pruning

Prune a model while finetuning or training.
Apache License 2.0

Possible speed up for Sparse Block Computation #2

Closed: tchaton closed this issue 3 years ago

tchaton commented 3 years ago

Dear @madlag,

Matthias Fey, the main author of PyTorch Geometric, implemented this: https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cuda/spmm_cuda.cu#L11

His benchmarking confirmed it is faster than cuSPARSE. I think this algorithm could be adapted to block-sparse computation to get some additional speed-up.

From Matthias,

basically the same, just that you additionally parallelize over the block dimension
depends on the density of the sparse matrix I guess
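To make the quoted idea concrete, here is a minimal PyTorch sketch (not CUDA, purely illustrative, with made-up names and shapes) of a block-sparse matmul in which the work is batched over the non-zero-block dimension; an actual kernel would parallelize over blocks the same way, but inside CUDA:

```python
import torch

def block_sparse_mm(blocks, block_rows, block_cols, x, n_block_rows, block_size):
    """y = A @ x, where A is block-sparse and stored as its dense non-zero blocks.

    blocks:     (nnz, B, B) non-zero blocks of A
    block_rows: (nnz,) block-row index of each block
    block_cols: (nnz,) block-column index of each block
    x:          (n_block_cols * B, F) dense right-hand side
    """
    B, F = block_size, x.size(1)
    # Gather the slice of x touched by each non-zero block: (nnz, B, F).
    x_blocks = x.view(-1, B, F)[block_cols]
    # "Parallelize over the block dimension": one small dense matmul per block,
    # launched all at once as a batched matmul.
    partial = torch.bmm(blocks, x_blocks)               # (nnz, B, F)
    # Scatter-add each partial product into its block row of the output.
    y = x.new_zeros(n_block_rows, B, F)
    y.index_add_(0, block_rows, partial)
    return y.reshape(n_block_rows * B, F)

# Toy check against a dense reference.
B, nbr, nbc, F, nnz = 32, 4, 4, 64, 6
block_rows = torch.randint(0, nbr, (nnz,))
block_cols = torch.randint(0, nbc, (nnz,))
blocks = torch.randn(nnz, B, B)
x = torch.randn(nbc * B, F)

dense = torch.zeros(nbr * B, nbc * B)
for k in range(nnz):
    r, c = int(block_rows[k]), int(block_cols[k])
    dense[r * B:(r + 1) * B, c * B:(c + 1) * B] += blocks[k]

y = block_sparse_mm(blocks, block_rows, block_cols, x, nbr, B)
assert torch.allclose(y, dense @ x, atol=1e-4)
```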

Just an idea :)

Best, T.C

madlag commented 3 years ago

Hello @tchaton

Thanks for the link; indeed, it's the same kind of kernel!

cuSPARSE was actually designed for very sparse matrices, not really for NNs, which explains the slowness. For arbitrary sparse matrices, there is sputnik by Google, which is very good too I think (lots of very smart tricks to improve speed, described here). For block sparsity, there is the OpenAI kernel library, which is not available for PyTorch, and blocksparse, which I have not tested yet.

As you probably know, it's much easier to get good performance with block sparsity because of data locality, and it should be possible to approach dense performance, whereas that is almost impossible (on GPU at least) with arbitrary sparsity.

I took a stab at implementing block-sparse linear layers with pytorch_block_sparse, but I am not a specialist, so it only matches dense speed once sparsity exceeds 60%; I suppose it could be made significantly faster.
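For reference, a minimal usage sketch of pytorch_block_sparse as I recall its README; the `BlockSparseLinear` name and the `density` argument are assumptions about that API rather than something verified here:

```python
import torch
from pytorch_block_sparse import BlockSparseLinear  # assumes the package is installed

# Intended as a drop-in replacement for nn.Linear: 1024 -> 1024, keeping only
# ~25% of the weight blocks (density=0.25, i.e. 75% block sparsity).
# The kernel is CUDA-only, hence the .cuda() call.
layer = BlockSparseLinear(1024, 1024, density=0.25).cuda()

x = torch.randn(16, 1024, device="cuda")
y = layer(x)    # forward pass runs the block-sparse CUDA kernel
print(y.shape)  # torch.Size([16, 1024])
```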

I hope you will find nn_pruning useful! Best,

François