neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Paper: New algorithm for pruning #1329

Closed by gottlike 6 months ago

gottlike commented 1 year ago

Just an FYI about a new discovery that should be super relevant for you guys: https://twitter.com/aaquib_syed1/status/1714386165237776653

gottlike commented 1 year ago

Now all we need is to combine this with 1-bit transformers and we can run everything on a cheap phone 😁: https://arxiv.org/abs/2310.11453

SuperSecureHuman commented 1 year ago

Another pruning method!

https://arxiv.org/pdf/2306.11695

mgoin commented 6 months ago

Thanks @gottlike and @SuperSecureHuman, we have Wanda implemented in SparseML: https://github.com/neuralmagic/sparseml/blob/3ddc2d458572003e3c8c4cba8e8fc332f13ff9d4/src/sparseml/modifiers/pruning/wanda/pytorch.py#L34

SparseGPT still seems to be a better option for higher levels of sparsity, though.
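For readers who have not seen Wanda (the arXiv 2306.11695 paper linked above), here is a rough sketch of the scoring idea as I understand it: each weight is scored by its magnitude times the L2 norm of the matching input feature over some calibration samples, and the lowest-scoring weights in each output row are zeroed. This is not the SparseML implementation; `wanda_prune` and its parameters are made-up names for illustration, and it uses plain Python lists instead of tensors.

```python
import math


def wanda_prune(weights, activations, sparsity):
    """Zero the lowest-scoring weights in each output row (illustrative only).

    weights:     list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_samples, in_features)
    sparsity:    fraction of weights to remove per row, in [0, 1]
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)

    pruned = []
    for row in weights:
        # Wanda-style score: |weight| * norm of the input feature it multiplies.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores within this row.
        drop = set(sorted(range(n_in), key=scores.__getitem__)[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example: one output row, one calibration sample.
W = [[0.5, -2.0, 0.1, 1.0]]
X = [[10.0, 0.1, 1.0, 1.0]]
print(wanda_prune(W, X, 0.5))
```

In the toy example, plain magnitude pruning would drop `0.5` and `0.1`, while the activation-aware score keeps `0.5` (its input feature is large) and drops `-2.0` instead (its input feature is tiny).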