numenta / nupic.research

Experimental algorithms. Unsupported.
https://nupicresearch.readthedocs.io
GNU Affero General Public License v3.0

RES-2198: Add sparse deepspeed transformer layer configuration and mixin #550

Closed lscheinkman closed 3 years ago

lscheinkman commented 3 years ago

Sparse version of https://www.deepspeed.ai/tutorials/transformer_kernel/. Got a 10% speed-up over the original HF implementation.
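For context, the linked tutorial builds a fused layer from a `DeepSpeedTransformerConfig` and `DeepSpeedTransformerLayer`. The sketch below shows one way a sparsity mixin could wrap that layer by masking its weight tensors; the class name, the masking scheme, and the `rezero_weights` hook (mirroring the convention used elsewhere in nupic.research) are illustrative assumptions, not the PR's actual code, and the exact config/constructor arguments vary by DeepSpeed version.

```python
# Hypothetical sketch: a sparse wrapper around DeepSpeed's fused transformer
# layer. Requires a CUDA device (the fused kernel is GPU-only); argument names
# follow the transformer-kernel tutorial but are version-dependent.
import torch
from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer


class SparseDeepSpeedTransformerLayer(DeepSpeedTransformerLayer):
    """DeepSpeed transformer layer whose weight tensors are masked to a fixed sparsity."""

    def __init__(self, config, sparsity=0.8):
        super().__init__(config)
        # One static binary mask per weight tensor, keeping the largest-magnitude
        # (1 - sparsity) fraction of entries at initialization.
        self.masks = {}
        for name, param in self.named_parameters():
            if param.dim() < 2:  # leave biases / layer-norm params dense
                continue
            keep = int(param.numel() * (1.0 - sparsity))
            threshold = param.abs().flatten().kthvalue(param.numel() - keep).values
            self.masks[name] = (param.abs() > threshold).float()

    def rezero_weights(self):
        """Reapply the masks; call after each optimizer step."""
        with torch.no_grad():
            for name, param in self.named_parameters():
                if name in self.masks:
                    param.mul_(self.masks[name])


config = DeepSpeedTransformerConfig(
    batch_size=8,
    hidden_size=768,
    heads=12,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=12,
    initializer_range=0.02,
    local_rank=-1,
    fp16=False,
    pre_layer_norm=True,
)
layer = SparseDeepSpeedTransformerLayer(config, sparsity=0.8)
```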

lscheinkman commented 3 years ago

It is definitely not compatible with RigL because it is a DeepSpeed-specific layer. I will update the PR to make sure it stays compatible with GMP as long as the sparsity is kept global for the whole BERT layer.
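To illustrate what "global for the whole BERT layer" means here: gradual magnitude pruning (GMP) would compute a single magnitude threshold across all of the fused layer's weight tensors combined, instead of a separate threshold per tensor. This is a minimal sketch under that assumption; the function names and the cubic ramp are illustrative, not the PR's code.

```python
# Illustrative sketch of GMP with one global sparsity level over an entire
# fused transformer layer, rather than a per-tensor level.
import torch


def global_magnitude_prune(layer, sparsity):
    """Zero the smallest-magnitude weights across ALL weight tensors combined,
    so one threshold governs the whole layer."""
    weights = [p for p in layer.parameters() if p.dim() >= 2]
    all_magnitudes = torch.cat([p.abs().flatten() for p in weights])
    k = int(all_magnitudes.numel() * sparsity)
    if k == 0:
        return
    # Global threshold: the k-th smallest magnitude over every tensor at once.
    threshold = all_magnitudes.kthvalue(k).values
    with torch.no_grad():
        for p in weights:
            p.mul_((p.abs() > threshold).float())


def gmp_sparsity(step, total_steps, final_sparsity=0.8):
    """Cubic sparsity ramp typically used by gradual magnitude pruning."""
    progress = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)
```

Usage would be to call `global_magnitude_prune(layer, gmp_sparsity(step, total_steps))` after each optimizer step, so the sparsity level ramps up over training while remaining a single layer-wide constraint.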