It is definitely not compatible with RigL because it is a DeepSpeed-specific layer. I will update the PR to make sure it stays compatible with GMP (gradual magnitude pruning), as long as the sparsity is kept global for the whole BERT layer.
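To illustrate what keeping the sparsity global over the whole layer means, here is a minimal sketch assuming PyTorch and assuming the layer's prunable weight tensors can be collected into a list; `global_magnitude_mask` and its arguments are hypothetical names used only for illustration, not part of the PR:

```python
import torch

def global_magnitude_mask(tensors, sparsity):
    # Hypothetical helper: compute one magnitude threshold shared across ALL
    # weight tensors of the layer, rather than a separate threshold per matrix.
    # A single global threshold keeps the mask well defined even when a fused
    # kernel packs several matrices (e.g. Q/K/V) into one parameter.
    all_weights = torch.cat([t.detach().abs().flatten() for t in tensors])
    k = int(sparsity * all_weights.numel())  # number of weights to prune
    if k == 0:
        return [torch.ones_like(t) for t in tensors]
    threshold = torch.kthvalue(all_weights, k).values  # k-th smallest magnitude
    return [(t.abs() > threshold).to(t.dtype) for t in tensors]
```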
Sparse version of the DeepSpeed transformer kernel (https://www.deepspeed.ai/tutorials/transformer_kernel/). It got a 10% speedup over the original HF implementation.
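For context, this is roughly how the linked tutorial constructs a kernel-backed layer; a minimal sketch assuming the `DeepSpeedTransformerConfig` / `DeepSpeedTransformerLayer` API shown in that tutorial, with BERT-Base-like hyperparameters chosen only for illustration:

```python
from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer

# BERT-Base-like settings, picked for illustration; the keyword set follows
# the transformer-kernel tutorial and may differ across DeepSpeed versions.
config = DeepSpeedTransformerConfig(batch_size=8,
                                    hidden_size=768,
                                    intermediate_size=3072,
                                    heads=12,
                                    attn_dropout_ratio=0.1,
                                    hidden_dropout_ratio=0.1,
                                    num_hidden_layers=12,
                                    initializer_range=0.02,
                                    local_rank=-1,
                                    seed=1234,
                                    fp16=False,
                                    pre_layer_norm=True)
layer = DeepSpeedTransformerLayer(config)
```

A sparse variant would presumably apply masks like the global one sketched above to this layer's fused parameters.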