Open sayakpaul opened 4 years ago
The pruning API already allows you to configure a block structure in the pruning parameters. TFLite also relies on that to achieve inference speed-ups on ARM CPUs.
If there's something else you're asking for in this feature request, could you list it out more explicitly? For example, the target hardware, the specific block structure, etc. Thanks!
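To make the "block structure in the pruning parameters" idea concrete, here is a minimal NumPy sketch of block-structured magnitude pruning: score each block of a weight matrix, then zero out the lowest-scoring blocks. The function name, scoring rule (L1 norm), and tie-breaking are illustrative assumptions, not the tfmot implementation.

```python
import numpy as np

def block_prune(weights, block_size=(1, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a 2-D weight matrix.

    Illustrative sketch of block-structured magnitude pruning; assumes the
    weight dimensions are divisible by the block dimensions. Not the tfmot
    implementation.
    """
    rows, cols = weights.shape
    br, bc = block_size
    # View the matrix as a grid of (br x bc) blocks and score each block
    # by its L1 norm.
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))   # shape: (rows//br, cols//bc)
    # Drop the `sparsity` fraction of blocks with the smallest scores.
    k = int(scores.size * sparsity)
    threshold = np.sort(scores, axis=None)[k - 1] if k > 0 else -np.inf
    mask = (scores > threshold).astype(weights.dtype)
    # Broadcast the block mask back to element granularity.
    mask = np.repeat(np.repeat(mask, br, axis=0), bc, axis=1)
    return weights * mask

w = np.arange(1.0, 17.0).reshape(4, 4)
pruned = block_prune(w, block_size=(2, 2), sparsity=0.5)
```

Because whole blocks are zeroed rather than scattered individual weights, the resulting sparsity pattern is regular enough for kernels (e.g. TFLite's) to exploit at inference time.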
An example would be great.
So, does the current support allow us to speed up sparse training on supported hardware like the A100, which has hardware support for structured sparsity?
System information
Motivation
Sparsity is a well-studied topic in neural networks that is highly relevant on both the research and engineering fronts. Model optimization techniques like pruning are fueled by the idea of sparsity. Moreover, pruning works even better when combined with techniques like quantization and knowledge distillation.
tfmot provides good support for pruning in many different flavors: training models with pruning schedules from scratch instead of re-training them, the ability to customize pruning mechanics within a model at the layer level, and so on.
The OpenAI folks propose block-sparse kernels as an improvement over (and alternative to) general-purpose sparse kernels, and they show that block sparsity can greatly improve computation time. Here's the original blog post. Here's the original paper.
As examples:
Dense layers could use block-sparse layers in the first place. Here's some motivation from Hugging Face.
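To illustrate where the speed-up in the examples above comes from, here is a toy NumPy sketch of a block-sparse matmul: the matrix is stored as a grid of small dense blocks plus a 0/1 block mask, and compute is spent only on the kept blocks. The storage layout and function name are assumptions for illustration, not the interface of OpenAI's or Hugging Face's kernels.

```python
import numpy as np

def block_sparse_matmul(x, blocks, block_mask, block_size):
    """Multiply x (n x k) by a block-sparse matrix W (k x m).

    W is represented as `blocks`, a (kb, mb) grid of dense (bk x bm)
    blocks, together with a 0/1 `block_mask` of shape (kb, mb).
    Toy sketch only: zero blocks are skipped entirely, and each kept
    block is a small dense matmul that maps well onto hardware.
    """
    bk, bm = block_size
    kb, mb = block_mask.shape          # number of blocks along each axis
    n = x.shape[0]
    out = np.zeros((n, mb * bm))
    for i in range(kb):
        for j in range(mb):
            if block_mask[i, j]:       # skip zero blocks entirely
                out[:, j*bm:(j+1)*bm] += x[:, i*bk:(i+1)*bk] @ blocks[i, j]
    return out

# Check against an equivalent dense matmul.
rng = np.random.default_rng(0)
blocks = rng.standard_normal((2, 2, 3, 3))   # 2x2 grid of 3x3 blocks
mask = np.array([[1, 0], [0, 1]])            # keep only the diagonal blocks
x = rng.standard_normal((4, 6))
dense_w = np.block([[blocks[i, j] * mask[i, j] for j in range(2)]
                    for i in range(2)])
sparse_out = block_sparse_matmul(x, blocks, mask, (3, 3))
dense_out = x @ dense_w
```

With half the blocks masked out, the sparse path does half the multiply-adds of the dense path while producing the same result, which is the core argument for making `Dense` layers block-sparse from the start.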