tensorflow / model-optimization

A toolkit to optimize ML models for deployment, for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Support for block sparse matrices #542

Open sayakpaul opened 4 years ago

sayakpaul commented 4 years ago


Motivation

Sparsity is a well-studied topic in neural networks and is highly relevant on both the research and engineering fronts. Model optimization techniques like pruning are fueled by the idea of sparsity. Moreover, sparsity combines even better with techniques like quantization and knowledge distillation.

tfmot already provides good support for pruning in several flavors: training models with pruning schedules from scratch instead of re-training them, customizing the pruning mechanics at a layer level within a model, and so on.
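As a rough illustration of that layer-level control, here is a minimal sketch (my own, not from the thread) using tfmot's documented `prune_low_magnitude` wrapper; the model architecture and schedule values are made up:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap only the hidden layer, so pruning is customized per layer.
model = tf.keras.Sequential([
    tfmot.sparsity.keras.prune_low_magnitude(
        tf.keras.layers.Dense(256, activation="relu"),
        pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(
            target_sparsity=0.5, begin_step=0),
    ),
    tf.keras.layers.Dense(10),  # output layer left dense
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Training needs the UpdatePruningStep callback so the masks are updated:
# model.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```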

OpenAI folks proposed block-sparse kernels as an improvement on, or alternative to, unstructured sparse kernels, and they show that these can greatly reduce computation time. Here's the original blog post. Here's the original paper.

As examples:

  1. Instead of using Dense layers, use block-sparse layers in the first place (see the conceptual sketch below). Here's some motivation from Hugging Face.
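To make that idea concrete, here is a conceptual sketch, entirely my own: `BlockSparseDense` is a hypothetical layer, not a tfmot or Keras API. It only expresses the block-sparsity pattern by masking a dense matmul; the speed-ups OpenAI reports come from dedicated block-sparse GPU kernels, which this sketch does not use.

```python
import numpy as np
import tensorflow as tf

class BlockSparseDense(tf.keras.layers.Layer):
    """Hypothetical Dense-like layer with a fixed random block-sparse kernel."""

    def __init__(self, units, block_size=16, density=0.25, **kwargs):
        super().__init__(**kwargs)
        self.units = units          # assumed divisible by block_size
        self.block_size = block_size
        self.density = density      # fraction of blocks kept nonzero

    def build(self, input_shape):
        in_dim = int(input_shape[-1])  # assumed divisible by block_size
        self.kernel = self.add_weight(
            name="kernel", shape=(in_dim, self.units),
            initializer="glorot_uniform", trainable=True)
        # Fixed random block mask: each entry keeps or zeroes a whole block.
        blocks = (in_dim // self.block_size, self.units // self.block_size)
        mask = (np.random.rand(*blocks) < self.density).astype("float32")
        mask = np.repeat(np.repeat(mask, self.block_size, axis=0),
                         self.block_size, axis=1)
        self.mask = tf.constant(mask)

    def call(self, inputs):
        # Masked dense matmul: a real implementation would instead
        # dispatch to block-sparse kernels to get the speed-up.
        return tf.matmul(inputs, self.kernel * self.mask)

# Usage: layer = BlockSparseDense(256, block_size=16, density=0.25)
```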
liyunlu0618 commented 3 years ago

The pruning API already allows you to configure block structure through the pruning parameters, and TFLite relies on that to achieve inference speed-ups on ARM CPUs. For example:
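Here is a minimal sketch of that configuration (my own illustration; the schedule values are arbitrary, and the (1, 4) block shape is the one the TFLite sparse-inference docs describe for XNNPACK's kernels on ARM):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.8,
        begin_step=0, end_step=1000),
    # Prune in 1x4 blocks; block magnitudes are pooled with AVG
    # before ranking, so whole blocks are kept or dropped together.
    "block_size": (1, 4),
    "block_pooling_type": "AVG",
}

dense = tf.keras.layers.Dense(256, activation="relu")
pruned = tfmot.sparsity.keras.prune_low_magnitude(dense, **pruning_params)
```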

If there's something else you're asking for in this feature request, can you spell it out more explicitly? For example, the target hardware, the specific block structure, etc. Thanks!

sayakpaul commented 3 years ago

An example would be great.

So, does the current support allow us to speed up sparse training on supported hardware like the A100, given that it has better hardware support for sparsity?