Abstract
Even though pruning methods can remove ~90% of a network's parameters, the realized speed-up on various hardware platforms is smaller than expected due to indexing overhead, irregular memory access, and the inability to utilize the array data-path.
Proposes pruning weights in blocks and training with group lasso regularization to encourage block sparsity in the model.
Roughly 10x fewer parameters with ~10% loss of accuracy.
Details
Block Prune
prune entire blocks of a weight matrix instead of individual weights; a block is zeroed out if its maximum-magnitude weight is below the pruning threshold
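A minimal NumPy sketch of this block-pruning rule (block size and threshold are illustrative values, not the paper's settings):

```python
import numpy as np

def block_prune(weights, block_size=(4, 4), threshold=0.1):
    """Zero out every block whose maximum-magnitude weight is below threshold.

    Sketch of the rule above; block_size and threshold are illustrative.
    """
    pruned = weights.copy()
    rows, cols = weights.shape
    br, bc = block_size
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            block = weights[i:i + br, j:j + bc]
            # prune the whole block if even its largest weight is small
            if np.abs(block).max() < threshold:
                pruned[i:i + br, j:j + bc] = 0.0
    return pruned
```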
Pruning during Training
this method actively prunes the parameters during training, not after it
Hyperparameters for pruning govern the threshold schedule over training iterations (when pruning starts and ends, and how quickly the threshold ramps up)
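A minimal sketch of what such a schedule could look like, assuming a simple linear ramp; the parameter names and values are my own guesses, not the paper's exact parameterization:

```python
def pruning_threshold(step, start_step=2000, end_step=20000, max_threshold=0.1):
    """Hypothetical monotonically increasing threshold schedule.

    Assumption for illustration only: before start_step nothing is pruned;
    afterwards the threshold ramps linearly up to max_threshold, so more
    blocks fall below it (and get pruned) as training progresses.
    """
    if step < start_step:
        return 0.0
    frac = min(1.0, (step - start_step) / (end_step - start_step))
    return frac * max_threshold

# during training:
# weights = block_prune(weights, threshold=pruning_threshold(step))
```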
Group Lasso Regularization
adds the L2 norm of each group (block) of weights to the training loss, which drives entire blocks toward zero
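A PyTorch sketch of this penalty, assuming the groups are the same blocks used for pruning; the block size and lambda are illustrative:

```python
import torch

def group_lasso_penalty(weight, block_size=(4, 4), lam=1e-4):
    """Group lasso term: lam * sum of L2 norms over blocks (groups).

    Because the penalty is the (unsquared) L2 norm of each block, its
    gradient pushes whole blocks toward exactly zero, matching the
    block-pruning granularity. block_size and lam are illustrative.
    """
    rows, cols = weight.shape
    br, bc = block_size
    penalty = weight.new_zeros(())
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            penalty = penalty + weight[i:i + br, j:j + bc].norm(p=2)
    return lam * penalty

# total_loss = task_loss + sum(group_lasso_penalty(w) for w in weight_matrices)
# (weight_matrices is a stand-in for the model's 2-D weight tensors)
```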
Experiments
Speech recognition system with CNN, RNN, and FC layers
Less than 5% loss of accuracy when the model's parameter count is reduced to roughly 1/3 to 1/4 of the original
Two variants are reported: BP (block pruning) and GLP (group lasso regularization combined with block pruning)
Speed-up
Block pruning gives a larger speed-up when the batch size is large
Pruning Schedule
BP and GLP prune more aggressively (earlier in training) than ordinary weight pruning
Performance over Prune ratio
performance drops sharply once the prune ratio exceeds ~90%
lower layers are pruned more than higher layers
Personal Thoughts
I would have liked to see pruning applied to NMT.
Batch size 1 shows a speed-up of ~3x; I wonder how they implemented it (a guessed kernel is sketched after this section).
If the op is sparse, do I have to write a new inference nmt.py?
The parameter settings in the experiments were quite odd.
I'm not sure what the real message is; the hidden sizes and the resulting parameter counts are all over the place.
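On the batch=1 speed-up question above, here is my own rough guess at the kind of kernel that could exploit block sparsity; the data layout and names are assumptions, not from the paper:

```python
import numpy as np

def block_sparse_matvec(blocks, block_size, shape, x):
    """Multiply a block-sparse matrix by a vector, touching only nonzero blocks.

    blocks: dict mapping (block_row, block_col) -> dense block of shape block_size.
    With ~90% of blocks pruned, only ~10% of the multiply-adds are executed,
    and each surviving block is a small dense product over contiguous memory,
    which is one plausible source of a batch-1 speed-up.
    """
    rows, cols = shape
    br, bc = block_size
    y = np.zeros(rows, dtype=x.dtype)
    for (bi, bj), block in blocks.items():
        y[bi * br:(bi + 1) * br] += block @ x[bj * bc:(bj + 1) * bc]
    return y
```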
Link : OpenReview @ ICLR 2018 Authors :