Abstract
Even though pruning methods can remove ~90% of a network's parameters, the realized speed-up on various hardware platforms is smaller than expected due to indexing overhead, irregular memory access, and the inability to utilize the array data-path.
Proposes pruning weights in blocks and training with group lasso regularization to encourage block sparsity in the model.
Roughly 10x fewer parameters with ~10% loss of accuracy.
Details
Block Prune
prune entire blocks of a weight matrix instead of individual weights; a block is zeroed out if its maximum-magnitude weight is below the pruning threshold
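A minimal NumPy sketch of this block-pruning rule (block size and threshold are illustrative values, not the paper's settings):

```python
import numpy as np

def block_prune(weights, block_size=(4, 4), threshold=0.1):
    """Zero out every block whose maximum-magnitude weight is below threshold.

    Sketch of the rule above; block_size and threshold are illustrative.
    """
    pruned = weights.copy()
    rows, cols = weights.shape
    br, bc = block_size
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            block = weights[i:i + br, j:j + bc]
            # prune the whole block if even its largest weight is small
            if np.abs(block).max() < threshold:
                pruned[i:i + br, j:j + bc] = 0.0
    return pruned
```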
Pruning during Training
this method actively prunes the parameters during training, not after it
Hyperparameters for pruning govern the threshold schedule over training iterations (when pruning starts and ends, and how quickly the threshold ramps up)
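A minimal sketch of what such a schedule could look like, assuming a simple linear ramp; the parameter names and values are my own guesses, not the paper's exact parameterization:

```python
def pruning_threshold(step, start_step=2000, end_step=20000, max_threshold=0.1):
    """Hypothetical monotonically increasing threshold schedule.

    Assumption for illustration only: before start_step nothing is pruned;
    afterwards the threshold ramps linearly up to max_threshold, so more
    blocks fall below it (and get pruned) as training progresses.
    """
    if step < start_step:
        return 0.0
    frac = min(1.0, (step - start_step) / (end_step - start_step))
    return frac * max_threshold

# during training:
# weights = block_prune(weights, threshold=pruning_threshold(step))
```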
Group Lasso Regularization
adds the L2 norm of each group (block) of weights to the training loss, which drives entire blocks toward zero
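A PyTorch sketch of this penalty, assuming the groups are the same blocks used for pruning; the block size and lambda are illustrative:

```python
import torch

def group_lasso_penalty(weight, block_size=(4, 4), lam=1e-4):
    """Group lasso term: lam * sum of L2 norms over blocks (groups).

    Because the penalty is the (unsquared) L2 norm of each block, its
    gradient pushes whole blocks toward exactly zero, matching the
    block-pruning granularity. block_size and lam are illustrative.
    """
    rows, cols = weight.shape
    br, bc = block_size
    penalty = weight.new_zeros(())
    for i in range(0, rows, br):
        for j in range(0, cols, bc):
            penalty = penalty + weight[i:i + br, j:j + bc].norm(p=2)
    return lam * penalty

# total_loss = task_loss + sum(group_lasso_penalty(w) for w in weight_matrices)
# (weight_matrices is a stand-in for the model's 2-D weight tensors)
```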
Experiments
Speech recognition system with CNN, RNN, and FC layers
Less than 5% loss of accuracy when the model's parameter count is reduced to roughly 1/3 to 1/4 of the original
Two variants are reported: BP (block pruning) and GLP (group lasso regularization combined with block pruning)
Speed-up
Block pruning gives a larger speed-up when the batch size is large
Pruning Schedule
BP and GLP prune more aggressively (earlier in training) than ordinary weight pruning
Performance over Prune ratio
performance drops sharply once the prune ratio exceeds ~90%
lower layers are pruned more than higher layers
Personal Thoughts
I would have liked to see pruning applied to NMT.
Batch size 1 shows a speed-up of ~3x; I wonder how they implemented it (a guessed kernel is sketched after this section).
If the op is sparse, do I have to write a new inference nmt.py?
The parameter settings in the experiments were quite odd.
I'm not sure what the real message is; the hidden sizes and the resulting parameter counts are all over the place.
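On the batch=1 speed-up question above, here is my own rough guess at the kind of kernel that could exploit block sparsity; the data layout and names are assumptions, not from the paper:

```python
import numpy as np

def block_sparse_matvec(blocks, block_size, shape, x):
    """Multiply a block-sparse matrix by a vector, touching only nonzero blocks.

    blocks: dict mapping (block_row, block_col) -> dense block of shape block_size.
    With ~90% of blocks pruned, only ~10% of the multiply-adds are executed,
    and each surviving block is a small dense product over contiguous memory,
    which is one plausible source of a batch-1 speed-up.
    """
    rows, cols = shape
    br, bc = block_size
    y = np.zeros(rows, dtype=x.dtype)
    for (bi, bj), block in blocks.items():
        y[bi * br:(bi + 1) * br] += block @ x[bj * bc:(bj + 1) * bc]
    return y
```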
Link : OpenReview @ ICLR 2018 Authors :