@Michaelvll Does the quantized-plus-pruned model checkpoint include other data such as last_optimizer_state, optimizer_history, etc.?
Thank you for asking! We are still cleaning up the compression code. We quantized the model parameters to 8 bits and applied sensitivity pruning to the model with NervanaSystems/distiller. We calculated the model size from the parameters only, since the optimizer states are not used during inference.
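For anyone who wants to reproduce the size accounting before the official code is released, here is a minimal sketch of symmetric per-tensor 8-bit weight quantization and parameter-only size counting in plain PyTorch. The helper names and the toy model are illustrative assumptions, not the authors' actual implementation:

```python
import torch

def quantize_tensor_8bit(w):
    """Symmetric per-tensor linear quantization of a weight tensor to int8."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0  # largest magnitude maps to 127
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def param_size_bytes_8bit(model):
    """Model size counting parameters only, at 1 byte each (no optimizer state)."""
    return sum(p.numel() for p in model.parameters())

model = torch.nn.Linear(512, 512)  # toy stand-in for the transformer
for p in model.parameters():
    q, scale = quantize_tensor_8bit(p.data)
    p.data = q.float() * scale  # dequantized ("fake-quantized") weights for inference
print(f"8-bit parameter size: {param_size_bytes_8bit(model) / 1024:.1f} KiB")
```

The size counter deliberately ignores everything except the parameter tensors, which matches the statement above that optimizer states were excluded.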
@Michaelvll Can you please provide the distiller config that was used?
Also, do you prune individual weights, or whole channels/filters/heads?
For simplicity, we use sensitivity pruning for our model, which is fine-grained pruning, i.e., it prunes individual weights rather than whole channels, filters, or heads. You can try the configuration for the WMT En-Fr model with 527M #Mult-Adds.
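Since the released configuration is not in this thread, here is a minimal sketch of how a distiller sensitivity-pruning schedule can be driven from Python, assuming distiller's YAML schedule format and its file_config helper. The parameter names, sensitivity values, and epoch numbers are placeholders, not the actual WMT En-Fr configuration:

```python
import textwrap
import torch
import distiller  # NervanaSystems/distiller

# Placeholder schedule: parameter names, sensitivities, and epochs are examples only.
SCHEDULE_YAML = textwrap.dedent("""\
    version: 1
    pruners:
      fine_grained_pruner:
        class: SensitivityPruner
        sensitivities:
          '0.weight': 0.4
    policies:
      - pruner:
          instance_name: fine_grained_pruner
        starting_epoch: 0
        ending_epoch: 30
        frequency: 2
""")

with open('schedule.yaml', 'w') as f:
    f.write(SCHEDULE_YAML)

model = torch.nn.Sequential(torch.nn.Linear(16, 16))  # toy model; owns '0.weight'
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# file_config builds a CompressionScheduler from the YAML schedule.
scheduler = distiller.file_config(model, optimizer, 'schedule.yaml')

# Inside the training loop, the scheduler applies and maintains the pruning masks:
#   scheduler.on_epoch_begin(epoch)
#   scheduler.on_minibatch_begin(epoch, batch_idx, batches_per_epoch)
#   ... forward / backward / optimizer.step() ...
#   scheduler.on_minibatch_end(epoch, batch_idx, batches_per_epoch)
#   scheduler.on_epoch_end(epoch)
```

In distiller's SensitivityPruner, each sensitivity value multiplies the standard deviation of that layer's weights to form the pruning threshold, so a larger value prunes that layer more aggressively.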
Could you share some more information on how you quantized the model? Did you use NervanaSystems/distiller for quantization as well?
Hi, can you please provide the code used to compress the model by 18.2× using pruning and quantization? Thanks.
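As a back-of-the-envelope check while waiting for the code: quantizing 32-bit weights to 8 bits alone gives 4×, so reaching 18.2× overall implies the pruning stage contributes roughly 18.2 / 4 ≈ 4.55×, i.e., about 78% of the weights removed. This assumes the pruned weights are stored in a sparse format with negligible index overhead; the actual accounting in the released code may differ.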