YsYusaito opened this issue 3 years ago
Hi @YsYusaito,
How to use pruning to reduce the (compressed) size of a model?
Pruning does not reduce the model size by itself. Pruning finds the least significant weights of the model (those that are close to zero) and forces them to zero. When saving a model, the weights occupy the same space whether their values are zero or not. This is why your model1.h5 (baseline model) and model2.h5 (model after pruning) files have the same size.
Compression algorithms, such as gzip, are efficient on data that contains zeroes. This is why the compressed size of the pruned model is smaller than the compressed size of the baseline model. Compressing a model is useful to reduce the size of a mobile app that contains the model, or to reduce bandwidth when sending it over the network. The model is then decompressed before loading. In this use case, pruning can drastically improve the compression ratio.
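For reference, here is a minimal sketch of how that compressed size is usually measured, following the helper from the Pruning with Keras tutorial (the model1.h5 / model2.h5 file names are taken from your description; the compressed file is only used for storage or transport, you still load the plain .h5 for inference):

```python
import os
import tempfile
import zipfile

def get_gzipped_model_size(file):
    # Write the model file into a deflate-compressed archive and return its size in bytes.
    _, zipped_file = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
        f.write(file)
    return os.path.getsize(zipped_file)

print("Compressed baseline model: %d bytes" % get_gzipped_model_size('model1.h5'))
print("Compressed pruned model:   %d bytes" % get_gzipped_model_size('model2.h5'))
```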
Why is model_for_pruning larger than the baseline, even though it is pruned?
In the previous example there are three versions of the model:
baseline_model
model_for_pruning = prune_low_magnitude(baseline_model, **pruning_params)
model_for_export = strip_pruning(model_for_pruning)
model_for_pruning has the weights of baseline_model plus additional variables used during pruning. That is why this model is even larger than the baseline.
Once pruning is done, those variables need to be removed using strip_pruning(). In the end, model_for_export is the model that you want to use for inference.
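A minimal sketch of that three-step flow, assuming baseline_model is an already trained Keras model and that pruning_params, train_images and train_labels are defined as in the Pruning with Keras tutorial (the epoch count and output file name are illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# 1) Wrap the trained baseline model: this adds pruning masks and related variables,
#    which is why model_for_pruning is larger than baseline_model.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(baseline_model, **pruning_params)
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])

# 2) Fine-tune with the pruning callback so the masks are updated during training.
model_for_pruning.fit(train_images, train_labels, epochs=2,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# 3) Strip the pruning wrappers before export; this is the model to use for inference.
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
tf.keras.models.save_model(model_for_export, 'model2.h5', include_optimizer=False)
```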
Hi @fredrec. Thank you for your reply. Thanks to your clear explanation, I was able to understand.
Do you have a plan to reduce the model size itself by deleting the zero weights?
Currently, at the TensorFlow model level, the weights are still stored as a dense tensor.
Some backends, like TensorFlow Lite, use a sparse representation after conversion. Aside from the size benefits, some sparse op implementations also allow for shorter inference time. The Pruning for on-device inference w/ XNNPACK tutorial shows an example of that.
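A rough sketch of that conversion path, assuming model_for_export is the stripped model from above (the EXPERIMENTAL_SPARSITY optimization flag and the output file name follow that tutorial; sparse encoding only applies where the runtime supports it):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
# Ask the converter to store the pruned weights in a sparse format where possible.
converter.optimizations = [tf.lite.Optimize.EXPERIMENTAL_SPARSITY]
tflite_model = converter.convert()

with open('pruned_sparse.tflite', 'wb') as f:
    f.write(tflite_model)
```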
Describe the bug
prune_low_magnitude cannot reduce the size of the tflite or .h5 model. I ran the Pruning with Keras example: https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras I want to obtain a smaller .h5 file by using pruning. Actually, I could reduce the model size via gzip. However, the output of gzip compression is a .zip file, so I can't do inference with this .zip file.
How can I get a compressed .h5 model? (Is there any other compression method to create a compressed .h5 model?)
System information
TensorFlow version (installed from source or binary): 2.5.0
TensorFlow Model Optimization version (installed from source or binary): 0.5.0
Python version: 3.7.10
Describe the expected behavior
The size of the pruned .h5 model is smaller than the base .h5 model.
Describe the current behavior
model1.h5: base model, model2.h5: pruned model. The model size is the same for the base model and the pruned model.
Code to reproduce the issue
Additional context
If I use the (★) code instead of the (※) code, the pruned model size increases. (I thought the pruned model size would decrease, because strip_pruning restores the original model: https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/sparsity/keras/strip_pruning?hl=ja) I would appreciate it if you could also explain the reason for this.