Open ankitknitj opened 1 year ago
@J-shang
here is an example of saving / loading a pruned model. Note that the level pruner is a fine-grained pruner; it is not recommended to use ModelSpeedup on fine-grained masks, so maybe you could have a try with any other pruner:
```python
model = ...
config_list = ...
pruner = XXXPruner(model, config_list)
_, masks = pruner.compress()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input, masks).speedup_model()
# this model is a real smaller model
torch.save(model, 'path_to_save_model')
model = torch.load('path_to_save_model')
```
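A note on why the whole model object is saved above rather than just a state dict: ModelSpeedup rewrites the model in place, so the layer shapes no longer match the original architecture, and loading a state dict into a freshly constructed original model would mismatch. A minimal pure-Python sketch of the same idea using pickle (toy classes, not the NNI or torch API):

```python
import pickle

class ToyLayer:
    """Stand-in for a layer whose weight shape may shrink after speedup."""
    def __init__(self, out_features):
        self.weight = [[0.0] * 4 for _ in range(out_features)]

class ToyModel:
    def __init__(self, out_features=8):
        self.fc = ToyLayer(out_features)

model = ToyModel()       # original architecture: 8 output rows
model.fc = ToyLayer(5)   # "speedup" replaced the layer with a smaller one

# Saving the whole object (like torch.save(model, ...)) keeps the new shape.
restored = pickle.loads(pickle.dumps(model))
assert len(restored.fc.weight) == 5

# A bare parameter dict taken from the sped-up model no longer matches a
# freshly built original model, so state-dict loading would mismatch.
state = {"fc.weight": model.fc.weight}
fresh = ToyModel()
assert len(state["fc.weight"]) != len(fresh.fc.weight)
```

This is why the snippet above saves and loads the full model object after speedup.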
Note that the model does simulated quantization inside the quantizer; it loses this ability when you export the model from the quantizer. The exported model and calibration_config are then used for quantization speedup: https://nni.readthedocs.io/en/stable/tutorials/quantization_speedup.html
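To illustrate why the exported model behaves differently: the simulated quantization lives in the quantizer's wrapper forward pass, so the exported bare module alone no longer quantizes, and the exported calibration config is what lets a later speedup stage reproduce the quantized behavior. A toy sketch of this idea (`Linear1D`, `FakeQuantWrapper`, `export`, and the calibration dict are all made up for illustration, not the NNI quantizer API):

```python
class Linear1D:
    """Toy module: y = w * x."""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        return self.w * x

class FakeQuantWrapper:
    """Toy stand-in for a quantizer wrapper: quantizes outputs in forward."""
    def __init__(self, module, scale=0.25):
        self.module = module
        self.scale = scale
    def forward(self, x):
        y = self.module.forward(x)
        return round(y / self.scale) * self.scale   # simulated quantization
    def export(self):
        # Exporting returns the bare module plus a calibration config;
        # the quantization step in forward() is gone from the module itself.
        return self.module, {"scale": self.scale}

wrapped = FakeQuantWrapper(Linear1D(0.3))
exported, calib = wrapped.export()

x = 1.0
assert wrapped.forward(x) == 0.25    # 0.3 snapped to the 0.25 grid
assert exported.forward(x) == 0.3    # exported module lost the quantization

# A later "speedup" stage uses calib to reproduce the quantized behavior.
y = exported.forward(x)
requant = round(y / calib["scale"]) * calib["scale"]
assert requant == wrapped.forward(x)
```

So running the exported model directly is not expected to match the quantizer's simulated results; the calibration config must be consumed by the quantization speedup step linked above.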
@Bonytu , could you give an example for save / load simulated quantization model?
Hi, as of now I am pruning a model and then quantizing the pruned model. What I want is to save the final model (after both these steps) and then load it later to do inference. Is there no way to save and load the model in this scenario that gives accurate results? @J-shang @Bonytu
Describe the issue: I am using level pruning and DoReFa quantization. I tried to save the model using export_model / torch.save of the state dict (both separately) and then load the state_dict, but I am getting inconsistent results. However, I do get accurate results when saving and loading a level-pruned + LSQ-quantized model. How do I save and load such models for accurate results? I also tried export_model of the DoReFa quantizer and saved the calibration_config, but I am still not getting accurate results.