Closed: Danee-wawawa closed this issue 3 years ago
@AlexKoff88 @vshampor @ljaljushkin
Greetings, dandanli814!
You can either supply a config JSON file without "compression" entries, in which case the model will behave exactly like the original (see examples/semantic_segmentation/configs/unet_camvid.json), or train the model without using NNCF at all. Either way you will obtain a .pth model checkpoint, which you can then use as the starting weights for compression fine-tuning (via the --weights option in the examples, for instance).
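A minimal sketch of such a compression-free config might look like this (the field names follow the NNCF sample configs, but the exact values here are illustrative and not copied from unet_camvid.json):

```json
{
    "model": "unet",
    "input_info": {
        "sample_size": [1, 3, 368, 480]
    }
}
```

With no "compression" section present, no compression algorithm is applied, so training proceeds as for the original FP32 network.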
Of course, you can also train the model from scratch with compression algorithms (INT8, sparsity, ...) in place, but such an approach is unlikely to yield good results.
Thank you for your advice. Are there any requirements for the .pth model checkpoint you mentioned to be usable as starting weights for compression fine-tuning? For example, should it be the model obtained at the beginning of training, or the one obtained at the end?
@dandanli814, you usually want to obtain a compressed model with the best metric possible, so you should start with an FP32 model with the best possible metric as well, since compression usually degrades accuracy.
In most deep learning training scenarios, the best metric for a given model is achieved at the end of the chosen training regimen.
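To illustrate (the accuracy numbers here are invented), choosing the starting checkpoint just means keeping whichever epoch gave the best validation metric:

```python
# Hypothetical per-epoch validation Acc@1 values from an FP32 training run.
accuracies = [0.62, 0.68, 0.71, 0.705]

# The checkpoint to pass via --weights is the one saved at the best epoch,
# which in a typical run lies near the end of training.
best_epoch = max(range(len(accuracies)), key=lambda i: accuracies[i])
```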
Thank you for your advice. I have another question: for compression fine-tuning, how should I set parameters such as the learning rate and the number of epochs or iterations? Should they be the same as when training without NNCF?
@dandanli814, this is something that depends on the model, dataset and the compression algorithm and should be found out experimentally.
That said, NNCF is primarily a fine-tuning tool, and the algorithms it provides do not necessarily require the bulk of the starting FP32 model weights to change much in order to achieve compression while keeping quality at an acceptable level. It is therefore advisable to set the learning rate during compression fine-tuning lower than when training the model from scratch. See examples/*/configs to get a rough idea of the hyperparameters and LR schedules to use for each compression algorithm.
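As an illustrative sketch (pure Python, all numbers hypothetical), a reduced multistep schedule for compression fine-tuning could look like this:

```python
# Hedged sketch: start fine-tuning roughly 10x below an assumed from-scratch
# learning rate, then decay further at fixed epoch milestones.
scratch_lr = 1e-3                  # hypothetical LR used for FP32 training
finetune_lr = scratch_lr / 10      # reduced LR for compression fine-tuning
milestones, gamma = [20, 40], 0.1  # decay epochs and decay factor

def lr_at_epoch(epoch):
    """Learning rate after applying every milestone passed so far."""
    decays = sum(1 for m in milestones if epoch >= m)
    return finetune_lr * gamma ** decays
```

The right scale factor and milestones depend on the model, dataset, and compression algorithm, as noted above.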
Thank you very much. I will experiment according to your advice. ^_^
@vshampor Hi, when I run your classification sample in examples/classification following your advice:
python main.py -m train --config configs/quantization/mobilenetv2_imagenet_int8.json --data ./datasets/imagenet/ --log-dir=./results/quantization/mobilenetv2_int8/ --multiprocessing-distributed --world-size 2 --dist-url tcp://**** --rank 0
This produced a final model named "mobilenetv2_imagenet_int8_best.pth", which I then validated with:
python main.py -m test --config configs/quantization/mobilenetv2_imagenet_int8.json --resume ./mobilenetv2_imagenet_int8_best.pth --data ./datasets/imagenet/
The result is as follows:
Test: [0/196] Time: 50.829 (50.829) Loss: 4.5672 (4.5672) Acc@1: 7.422 (7.422) Acc@5: 44.922 (44.922)
...................
Test: [180/196] Time: 1.159 (1.461) Loss: 5.6219 (5.8357) Acc@1: 3.125 (2.989) Acc@5: 8.594 (10.178)
Test: [190/196] Time: 1.134 (1.447) Loss: 4.4965 (5.8044) Acc@1: 17.969 (3.260) Acc@5: 44.141 (10.925)
I cannot reproduce the accuracy you report ("MobileNet V2 | INT8 | ImageNet | 71.33"). The training log also does not look right:
...........
0:: Epoch: [3][2490/2503] Lr: 1e-05 Time: 4.124 (4.037) Data: 3.455 (3.467) CE_loss: 6.2188 (6.1365) CR_loss: 0.0000 (0.0000) Loss: 6.2188 (6.1365) Acc@1: 3.125 (1.815) Acc@5: 3.125 (6.769)
0:: Epoch: [3][2500/2503] Lr: 1e-05 Time: 3.992 (4.037) Data: 3.453 (3.467) CE_loss: 6.0290 (6.1365) CR_loss: 0.0000 (0.0000) Loss: 6.0290 (6.1365) Acc@1: 3.125 (1.816) Acc@5: 9.375 (6.775)
0:Test: [0/1563] Time: 3.587 (3.587) Loss: 6.5205 (6.5205) Acc@1: 0.000 (0.000) Acc@5: 0.000 (0.000)
0:Test: [10/1563] Time: 3.546 (3.573) Loss: 4.4578 (4.7999) Acc@1: 6.250 (5.682) Acc@5: 25.000 (36.080)
...........
I hope you can give me some advice. Thank you.
Greetings, @dandanli814 !
It looks like we do not load the pretrained MobileNet from torchvision, even though the corresponding option is set in the INT8 config. You can work around this by passing an exact FP32 MobileNetV2 starting checkpoint to the training script via the --weights command-line option.
Download the MobileNetV2 checkpoint from https://github.com/tonylins/pytorch-mobilenet-v2 and then add --weights mobilenet_v2.pth.tar to your "train" command line. Note that even though the file has a .tar extension, it is not, in fact, a tape archive, but a raw PyTorch checkpoint.
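As a small illustration of that last point (the file name and bytes here are made up), the standard tarfile module will refuse to treat such a checkpoint as an archive:

```python
# Illustrative only: a file named like a tape archive need not be one.
# The bytes below stand in for a raw (pickled) PyTorch checkpoint payload.
import os
import tarfile
import tempfile

with tempfile.NamedTemporaryFile(suffix=".pth.tar", delete=False) as f:
    f.write(b"\x80\x02}q\x00.")  # pickle-style bytes, not a tar header
    path = f.name

is_tar = tarfile.is_tarfile(path)  # False: not actually a tar archive
os.remove(path)
```

Such a file should be loaded as a regular PyTorch checkpoint, not extracted.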
OK, thank you, I will try this.
Hi, thank you for your work. I have a question about using NNCF. When using NNCF as a standalone tool, you mention the following step:
What should I do if I do not have a pre-trained model and cannot load pretrained weights? Looking forward to your answer.