openvinotoolkit / training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
https://openvinotoolkit.github.io/training_extensions/
Apache License 2.0

question about how to use nncf #272

Closed Danee-wawawa closed 3 years ago

Danee-wawawa commented 4 years ago

Hi, thank you for your work. I have a question about the use of NNCF. When using NNCF as a standalone tool, you mention the following step:

  1. After you create an instance of your model and load pretrained weights, create a compression algorithm and wrap your model by making the following call:

     compression_algo = create_compression_algorithm(model, config)
     model = compression_algo.model

What should I do if I do not have a pretrained model and cannot load pretrained weights? Looking forward to your answer.

AlexanderDokuchaev commented 4 years ago

@AlexKoff88 @vshampor @ljaljushkin

vshampor commented 4 years ago

Greetings, dandanli814!

You can either supply a config JSON file without "compression" entries, in which case the model behaves exactly like the original (see examples/semantic_segmentation/configs/unet_camvid.json), or train the model without NNCF altogether. Either way you obtain a .pth model checkpoint, which you can then use as the starting weights for compression fine-tuning (via the --weights option in the examples, for instance).
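For reference, a minimal sketch of what such a compression-free config might look like. The field names and values below are illustrative assumptions only; check examples/semantic_segmentation/configs/unet_camvid.json for the exact schema your NNCF version expects:

```json
{
    "model": "unet",
    "input_sample_size": [1, 3, 360, 480]
}
```

With no "compression" section present, wrapping the model leaves its behavior unchanged, so training proceeds as it would for the original FP32 model.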

Of course, you can also train the model from scratch with compression algorithms (INT8, sparsity, ...) in place, but it seems unlikely that such an approach will yield good results.

Danee-wawawa commented 4 years ago

Thank you for your advice. Are there any requirements for the .pth model checkpoint you mentioned as the starting weights for compression fine-tuning? For example, should it be the model obtained at the beginning of training or the one obtained at the end?

vshampor commented 4 years ago

@dandanli814, you usually want to obtain a compressed model with the best possible metric, so you should also start from the FP32 model with the best possible metric, since compression usually degrades accuracy.

In most deep learning training scenarios, the best metric for a given model is achieved at the end of the chosen training regimen.

Danee-wawawa commented 4 years ago

Thank you for your advice. I have another question: for compression fine-tuning, how should I set parameters such as the learning rate and the number of epochs or iterations? Should they be the same as when training without NNCF?

vshampor commented 4 years ago

@dandanli814, this depends on the model, the dataset, and the compression algorithm, and should be determined experimentally.

That said, NNCF is primarily a fine-tuning tool, and the algorithms it provides do not necessarily require the bulk of the starting FP32 weights to change much in order to obtain compression while keeping quality at an acceptable level. It is therefore advisable to set the learning rate during compression fine-tuning to a lower value than when training the model from scratch. See examples\*\configs to get a rough idea of the hyperparameters and LR schedules to use for each compression algorithm.
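As a toy illustration of why a lower learning rate suits fine-tuning (pure Python, no NNCF or PyTorch; the loss function and all numbers here are invented for illustration):

```python
# Toy example: minimize f(w) = (w - 3)^2 with plain gradient descent.
# "From scratch" starts far from the optimum; "fine-tuning" starts near it,
# so a much smaller learning rate suffices and avoids drifting away.

def gradient_descent(start, lr, steps):
    """Run gradient descent on f(w) = (w - 3)^2 and return the final w."""
    w = start
    for _ in range(steps):
        w -= lr * 2 * (w - 3.0)  # gradient of (w - 3)^2 is 2 * (w - 3)
    return w

# Training "from scratch": far from the optimum, a larger LR converges fast.
scratch = gradient_descent(start=10.0, lr=0.1, steps=100)

# Fine-tuning: already near the optimum, a small LR keeps us close to it.
finetuned = gradient_descent(start=3.1, lr=0.01, steps=50)
```

The analogy to compression fine-tuning is loose, but it captures the point: when the starting weights are already good, small steps preserve that quality while the compression objective is optimized.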

Danee-wawawa commented 4 years ago

Thank you very much. I will experiment according to your advice. ^_^

Danee-wawawa commented 4 years ago

@vshampor Hi, when I run your classification sample in examples/classification following your advice:

python main.py -m train --config configs/quantization/mobilenetv2_imagenet_int8.json --data ./datasets/imagenet/ --log-dir=./results/quantization/mobilenetv2_int8/ --multiprocessing-distributed --world-size 2 --dist-url tcp://**** --rank 0

Then I got the final model named "mobilenetv2_imagenet_int8_best.pth", and I validated this checkpoint using the following:

python main.py -m test --config configs/quantization/mobilenetv2_imagenet_int8.json --resume ./mobilenetv2_imagenet_int8_best.pth --data ./datasets/imagenet/

The result is as follows:

Test: [0/196] Time: 50.829 (50.829) Loss: 4.5672 (4.5672) Acc@1: 7.422 (7.422) Acc@5: 44.922 (44.922)
...
Test: [180/196] Time: 1.159 (1.461) Loss: 5.6219 (5.8357) Acc@1: 3.125 (2.989) Acc@5: 8.594 (10.178)
Test: [190/196] Time: 1.134 (1.447) Loss: 4.4965 (5.8044) Acc@1: 17.969 (3.260) Acc@5: 44.141 (10.925)

I can't reach the accuracy you report ("MobileNet V2 | INT8 | ImageNet | 71.33"). The training log doesn't look right either:

...
0:: Epoch: [3][2490/2503] Lr: 1e-05 Time: 4.124 (4.037) Data: 3.455 (3.467) CE_loss: 6.2188 (6.1365) CR_loss: 0.0000 (0.0000) Loss: 6.2188 (6.1365) Acc@1: 3.125 (1.815) Acc@5: 3.125 (6.769)
0:: Epoch: [3][2500/2503] Lr: 1e-05 Time: 3.992 (4.037) Data: 3.453 (3.467) CE_loss: 6.0290 (6.1365) CR_loss: 0.0000 (0.0000) Loss: 6.0290 (6.1365) Acc@1: 3.125 (1.816) Acc@5: 9.375 (6.775)
0:Test: [0/1563] Time: 3.587 (3.587) Loss: 6.5205 (6.5205) Acc@1: 0.000 (0.000) Acc@5: 0.000 (0.000)
0:Test: [10/1563] Time: 3.546 (3.573) Loss: 4.4578 (4.7999) Acc@1: 6.250 (5.682) Acc@5: 25.000 (36.080)
...

I hope you can give me some advice. Thank you.

vshampor commented 4 years ago

Greetings, @dandanli814 !

It looks like we do not load the pretrained MobileNet from torchvision, even though the corresponding option is set in the INT8 config. You can work around this by passing an explicit FP32 MobileNetV2 starting checkpoint to the training script via the --weights command-line option.

Download the MobileNetV2 checkpoint from https://github.com/tonylins/pytorch-mobilenet-v2 and then add --weights mobilenet_v2.pth.tar to your "train" command line. Note that even though the file has a .tar extension, it is not, in fact, a tape archive, but a raw PyTorch checkpoint.
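Putting this together with the earlier training invocation, the command would look roughly like this (paths and the log directory come from the messages above; the distributed-training flags from the original command should be re-added as appropriate for your setup):

```shell
python main.py -m train \
    --config configs/quantization/mobilenetv2_imagenet_int8.json \
    --data ./datasets/imagenet/ \
    --log-dir=./results/quantization/mobilenetv2_int8/ \
    --weights mobilenet_v2.pth.tar
```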

Danee-wawawa commented 4 years ago

OK, thank you, I will try this.