Closed seokg closed 4 years ago
I have read Appendix 6.1 of the paper, where the authors provide the complete training pipeline.
For the second stage, I am guessing the authors transfer the weights of the teacher to the student network using `load_networks`.
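For illustration, a `load_networks`-style teacher-to-student transfer can be sketched as copying each teacher parameter into the matching student parameter, cropped to the student's (smaller) shape. This is only a hypothetical sketch with NumPy arrays standing in for tensors, not the repository's actual implementation; the function and variable names here are my own.

```python
import numpy as np


def transfer_weights(teacher_state, student_state):
    """Copy teacher parameters into the student where names match,
    cropping each array to the student's (smaller) shape.

    Illustrative sketch only: real channel-selection logic in a
    distillation codebase may be more involved than plain cropping.
    """
    for name, s_param in student_state.items():
        t_param = teacher_state.get(name)
        if t_param is None:
            continue  # student-only parameters keep their initialization
        # Crop the teacher array to the student's dimensions.
        slices = tuple(slice(0, d) for d in s_param.shape)
        student_state[name] = t_param[slices].copy()
    return student_state


# Toy example: a 4x6 teacher weight cropped into a 2x3 student weight.
teacher = {"conv1.weight": np.arange(24.0).reshape(4, 6)}
student = {"conv1.weight": np.zeros((2, 3)), "extra.bias": np.zeros(2)}
student = transfer_weights(teacher, student)
print(student["conv1.weight"])  # top-left 2x3 block of the teacher weight
```

Initializing the student this way is typically not strictly required for training, but starting from teacher weights usually speeds up convergence compared to random initialization.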
Finally, the authors provide Table 5 with training details, such as the number of epochs for training from scratch, distillation, fine-tuning, and once-for-all network training. For pix2pix and CycleGAN, the once-for-all network training uses twice as many epochs as training from scratch, distillation, or fine-tuning.
I guess this answers all the questions I had. Please correct me if I got something wrong.
Yes, you're correct. The once-for-all network training will take no more than 2 days on a single 1080Ti.
Hi, after going through the code I have come up with a few questions regarding the training and distillation.
1. … `resnet_supernet.py` …
2. … `resnet_distiller.py` and transfer the weight to the student supernet?
3. In the `load_networks` function in `resnet_distiller.py`, is it necessary to transfer the weight of the teacher network to the student network, or is it just for faster training and convergence?