Closed. miguelgfierro closed this issue 6 years ago.
Using CNTK with the Simpsons dataset:

- freeze: test accuracy 0.7639, process time 621.0 s
- finetune: test accuracy 0.9593, process time 1635.1 s
Using the Simpsons dataset (20k images):

| Model     | GPUs | Mode     | Training time | Best val acc |
|-----------|------|----------|---------------|--------------|
| ResNet18  | 1    | finetune | 21m 37s       | 0.9537       |
| ResNet18  | 1    | freeze   | 8m 30s        | 0.6923       |
| ResNet18  | 4    | finetune | 10m 11s       | 0.9542       |
| ResNet18  | 4    | freeze   | 9m 42s        | 0.6870       |
| ResNet152 | 4    | finetune | 64m 55s       | 0.9875       |
| ResNet152 | 4    | freeze   | 59m 18s       | 0.7271       |
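For reference, the difference between the two regimes is which parameters get gradients. A minimal sketch below (not the exact script behind the numbers above; it uses a tiny stand-in model rather than a torchvision ResNet) shows how "freeze" trains only the last layer while "finetune" trains everything:

```python
import torch.nn as nn

# Tiny stand-in for a pretrained backbone plus a classification head.
model = nn.Sequential(
    nn.Linear(64, 32),   # pretend pretrained backbone layer
    nn.ReLU(),
    nn.Linear(32, 10),   # classification head (last layer)
)

def set_freeze(model: nn.Sequential, freeze_backbone: bool) -> None:
    # "freeze": train only the last layer; "finetune": train everything.
    for p in model.parameters():
        p.requires_grad = not freeze_backbone
    for p in model[-1].parameters():
        p.requires_grad = True

set_freeze(model, freeze_backbone=True)
trainable = [p for p in model.parameters() if p.requires_grad]
# Only the head's weight and bias remain trainable in freeze mode.
print(len(trainable))  # 2
```

In freeze mode the optimizer should also be built only over the trainable parameters, e.g. `optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=...)`.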
There is a large overhead when using DataParallel. In the words of one of the developers:
> At the moment, DataParallel broadcasts parameters that are not modified, and has some other overhead around replicate and broadcast_coalesce. We are in the process to improve it. See some progress here: https://github.com/pytorch/pytorch/pull/4216.
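For context, a hedged sketch of how `nn.DataParallel` is typically applied (a toy `nn.Linear` model here, not the benchmark script): on every forward pass it replicates the module onto each GPU and scatters the batch, which is where the broadcast overhead described above comes from. This is why freeze mode, despite computing fewer gradients, barely speeds up on 4 GPUs.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    # Replicas are rebuilt and parameters broadcast on each forward call,
    # even for frozen parameters that never change.
    model = nn.DataParallel(model)

out = model(torch.randn(8, 128))
print(out.shape)  # torch.Size([8, 10])
```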
Posted on the PyTorch forum: https://discuss.pytorch.org/t/understanding-time-difference-between-finetuning-and-training-the-last-layer-with-frozen-weights/10796