
Understanding the time difference in finetuning vs freeze-and-train #50

Closed · miguelgfierro closed this issue 6 years ago

miguelgfierro commented 6 years ago

Posted in pytorch forum: https://discuss.pytorch.org/t/understanding-time-difference-between-finetuning-and-training-the-last-layer-with-frozen-weights/10796

miguelgfierro commented 6 years ago

More info about finetuning: https://discuss.pytorch.org/t/how-to-perform-finetuning-in-pytorch/419

https://github.com/mortezamg63/Accessing-and-modifying-different-layers-of-a-pretrained-model-in-pytorch
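For reference, a minimal sketch of the two regimes compared below, following the pattern from the linked threads (torchvision ResNet18; `num_classes` is a placeholder, not a value from the original):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 10  # placeholder: number of classes in the target dataset

# Finetune: start from the pretrained weights and update every layer.
model_ft = models.resnet18(pretrained=True)
model_ft.fc = nn.Linear(model_ft.fc.in_features, num_classes)
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Freeze and train: only the newly added last layer receives gradients.
model_fz = models.resnet18(pretrained=True)
for param in model_fz.parameters():
    param.requires_grad = False
model_fz.fc = nn.Linear(model_fz.fc.in_features, num_classes)  # new layer, requires_grad=True by default
optimizer_fz = optim.SGD(model_fz.fc.parameters(), lr=0.001, momentum=0.9)
```

The forward pass is identical in both cases; the frozen variant only saves the backward computation and weight updates for the frozen layers, which is relevant for the timings below.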

miguelgfierro commented 6 years ago

Using CNTK with the Simpsons dataset:

- freeze: test accuracy 0.7639, process time 621.0 seconds
- finetune: test accuracy 0.9593, process time 1635.1 seconds
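For context on how the two CNTK runs differ: freezing there is usually expressed when cloning the pretrained graph, roughly as in CNTK's TransferLearning example. A sketch, where the model path, node names, and `num_classes` are assumptions rather than the exact script behind these numbers:

```python
import cntk as C
from cntk.layers import Dense
from cntk.logging.graph import find_by_name
from cntk.ops.functions import CloneMethod

num_classes = 10  # placeholder

base_model = C.load_model('ResNet18_ImageNet.model')  # hypothetical path
feature_node = find_by_name(base_model, 'features')   # assumed node name
last_node = find_by_name(base_model, 'z.x')           # assumed node name

# CloneMethod.freeze keeps the cloned weights constant (the "freeze" run);
# CloneMethod.clone copies them as trainable parameters (the "finetune" run).
cloned_layers = C.combine([last_node.owner]).clone(
    CloneMethod.freeze, {feature_node: C.placeholder(name='features')})

# A new trainable classification head goes on top in both cases.
features = C.input_variable((3, 224, 224), name='input')
z = Dense(num_classes, activation=None)(cloned_layers(features))
```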

miguelgfierro commented 6 years ago

Using the Simpsons dataset (20k images):

| Model | GPUs | Mode | Training time | Best val acc |
| --- | --- | --- | --- | --- |
| ResNet18 | 1 | finetune | 21m 37s | 0.953708 |
| ResNet18 | 1 | freeze | 8m 30s | 0.692327 |
| ResNet18 | 4 | finetune | 10m 11s | 0.954220 |
| ResNet18 | 4 | freeze | 9m 42s | 0.686957 |
| ResNet152 | 4 | finetune | 64m 55s | 0.987468 |
| ResNet152 | 4 | freeze | 59m 18s | 0.727110 |
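The 4-GPU numbers would typically come from wrapping the model in `torch.nn.DataParallel`, as in the stock PyTorch examples; a minimal sketch:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
if torch.cuda.device_count() > 1:
    # Each batch is scattered across the visible GPUs, the model is
    # replicated on each device, and the outputs are gathered back.
    model = nn.DataParallel(model)
model = model.cuda()
```

Note that finetuning drops from 21m 37s to 10m 11s with 4 GPUs, while the frozen run barely moves (8m 30s vs 9m 42s, actually slower); the next comment points at the cause.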

miguelgfierro commented 6 years ago

There is a big overhead when using DataParallel.

In the words of one of the PyTorch developers:

> At the moment, DataParallel broadcasts parameters that are not modified, and has some other overhead around replicate and broadcast_coalesce. We are in the process of improving it. See some progress here: https://github.com/pytorch/pytorch/pull/4216.
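A rough way to observe this (a sketch against a recent PyTorch API, not the benchmark behind the numbers above) is to time the frozen model with and without the `DataParallel` wrapper:

```python
import time
import torch
import torch.nn as nn
from torchvision import models

def frozen_resnet(num_classes=10):
    """ResNet18 with all pretrained layers frozen and a fresh last layer."""
    model = models.resnet18(pretrained=True)
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def time_batches(model, n_batches=20, batch_size=64):
    """Time forward+backward+step on dummy data for a few batches."""
    model = model.cuda()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=0.01)
    x = torch.randn(batch_size, 3, 224, 224).cuda()
    y = torch.randint(0, 10, (batch_size,)).cuda()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_batches):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()
    return time.time() - start

print('1 GPU:  %.2fs' % time_batches(frozen_resnet()))
print('n GPUs: %.2fs' % time_batches(nn.DataParallel(frozen_resnet())))
```

Since the replication happens on every forward call, even parameters that never change get re-broadcast each batch, which is why the frozen 4-GPU run above is no faster than the 1-GPU one.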