Transfer learning from natural image datasets, particularly ImageNet, using standard large models and corresponding pretrained weights has become a de facto method for deep learning applications to medical imaging. However, there are fundamental differences in data sizes, features and task specifications between natural image classification and the target medical tasks, and there is little understanding of the effects of transfer. In this paper, we explore properties of transfer learning for medical imaging. A performance evaluation on two large scale medical imaging tasks shows that surprisingly, transfer offers little benefit to performance, and simple, lightweight models can perform comparably to ImageNet architectures. Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse. We isolate where useful feature reuse occurs, and outline the implications for more efficient model exploration. We also explore feature independent benefits of transfer arising from weight scalings.
Paper: https://arxiv.org/abs/1902.07208
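The "feature independent benefits of transfer arising from weight scalings" mentioned in the abstract can be illustrated with a minimal sketch: instead of copying pretrained weights directly, sample fresh weights from a Gaussian whose mean and variance match those of the pretrained layer, so only the scaling statistics are transferred, not the learned features. The layer shape and pretrained statistics below are hypothetical, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" conv kernel standing in for an ImageNet-trained layer.
pretrained = rng.normal(loc=0.01, scale=0.05, size=(64, 3, 7, 7))

def mean_var_init(pretrained_weights, rng):
    """Sample fresh weights from a Gaussian matching the pretrained layer's
    empirical mean and standard deviation: the weight scaling is transferred,
    the learned features are not."""
    mu = pretrained_weights.mean()
    sigma = pretrained_weights.std()
    return rng.normal(loc=mu, scale=sigma, size=pretrained_weights.shape)

new_weights = mean_var_init(pretrained, rng)
# The sampled layer matches the pretrained statistics but shares no features.
print(abs(new_weights.mean() - pretrained.mean()) < 0.005)
print(abs(new_weights.std() - pretrained.std()) < 0.005)
```

In practice this would be applied per layer across a network, giving an initialization that inherits only the pretrained scale, which is one way to separate scaling effects from genuine feature reuse.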