keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Will pretrained models converge faster on downstream tasks than non-pretrained models? #24

Closed: xylcbd closed this issue 1 year ago

xylcbd commented 1 year ago

Is there any loss curve for that? Thanks.

keyu-tian commented 1 year ago

Yes, we provide some TensorBoard logs, e.g. at https://github.com/keyu-tian/SparK/tree/main/downstream_d2#fine-tuned-resnet-50-weights-log-files-and-performance.
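If it helps, here is a minimal sketch of how one could read those TensorBoard event files and compare the AP curves of two runs; the log-directory paths and the scalar tag name (`bbox/AP`) are assumptions and depend on how the Detectron2 logs were actually written:

```python
# Minimal sketch (not part of the SparK repo): plot AP-vs-iteration curves from
# TensorBoard event files to compare the convergence of two fine-tuning runs.
# The log directories and the scalar tag ("bbox/AP") below are placeholders.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def load_scalar(logdir: str, tag: str):
    """Return (steps, values) for one scalar tag from a TensorBoard log directory."""
    acc = EventAccumulator(logdir)
    acc.Reload()
    events = acc.Scalars(tag)
    return [e.step for e in events], [e.value for e in events]


# Hypothetical log directories for the two pretraining methods.
runs = {
    "ResNet-50 + SparK": "./logs/spark_r50_mrcnn_1x",
    "ResNet-50 + MoCo v2": "./logs/mocov2_r50_mrcnn_1x",
}

for name, logdir in runs.items():
    steps, values = load_scalar(logdir, "bbox/AP")
    plt.plot(steps, values, label=name)

plt.xlabel("iteration")
plt.ylabel("COCO box AP")
plt.title("Mask R-CNN FPN 1x fine-tuning on COCO")
plt.legend()
plt.show()
```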

I couldn't find the experiment log for non-pretrained (from-scratch) fine-tuning, but I did find a MoCo v2 log. Here's the AP curve of ResNet-50 + SparK (blue) vs. ResNet-50 + MoCo v2 (orange), under the COCO Mask R-CNN FPN 1x fine-tuning configuration.

[figure: AP curves during COCO fine-tuning, SparK (blue) vs. MoCo v2 (orange)]

Since MoCo v2 itself should converge faster than a non-pretrained model, SparK's advantage over from-scratch fine-tuning should be evident.