keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580

A small model such as MobileNet V2 for pre-training #25

Open mmmz28 opened 1 year ago

mmmz28 commented 1 year ago

Thank you for your excellent work. Replacing the transformer with a CNN does make deployment friendlier. I'm also wondering whether using a smaller model such as MobileNet V2 for pre-training and then fine-tuning downstream would be effective.

keyu-tian commented 1 year ago

Thank you, and we agree that this could be of general interest and value. We will consider running SparK on MobileNet (perhaps V2 and V3) soon, or you can try it yourself (see the tutorial at https://github.com/keyu-tian/SparK/tree/main/pretrain#tutorial-for-pretraining-your-own-cnn-model).
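
As a rough, untested sketch of what such a wrapper might look like for MobileNet V2: the idea is simply to expose the network's multi-scale (hierarchical) feature maps, which is what the tutorial asks a custom CNN to provide. The method names `get_downsample_ratio` / `get_feature_map_channels` and the `hierarchical` forward flag below are assumptions taken from the tutorial's description and should be verified against it; only the torchvision calls are standard.

```python
from typing import List

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2


class MobileNetV2ForSparK(nn.Module):
    # Indices into torchvision's mobilenet_v2().features whose outputs are the
    # last feature maps at strides 4, 8, 16, and 32 (default width config).
    STAGE_ENDS = (3, 6, 13, 18)

    def __init__(self):
        super().__init__()
        # Reuse torchvision's feature extractor (an nn.Sequential of conv stages);
        # needs a recent torchvision for the `weights=` argument.
        self.features = mobilenet_v2(weights=None).features

    def get_downsample_ratio(self) -> int:
        # MobileNetV2's deepest feature map is 1/32 of the input resolution.
        return 32

    def get_feature_map_channels(self) -> List[int]:
        # Channel widths at the four stage endpoints listed in STAGE_ENDS.
        return [24, 32, 96, 1280]

    def forward(self, x: torch.Tensor, hierarchical: bool = False):
        if not hierarchical:
            return self.features(x)
        # Collect the multi-scale feature maps (strides 4, 8, 16, 32).
        feats = []
        for i, blk in enumerate(self.features):
            x = blk(x)
            if i in self.STAGE_ENDS:
                feats.append(x)
        return feats


if __name__ == "__main__":
    m = MobileNetV2ForSparK()
    maps = m(torch.randn(2, 3, 224, 224), hierarchical=True)
    print([tuple(f.shape) for f in maps])
    # expected: [(2, 24, 56, 56), (2, 32, 28, 28), (2, 96, 14, 14), (2, 1280, 7, 7)]
```

The tutorial also describes how to register the custom model so the pretraining script can build it by name; please double-check the required method names and the expected number of feature scales there rather than relying on this sketch.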

xylcbd commented 1 year ago

@keyu-tian Can I use swinv2-base as the backbone for pre-training?

keyu-tian commented 1 year ago

@xylcbd Sorry, but SparK is not suitable for this. SparK can pretrain any CNN model, but SwinV2 is a transformer. You could use MAE or SimMIM to pretrain a Swin transformer instead.