keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Target dataset and augmentation #64

Closed LiYuhangUSTC closed 8 months ago

LiYuhangUSTC commented 9 months ago

Hi, thanks for your work. Great job!

I am trying to use SparK to get a pretrained model for my target task. I have a few questions.

  1. I notice that only a very simple augmentation (flip, crop, and normalization) is applied to ImageNet in the default SparK pretraining setting. Would a more sophisticated augmentation, e.g. color augmentation, help SparK or hurt it?
  2. If I pretrain on an unlabeled dataset from the target domain, should I use a simple augmentation or a more sophisticated one, similar to the augmentation used for target-task training?
keyu-tian commented 9 months ago

Hi Yuhang. For 1, I personally tried adding ColorJitter but found it hurt performance. Actually, MAE, SimMIM, and other related masked image modeling works also use only such minimal augmentations (flip & RandomResizedCrop). I think what SparK (like MAE or SimMIM) learns is the pixel distribution itself (e.g., pixel correlations and dependencies), so any distortion or inappropriate transformation of that distribution (like ColorJitter) can mislead the learning and thus be harmful.
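
For reference, the kind of minimal pipeline I mean looks roughly like this (a sketch assuming torchvision; the exact crop scale and normalization constants are illustrative, not necessarily the repo's defaults):

```python
# Minimal "flip + crop + norm" pretraining augmentation (illustrative sketch).
from torchvision import transforms

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD  = (0.229, 0.224, 0.225)

minimal_pretrain_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.67, 1.0)),  # crop scale is an assumed value
    transforms.RandomHorizontalFlip(),
    # transforms.ColorJitter(0.4, 0.4, 0.4),  # deliberately omitted: it distorts the pixel distribution
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```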

For 2, I would only consider adding extra augmentation if: (i) the data is extremely insufficient; (ii) the augmentation is quite natural and reasonable for the target image distribution, i.e., it will always produce in-distribution augmented images rather than adding noise to the data distribution.
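
As a hypothetical illustration of point (ii): for a target domain where vertical flips are as natural as horizontal ones (e.g., aerial or microscopy images), one might extend the minimal pipeline like this (the specific transforms and values are assumptions for illustration, not a recommended recipe):

```python
# Hypothetical domain-specific pipeline: only add augmentations that keep
# images in-distribution for the target domain.
from torchvision import transforms

domain_pretrain_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.67, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),  # natural for this hypothetical domain (no canonical orientation)
    # No ColorJitter / heavy photometric distortion: it would add noise to the data distribution.
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
```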

LiYuhangUSTC commented 9 months ago

Thank you for the quick and valuable reply! I agree that maintaining the data distribution is important in the pretraining stage. I should be more careful when selecting augmentations.