salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

About data augmentation for pretraining #98

Open vateye opened 1 year ago

vateye commented 1 year ago

Hi, I noticed that RandomResizedCrop is used for pretraining with the scale parameter (0.2, 1). I am curious about this choice: if the crop is too small, the image-text pair becomes noisy, because the crop may no longer contain what the caption describes. How did you select this parameter? Also, ColorJitter and GaussianBlur are widely used in self-supervised image learning. Have you tried these two extra augmentations?

Thanks.
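To make the concern about small crops concrete, here is a plain-Python sketch of the crop-sampling logic that torchvision's `RandomResizedCrop` uses, as I understand it. The crop area is drawn uniformly from `scale` times the image area, and the aspect ratio is drawn log-uniformly from `ratio` (torchvision's default is (3/4, 4/3)). This is an illustration, not the library's code or ALBEF's pipeline, and `sample_crop` is a hypothetical helper name. With `scale=(0.2, 1.0)`, a crop can keep as little as about 20% of the image area.

```python
import math
import random


def sample_crop(width, height, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Return (top, left, h, w) of a random crop box.

    Sketch of RandomResizedCrop-style sampling (assumed, not library code):
    area ~ Uniform(scale) * image area, aspect ratio ~ LogUniform(ratio).
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):  # retry a few times if the sampled box does not fit
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)  # inclusive bounds
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: a central crop clamped to the allowed aspect-ratio range.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        w, h = width, int(round(width / ratio[0]))
    elif in_ratio > ratio[1]:
        w, h = int(round(height * ratio[1])), height
    else:
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w


# Usage: how much of a 256x256 image survives a crop with scale=(0.2, 1.0)?
rng = random.Random(0)
fractions = []
for _ in range(10_000):
    _, _, h, w = sample_crop(256, 256, rng=rng)
    fractions.append(h * w / (256 * 256))
print(f"min area kept: {min(fractions):.2f}, mean: {sum(fractions) / len(fractions):.2f}")
```

Raising the lower bound of `scale` (e.g. to 0.5) shifts crops toward larger regions, trading some augmentation strength for a lower chance of cropping out the content the caption refers to.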

LiJunnan1992 commented 1 year ago

Hi, we haven't tried ColorJitter or GaussianBlur. We adopt the scale parameter (0.2, 1) because it is widely adopted.

vateye commented 1 year ago


Thanks for your reply. I am wondering whether the linear learning-rate scaling rule still applies to pretraining. Do you have any experience with this?
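For reference, the linear scaling rule (Goyal et al., 2017) multiplies the learning rate by the same factor as the batch size, usually combined with a short warmup. Below is a minimal sketch; the function names and the base values in the usage lines are hypothetical and are not taken from ALBEF's configs. Whether the rule carries over to AdamW-style optimizers is less settled, and a square-root variant is sometimes used instead, so both are shown.

```python
import math


def linear_scaled_lr(base_lr, batch_size, base_batch_size):
    """Linear rule: lr grows in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size


def sqrt_scaled_lr(base_lr, batch_size, base_batch_size):
    """Square-root rule: lr grows with sqrt of the batch-size ratio."""
    return base_lr * math.sqrt(batch_size / base_batch_size)


def warmup_lr(step, peak_lr, warmup_steps):
    """Linear warmup from peak_lr / warmup_steps up to peak_lr, then constant."""
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * (step + 1) / warmup_steps


# Hypothetical example: a recipe tuned at batch 256 moved to batch 1024.
base_lr, base_bs, new_bs = 1e-4, 256, 1024
print("linear:", linear_scaled_lr(base_lr, new_bs, base_bs))
print("sqrt:  ", sqrt_scaled_lr(base_lr, new_bs, base_bs))
```

The linear rule assumes the larger batch mainly averages gradients over more samples; in practice it tends to break down at very large batches, which is one reason a warmup phase is usually paired with it.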