Closed · vateye closed this issue 2 years ago
Hi, I found that you use a 0.5 masking probability for MLM. Why not use 0.15 as in BERT? I am curious about this setting.

0.15 may be a good ratio for text-only pre-training as in BERT, but in cross-modal pre-training the image/video serves as additional context, so it helps to increase this ratio. 0.5 performed best in our pilot study.
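For reference, here is a minimal sketch of how a configurable masking ratio plugs into standard BERT-style MLM masking. This is a generic illustration, not this repo's actual implementation; the function name `mask_tokens`, the `mlm_probability` parameter, and the special-token IDs are assumptions.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size,
                mlm_probability=0.5, special_ids=(0, 101, 102)):
    """BERT-style MLM masking with a configurable mask ratio.

    Labels are set to -100 for unmasked positions so they are
    ignored by torch.nn.CrossEntropyLoss.
    """
    labels = input_ids.clone()

    # Sample which positions to predict: 0.5 here for cross-modal
    # pre-training instead of BERT's text-only 0.15.
    prob_matrix = torch.full(input_ids.shape, mlm_probability)
    for sid in special_ids:  # assumed PAD/CLS/SEP ids; never mask these
        prob_matrix[input_ids == sid] = 0.0
    masked = torch.bernoulli(prob_matrix).bool()
    labels[~masked] = -100  # compute loss only on masked positions

    # Standard BERT recipe: 80% [MASK], 10% random token, 10% unchanged.
    replace_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace_mask] = mask_token_id

    random_mask = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                   & masked & ~replace_mask)
    input_ids[random_mask] = torch.randint(vocab_size, input_ids.shape)[random_mask]

    return input_ids, labels
```

With the image/video providing extra context, a higher ratio like 0.5 makes the text-side prediction task harder and pushes the model to rely on the visual modality, which matches the reasoning above.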