microsoft / Oscar

Oscar and VinVL
MIT License

Question Related to VIVO Pre-training #176

Open enesmsahin opened 2 years ago

enesmsahin commented 2 years ago

Hi,

In the VinVL paper, you mention the following:

"By adding VIVO [9] pre-training, our VinVL improves the original VIVO result by 6 CIDEr points and creates a new SoTA."

As far as I know, VIVO pre-trains a transformer model on an object detection dataset with a masked object prediction task. The model is then trained further on an image captioning dataset.
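To make the masking step concrete, here is a minimal sketch of how object tags might be masked to build inputs and targets for such a prediction task. This is an illustration only, not code from VIVO or this repository: the function name `mask_tags`, the `[MASK]` token, and the 15% masking rate are all assumptions borrowed from BERT-style masking.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder token (BERT-style), not taken from VIVO


def mask_tags(tags, mask_prob=0.15, seed=None):
    """Randomly replace object tags with MASK_TOKEN.

    Returns (masked_tags, labels). labels[i] holds the original tag where
    position i was masked, and None elsewhere. If no position is selected
    by chance, one is masked at random so the example still has a target.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tag in tags:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tag)
        else:
            masked.append(tag)
            labels.append(None)
    # Guarantee at least one masked position for non-empty input.
    if tags and all(label is None for label in labels):
        i = rng.randrange(len(tags))
        labels[i] = tags[i]
        masked[i] = MASK_TOKEN
    return masked, labels


if __name__ == "__main__":
    # Hypothetical tags for one image from an object detection dataset.
    masked, labels = mask_tags(["dog", "frisbee", "grass", "person"], seed=0)
    print(masked)
    print(labels)
```

A model would then be trained to recover the entries of `labels` from `masked` together with the image region features; that part is omitted here.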

As far as I can tell, you applied the same approach as VIVO to obtain the results in Table 9 of the paper. However, I could not find any code related to this experiment. Could you point me to, or share, the code used for it?

Thanks.