salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Grounding script vs retrieval #103

Open Ngheissari opened 1 year ago

Ngheissari commented 1 year ago

The grounding script has a train function identical to the one in the retrieval script, and it uses the same retrieval model. The only apparent difference is that it is trained on RefCOCO rather than MSCOCO.

The loss functions are also the same. I understand that this is weakly supervised grounding, but what are the differences, apart from the dataset and the evaluation code? The only one I can see is in the loss functions: the idx is an image index in retrieval and an object index in grounding. So the same code is applicable to both tasks. Is that correct?

LiJunnan1992 commented 1 year ago

Yes, the weakly-supervised grounding task is trained in the same way as retrieval.
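
For readers following the thread, here is a minimal sketch of why one train function can serve both tasks. It is not the repository's code. It assumes the contrastive loss treats every pair that shares an idx as a positive, and the helper name `build_sim_targets` and the example ids are hypothetical. Under that assumption, only the meaning of idx changes between the tasks.

```python
def build_sim_targets(batch_idx, all_idx):
    """Build row-normalized contrastive targets (hypothetical helper).

    Entry (i, j) is non-zero when batch sample i and candidate j share an id.
    """
    targets = []
    for i in batch_idx:
        matches = [1.0 if i == j else 0.0 for j in all_idx]
        total = sum(matches)
        # Normalize so the positives in each row sum to 1.
        targets.append([m / total for m in matches] if total else matches)
    return targets


# Retrieval: ids are image ids, so two captions of image 7 are both positives.
retrieval_targets = build_sim_targets([7, 7, 9], [7, 7, 9])

# Grounding: ids are (made-up) object ids, one per referred object.
grounding_targets = build_sim_targets([101, 102], [101, 102])
```

The function never looks at what the ids refer to, which matches the observation above that the dataset and the evaluation code are what distinguish the two scripts.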