salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Pre-training Image-Text Matching #82

Closed: xiezhiweihk closed this issue 2 years ago

xiezhiweihk commented 2 years ago

Hello, and thank you in advance; I have a question. The paper treats image-text matching as a binary classification with two cases: matching and non-matching. I can't debug the code myself because I don't have enough GPU resources. Suppose I have one image with five captions, and my task is to predict the similarity between the image and each of the five captions. Can I turn this into a multi-class classification problem? And if I instead predict a similarity score for each pair, can L1 loss be used here?
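A minimal sketch of the multi-class reformulation, assuming exactly one of the five captions is the true match and that the model produces one scalar matching score per image-caption pair. `log_sum_exp` and `multiclass_matching_loss` are hypothetical helpers for illustration, not ALBEF code. The loss is softmax cross-entropy over the five scores; with two logits ordered as [no-match, match] it reduces to a two-class cross-entropy, i.e. the binary matching/non-matching case.

```python
import math
from typing import Sequence


def log_sum_exp(scores: Sequence[float]) -> float:
    """Compute log(sum(exp(s))) stably by factoring out the maximum score."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def multiclass_matching_loss(scores: Sequence[float], target_index: int) -> float:
    """Cross-entropy over K candidate captions.

    -log softmax(scores)[target] == log_sum_exp(scores) - scores[target]
    Assumption: exactly one candidate (target_index) is the correct caption.
    """
    return log_sum_exp(scores) - scores[target_index]


# Hypothetical scores for one image against its five candidate captions.
scores = [2.0, 0.5, -1.0, 0.3, 0.1]
print(multiclass_matching_loss(scores, target_index=0))

# Two logits ordered as [no-match, match]: the binary case is just K = 2.
print(multiclass_matching_loss([0.2, 1.5], target_index=1))
```

If several of the five captions are valid matches (common when an image comes with multiple reference captions), a single-target softmax is likely the wrong fit; scoring each pair with a binary match/no-match loss, or using a multi-label objective, would match that task better.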

LiJunnan1992 commented 2 years ago

You can change the loss according to your specific task.
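For the score-regression variant raised in the question, a minimal sketch under stated assumptions: each image-caption pair has a graded ground-truth similarity target in [0, 1], and the model emits one raw score per pair. `sigmoid` squashing and the `l1_similarity_loss` helper are illustrative assumptions, not part of ALBEF. L1 loss is then the mean absolute difference between predicted and target scores; in PyTorch, `torch.nn.functional.l1_loss` computes the same quantity with its default mean reduction.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real score into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def l1_similarity_loss(raw_scores: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error between squashed predictions and targets.

    Assumption: targets are graded similarities in [0, 1], one per caption.
    """
    if len(raw_scores) != len(targets):
        raise ValueError("need one target per predicted score")
    preds = [sigmoid(s) for s in raw_scores]
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Hypothetical raw scores and graded targets for one image and five captions.
raw_scores = [2.0, 0.5, -1.0, 0.3, 0.1]
targets = [0.9, 0.6, 0.1, 0.4, 0.3]
print(l1_similarity_loss(raw_scores, targets))
```

L1 only pulls each score toward its own target; if what actually matters is the ordering of the five captions, a ranking or cross-entropy objective may suit the task better.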