salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Question about answer ranking #118

Open dhansmair opened 1 year ago

dhansmair commented 1 year ago

Hi there, I see that in line https://github.com/salesforce/ALBEF/blob/b9727e43c3040491774d1b22cc27718aa7772fac/models/model_vqa.py#L198 you are using a sum to accumulate the loss for the tokens in the answer sequence. How does this behave if the possible answers have varying lengths? Shouldn't the loss be divided by the sequence length to get the average loss per token? Otherwise, won't the ranking be biased towards shorter sequences?

LiJunnan1992 commented 1 year ago

Hi, the sum of log_probs = log of the product of the token probs = log of the sequence prob.
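The identity in the reply above can be checked numerically. This is a minimal sketch with made-up per-token probabilities; `token_probs` and its values are illustrative assumptions, not taken from the ALBEF code.

```python
import math

# Hypothetical per-token probabilities for one candidate answer
# (illustrative values only, not produced by ALBEF).
token_probs = [0.5, 0.25, 0.8]

# Summing the per-token log-probabilities ...
sum_of_logs = sum(math.log(p) for p in token_probs)

# ... gives the log of the product of the token probabilities,
# which is the log-probability of the whole sequence.
log_of_product = math.log(math.prod(token_probs))

print(math.isclose(sum_of_logs, log_of_product))
```

Under this view, a summed score corresponds to the probability of the full answer sequence rather than to an average per-token loss.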

ghost commented 1 year ago

I think @dhansmair makes a good point - indeed I also think it will be biased if we do not divide by the length of each answer sequence.
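To make the disagreement in this thread concrete, here is a small sketch comparing the two scoring rules on toy data. The candidate names, the probabilities, and the helper functions `sequence_log_prob` and `length_normalized_log_prob` are all hypothetical and are not part of the ALBEF repository; the example only shows that the two rules can rank answers of different lengths differently, not which one is preferable.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of per-token log-probs, i.e. the log of the sequence probability."""
    return sum(math.log(p) for p in token_probs)

def length_normalized_log_prob(token_probs):
    """Average per-token log-prob, i.e. the summed score divided by the length."""
    return sequence_log_prob(token_probs) / len(token_probs)

# Hypothetical candidate answers with made-up per-token probabilities.
candidates = {
    "short": [0.6],            # one token
    "long": [0.8, 0.8, 0.8],   # three tokens, each individually more likely
}

best_by_sum = max(candidates, key=lambda k: sequence_log_prob(candidates[k]))
best_by_mean = max(candidates, key=lambda k: length_normalized_log_prob(candidates[k]))

print("ranked first by summed log-prob:", best_by_sum)
print("ranked first by mean log-prob:", best_by_mean)
```

With these toy numbers the summed score prefers the one-token answer, because every extra token adds another negative term, while the per-token average prefers the three-token answer. Whether that counts as an unwanted bias depends on whether the ranking is meant to reflect the full sequence probability or a length-normalized score.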