dhansmair opened 1 year ago
Hi, the sum of the token log_probs is the log of the product of the token probabilities, i.e. the log-probability of the whole answer sequence.
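Written out with the chain rule, for a candidate answer $y = (y_1, \dots, y_T)$ of $T$ tokens (notation ours, not taken from the ALBEF code), summing the per-token log-probabilities gives exactly the sequence log-probability. The second line is the length-normalized per-token average that the original question asks about:

```latex
\log p(y_1, \dots, y_T) = \log \prod_{t=1}^{T} p(y_t \mid y_{<t}) = \sum_{t=1}^{T} \log p(y_t \mid y_{<t})

\bar{s}(y) = \frac{1}{T} \sum_{t=1}^{T} \log p(y_t \mid y_{<t})
```

Each factor $p(y_t \mid y_{<t}) \le 1$, so every additional token can only lower the summed score. This is why the unnormalized sum tends to favor shorter answers.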
I think @dhansmair makes a good point; indeed, I also think the ranking will be biased toward shorter answers if we do not divide by the length of each answer sequence.
Hi there, I see that at line https://github.com/salesforce/ALBEF/blob/b9727e43c3040491774d1b22cc27718aa7772fac/models/model_vqa.py#L198 you are using a sum to accumulate the loss over the tokens in the answer sequence. How does this behave when the candidate answers have different lengths? Shouldn't the loss be divided by the sequence length to get the average loss per token? Otherwise, won't the ranking be biased towards shorter sequences?
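Here is a minimal sketch of the effect being discussed, in plain Python rather than the ALBEF code. It assumes you already have a list of per-token log-probabilities for each candidate answer. The helper names and the probability values are hypothetical and chosen only for illustration. "sum" is the sequence log-probability, and "mean" is the length-normalized score proposed above.

```python
import math
from typing import Dict, List


def sequence_scores(token_log_probs: List[float]) -> Dict[str, float]:
    """Score one candidate answer from its per-token log-probabilities (hypothetical helper)."""
    total = sum(token_log_probs)              # log p(answer) via the chain rule
    mean = total / len(token_log_probs)       # average log-prob per token (length-normalized)
    return {"sum": total, "mean": mean}


def rank(candidates: Dict[str, List[float]], key: str) -> List[str]:
    """Order candidate answers from best to worst under the chosen score ("sum" or "mean")."""
    return sorted(candidates, key=lambda a: sequence_scores(candidates[a])[key], reverse=True)


# Hypothetical per-token probabilities: the longer answer is more confident per token,
# but multiplying three probabilities still yields a lower total than one.
candidates = {
    "red": [math.log(0.5)],                                     # 1 token
    "dark red": [math.log(0.9), math.log(0.9), math.log(0.6)],  # 3 tokens
}

print(rank(candidates, "sum"))   # ['red', 'dark red']  (log 0.5 > log 0.486)
print(rank(candidates, "mean"))  # ['dark red', 'red']
```

The two scores disagree on this example. The sum is the model's actual joint probability, while the mean removes the per-token penalty from longer answers. Which one ranks answers better is an empirical question rather than something this sketch settles.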