om-ai-lab / VL-CheckList

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.

Difference between code and description in the paper #10

Closed: ajtejankar closed this issue 1 year ago

ajtejankar commented 1 year ago

Hi,

Thanks for open-sourcing your code. I am trying to reproduce the ALBEF results from your paper, but without success. While going through the code, I noticed that the ITM logits/probabilities are used differently than the paper describes. The paper states, "If the model score on the original text description is higher than the score on the generated negative samples, we regard it as positive output." In the code, however, only the ITM logit for the "matching" class, z[1], is used. In other words, the code never compares the scores of the positive and negative texts as described in the paper. Can you please clarify?
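To make the difference concrete, here is a minimal sketch of the two readings. It is not the repository's code: the function names (softmax, itm_match_prob, paper_criterion) and the logit values are made up for illustration, and I am assuming a two-way ITM head whose output is z = [no_match, match].

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def itm_match_prob(z):
    """Probability of the 'matching' class, assuming z = [no_match, match].

    This looks at a single image-text pair in isolation, which is how
    I read the use of z[1] in the code.
    """
    return softmax(z)[1]


def paper_criterion(z_pos, z_neg):
    """Pairwise rule as quoted from the paper: the output counts as positive
    only if the original text scores higher than the generated negative."""
    return itm_match_prob(z_pos) > itm_match_prob(z_neg)


# Hypothetical logits for the same image paired with two texts.
z_pos = [0.2, 1.5]  # original description
z_neg = [0.1, 2.0]  # generated negative description

print(itm_match_prob(z_pos))          # score of the positive pair on its own
print(paper_criterion(z_pos, z_neg))  # pairwise comparison against the negative
```

With these made-up numbers, the positive pair looks like a match when judged alone, yet it loses the pairwise comparison, so the two readings can produce different accuracies.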

Thanks, Ajinkya

kyusonglee commented 1 year ago

We set self.task to 'itc' when evaluating the scores reported in the paper.