uclanlp / visualbert

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

seq_relationship_score logits order #26

Closed michelecafagna26 closed 3 years ago

michelecafagna26 commented 3 years ago

I'm testing this model on the image-sentence alignment task and I'm observing weird results.

Running the pretrained COCO model in eval mode on COCO 2017, I get results below random chance (using essentially the same setting as in pretraining).

The 'seq_relationship_score' output consists of two logits, and the doc specifies what each of them means.

Following the doc, I get results that would make much more sense if the meaning of the two logits were flipped.
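To make the question concrete, this is what I mean by the two readings of the logits; the tensor values and the column assignment below are only illustrative assumptions on my side, not taken from the repo or the doc:

```python
import torch

# Hypothetical logits from the pretraining head for a batch of two
# image/caption pairs; seq_relationship_score has shape (batch_size, 2).
seq_relationship_score = torch.tensor([[2.3, -1.1],
                                       [-0.7, 1.9]])

probs = torch.softmax(seq_relationship_score, dim=-1)

# One reading: column 0 = "caption matches the image",
# column 1 = "caption is random" (this index choice is an assumption).
ALIGNED_COLUMN = 0
pred_aligned = probs.argmax(dim=-1).eq(ALIGNED_COLUMN)

# The flipped reading just swaps the column:
pred_aligned_flipped = probs.argmax(dim=-1).eq(1 - ALIGNED_COLUMN)
```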

Moreover, that part of the code seems to have been borrowed from the transformers library, and a similar issue was recently found in another BERT-based model: https://github.com/huggingface/transformers/issues/9212

We are running experiments with your model, and it would be convenient for us to simply ignore the documentation and report the results with the logits flipped.

It would be great if you could clarify this point!

Thank you in advance!

liunian-harold-li commented 3 years ago

Yes, the order is flipped, as in the original BERT implementation.
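In other words, when scoring image-sentence alignment with the released model, one would take the opposite column from the documented one. A minimal sketch of that flipped reading (the example values and the label convention are illustrative only, not from the repo):

```python
import torch

# Hypothetical batch: label 1 = matching image/caption pair (sketch convention).
seq_relationship_score = torch.tensor([[2.3, -1.1],
                                       [-0.7, 1.9],
                                       [0.2, 0.8]])
labels = torch.tensor([0, 1, 1])

probs = torch.softmax(seq_relationship_score, dim=-1)

# Flipped reading: column 1 scores the matching pair
# (assumes the doc assigns that meaning to column 0).
pred_is_match = (probs[:, 1] > probs[:, 0]).long()

accuracy = (pred_is_match == labels).float().mean().item()
print(f"alignment accuracy under the flipped reading: {accuracy:.2f}")
```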