siddk / vqa-outliers

Code and Experiments for ACL-IJCNLP 2021 Paper "Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering."
GNU General Public License v3.0

The validation score of the LXMERT baseline #3

Closed BierOne closed 3 years ago

BierOne commented 3 years ago

Hi, there!

Thank you so much for sharing this outstanding work.

Following your code, I re-implemented the LXMERT baseline over the past few days. However, my validation score for this model is only 62% (no VQA pretraining, using all VQA v2 training samples), which is relatively low compared with the standard repository. Besides, training also takes more epochs (15) than the default.

Would you please share your validation score and training logs here? I want to figure out why this happens. Thanks!

siddk commented 3 years ago

Hey @BierOne - thanks for reaching out!

There’s one key difference between the LXMERT checkpoint in this repository and the one in Hao Tan’s original repo: the original LXMERT for VQA model from the paper pretrains on both VQA v2 and GQA.

For the active learning work here, we didn’t think that was a fair comparison, so we asked Hao for a checkpoint that was not pretrained on these VQA datasets (though it still uses the other datasets, e.g. for image captioning). This is why the numbers are lower; hope this makes sense!

BierOne commented 3 years ago

Thanks for the prompt reply!

I totally understand this difference. However, I obtained similar results (a 62% validation score) even when I used the model pretrained on the VQA datasets. This is really strange to me.

Have you ever tried the model pretrained on the VQA datasets? If so, could you please tell me its validation score? I am not sure whether this problem is due to a bug in my code or in the HuggingFace implementation. Thank you so much!

siddk commented 3 years ago

Ah! My apologies, I understand now. I don’t recall ever trying the checkpoint on HuggingFace that was pretrained on everything; your best bet is to open an issue either in Transformers directly or in Hao’s repo.

If you decide to open an issue on transformers, let me know, and I can see who’s able to take a look!

BierOne commented 3 years ago

Thank you! Once I figure it out, I will let you know :)

BierOne commented 3 years ago

Hey, @siddk ! I found the problem!

In fact, there are two important factors in the re-implementation of LXMERT:

  1. **The loading of the pre-trained weights.** In the `transformers` (HuggingFace) version, directly using the following code misses almost all of the pre-trained weights, because the keys (module names) do not match (see the sketch after this list for a safer loading pattern):

    import os

    import torch
    from transformers import LxmertConfig, LxmertForQuestionAnswering

    lxmert_config = LxmertConfig.from_pretrained("unc-nlp/lxmert-base-uncased", cache_dir="data/LXMERT")
    lxrt = LxmertForQuestionAnswering.from_pretrained(
        None,
        config=lxmert_config,
        state_dict=torch.load(os.path.join("data/snap/pretrained", "Epoch19_LXRT.pth")),
    )



  2. **The initialization of the answer heads.** For more information about this, you can refer to [here](), written by Hao Tan.
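For the first point, here is a minimal sketch (mine, not code from this repo) of how to check for and avoid the key mismatch. It assumes the HuggingFace-converted checkpoint `unc-nlp/lxmert-base-uncased` and the original-format file `Epoch19_LXRT.pth` from above:

    import torch
    from transformers import LxmertConfig, LxmertForQuestionAnswering

    # Option A: load the HuggingFace-converted checkpoint directly, so the
    # parameter names match the HF LXMERT implementation by construction.
    lxrt = LxmertForQuestionAnswering.from_pretrained("unc-nlp/lxmert-base-uncased")

    # Option B: when starting from an original-format checkpoint, first check
    # how many keys actually line up; a near-zero overlap means the pre-trained
    # weights are being silently dropped.
    lxmert_config = LxmertConfig.from_pretrained("unc-nlp/lxmert-base-uncased")
    model = LxmertForQuestionAnswering(lxmert_config)
    state_dict = torch.load("data/snap/pretrained/Epoch19_LXRT.pth", map_location="cpu")
    matched = set(state_dict) & set(model.state_dict())
    print(f"{len(matched)} / {len(model.state_dict())} parameter names match")
    # If the overlap is tiny, the keys need to be remapped first (e.g. stripping
    # a "module." prefix or renaming "bert.*" to "lxmert.*"), and only then
    # passed to model.load_state_dict.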

After addressing the above two problems, the validation score improved drastically, from **62% to 70%**.
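Since the link above isn't reproduced here, this is a rough sketch of the answer-head initialization idea from the second point (my paraphrase, not Hao Tan's actual code; the function and argument names are hypothetical): answers that already appear in the pre-training QA vocabulary keep their pre-trained classifier rows, while genuinely new answers stay randomly initialized.

    import torch

    def init_answer_head(head_weight, pretrained_weight, ans2label, pretrained_ans2label):
        # head_weight:          (num_answers, hidden) weights of the new answer head
        # pretrained_weight:    (num_pretrain_answers, hidden) pre-trained QA head
        # ans2label:            answer string -> row index in the new head
        # pretrained_ans2label: answer string -> row index in the pre-trained head
        with torch.no_grad():
            for ans, idx in ans2label.items():
                if ans in pretrained_ans2label:
                    head_weight[idx] = pretrained_weight[pretrained_ans2label[ans]]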

Hope this helps!