intersun / LightningDOT

Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT
https://arxiv.org/abs/2103.08784
MIT License

Cross-modal Retrieval Objective (CMR) #9

Open mojivalipour opened 2 years ago

mojivalipour commented 2 years ago

Can you point me to the place in your code where CMR is implemented? You used CMR + VMLM + SMRM for the pre-training, according to the paper. However, CMR is not part of your supported tasks. Am I missing something?

intersun commented 2 years ago

It is used by default, so you don't have to specify it. The loss is implemented at https://github.com/intersun/LightningDOT/blob/5f2880f69ba87b8701ab89348d70ebb11432578c/dvl/utils.py#L114.
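Conceptually, that loss is a bi-directional in-batch contrastive objective: each image is scored against every caption in the batch and vice versa, with the matching pair as the positive. A minimal sketch (hypothetical names, not a copy of the actual code in dvl/utils.py):

```python
import torch
import torch.nn.functional as F

def cmr_loss_sketch(img_emb, txt_emb):
    # img_emb, txt_emb: (batch, dim) pooled outputs of the two encoders;
    # row i of each tensor is the matching image/caption pair.
    sim = img_emb @ txt_emb.t()                        # (batch, batch) similarity scores
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_i2t = F.cross_entropy(sim, targets)           # image -> text retrieval
    loss_t2i = F.cross_entropy(sim.t(), targets)       # text -> image retrieval
    return loss_i2t + loss_t2i
```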

mojivalipour commented 2 years ago

Now I'm quite confused about how this repository's files are structured. I thought pretrain.py was the file for pre-training your LightningDOT model, but the _calc_loss function does not appear to be called anywhere in pretrain.py, while train_itm.py uses it several times. Could you please give me specific instructions on how to reproduce the LightningDOT paper results?

mojivalipour commented 2 years ago

Is it the case that pretrain.py is only provided to pre-train the UNITER model? If not, what is train_itm.py used for?

intersun commented 2 years ago

I totally agree it is confusing, since we didn't have time to clean up the code. As you may have noticed, we also left in a lot of things that are not mentioned in the paper, such as hard negatives, knowledge distillation, etc.

To answer your first question (for pre-training only; I assume you have already figured out how the loss is used for fine-tuning): if you trace the definition of the pre-trained model (https://github.com/intersun/LightningDOT/blob/5f2880f69ba87b8701ab89348d70ebb11432578c/pretrain.py#L313), you will see that the relevant forward function is defined at https://github.com/intersun/LightningDOT/blob/5f2880f69ba87b8701ab89348d70ebb11432578c/dvl/models/bi_encoder.py#L484.
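At a high level that forward pass is a dual encoder: one tower embeds the image regions, the other embeds the caption, and each returns one pooled vector per example, which the pre-training loop then feeds into the retrieval loss above. A rough structural sketch (hypothetical class and argument names, not the actual code in bi_encoder.py):

```python
import torch.nn as nn

class BiEncoderSketch(nn.Module):
    # Hypothetical stand-in for the bi-encoder: img_encoder/txt_encoder are the
    # two transformer towers, each assumed to return a pooled (B, dim) tensor.
    def __init__(self, img_encoder: nn.Module, txt_encoder: nn.Module, dim: int = 768):
        super().__init__()
        self.img_encoder = img_encoder
        self.txt_encoder = txt_encoder
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, img_feats, txt_ids):
        # One pooled embedding per image and per caption; the training loop
        # scores them against each other for the retrieval objective.
        img_pooled = self.img_proj(self.img_encoder(img_feats))   # (B, dim)
        txt_pooled = self.txt_proj(self.txt_encoder(txt_ids))     # (B, dim)
        return img_pooled, txt_pooled
```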

To answer your second question: pretrain.py is solely for pre-training and train_itm.py is solely for fine-tuning. I don't currently have time to merge them, and I agree that is confusing.

mojivalipour commented 2 years ago

I see, thank you. So essentially your ITM implementation differs from the original ITM in UNITER and is instead based on the CMR objective from the paper: itm_loss1 corresponds to the image-retrieval loss and itm_loss2 to the text-retrieval loss described in the article. Just one more question: what is ot_loss?

intersun commented 2 years ago

Correct. Since I implemented fine-tuning first and later found that converting it into pre-training was not trivial, I ended up implementing pre-training and fine-tuning separately.

OT loss refers to the optimal-transport loss proposed in http://proceedings.mlr.press/v119/chen20e.html. I never tried it, though, so I'm not sure how well it works.
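For anyone curious, the idea in that paper is to align the set of word embeddings with the set of image-region features by solving a small optimal-transport problem between them and using the transport cost as an extra loss. A minimal entropic-regularized (Sinkhorn-style) sketch of the general idea, with made-up names and not the exact algorithm in that paper or the repo's ot_loss:

```python
import torch

def ot_loss_sketch(txt_emb, img_emb, eps=0.1, n_iters=50):
    # txt_emb: (T, dim) word embeddings, img_emb: (R, dim) region features,
    # both assumed L2-normalized. Returns the entropic OT transport cost.
    cost = 1.0 - txt_emb @ img_emb.t()                 # (T, R) cosine distances
    T, R = cost.shape
    mu = torch.full((T,), 1.0 / T, device=cost.device)  # uniform mass on words
    nu = torch.full((R,), 1.0 / R, device=cost.device)  # uniform mass on regions
    K = torch.exp(-cost / eps)                           # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                             # Sinkhorn iterations
        v = nu / (K.t() @ u + 1e-9)
        u = mu / (K @ v + 1e-9)
    pi = u.unsqueeze(1) * K * v.unsqueeze(0)             # transport plan (T, R)
    return (pi * cost).sum()                             # transport cost as the loss
```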