salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

question about ITM loss #53

Closed · Qiulin-W closed this issue 2 years ago

Qiulin-W commented 2 years ago

Hi,

Thanks for the great work. After reading the code for calculating the ITM loss, I have a question.

The itm_labels for positive and negative samples are in a "fixed" order instead of being shuffled. I'm wondering whether this order could be an issue for the ITM loss to work correctly. In some other VLP models such as ViLT, the ITM loss is computed on shuffled positive-negative batches, as detailed at https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L207
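For reference, the label construction I'm asking about looks roughly like this (a paraphrase, not the exact source; the variable names and tensor values here are stand-ins):

```python
import torch
import torch.nn.functional as F

bs = 8                                  # example mini-batch size
vl_output = torch.randn(3 * bs, 2)      # stand-in for the ITM head logits

# bs positive pairs first, then 2*bs in-batch negatives, so the labels
# come out in a fixed 1...1 0...0 order rather than being shuffled
itm_labels = torch.cat(
    [torch.ones(bs, dtype=torch.long), torch.zeros(2 * bs, dtype=torch.long)]
)
loss_itm = F.cross_entropy(vl_output, itm_labels)
```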

Thanks in advance for your kind reply.

LiJunnan1992 commented 2 years ago

Thanks for your question!

First, the negative samples are sampled from the mini-batch. Second, the order of the itm_labels doesn't affect the result, because the loss takes an average across the batch.
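To see the second point concretely: cross-entropy with mean reduction is invariant to any permutation applied jointly to the logits and the labels, so shuffling the positive-negative order changes nothing. A minimal sketch with hypothetical tensors:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bs = 8
logits = torch.randn(3 * bs, 2)          # hypothetical ITM head output
labels = torch.cat([torch.ones(bs, dtype=torch.long),
                    torch.zeros(2 * bs, dtype=torch.long)])

perm = torch.randperm(3 * bs)            # shuffle pairs and labels together

# identical mean loss whether the labels are in fixed order or shuffled
assert torch.allclose(F.cross_entropy(logits, labels),
                      F.cross_entropy(logits[perm], labels[perm]))
```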

Qiulin-W commented 2 years ago


Thanks so much for your reply! And what is the typical magnitude of the ITM loss after pretraining?

LiJunnan1992 commented 2 years ago

It is around 0.11-0.13

Qiulin-W commented 2 years ago


Thanks so much!

4fee8fea commented 2 years ago

Hi @LiJunnan1992, thanks for your work and for making it public!

I'm trying to collect the negative samples from the whole global batch, i.e., not limited to the mini-batch on a single GPU.

Could you please tell me whether such a sampling strategy would break the ITM loss under the current label order?

Thanks in advance!


LiJunnan1992 commented 2 years ago


Yes, you can do that. Please check out BLIP's code for hard negative mining across all GPUs: https://github.com/salesforce/BLIP/blob/main/models/blip_retrieval.py
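The core idea is simply to gather the features from every GPU before sampling negatives. A minimal sketch of that step (plain `all_gather`, which does not propagate gradients back to the other processes; BLIP wraps this in an autograd function when gradients through the gathered tensors are needed):

```python
import torch
import torch.distributed as dist

def concat_all_gather(tensor: torch.Tensor) -> torch.Tensor:
    """Gather a tensor from all processes and concatenate along dim 0."""
    gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, tensor)
    return torch.cat(gathered, dim=0)

# e.g. image_feats_all = concat_all_gather(image_feats)  # (world_size * bs, d)
# Hard negatives are then sampled from the global similarity matrix; the
# fixed 1...1 0...0 itm_labels layout is unchanged, only the candidate
# pool for negatives grows.
```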

4fee8fea commented 2 years ago

Hi, @LiJunnan1992

Thanks for your reply! We will follow the BLIP work as well.

We want to build on the promising ALBEF and BLIP; however, the dataset has become an obstacle.

The SBU Captions dataset is inaccessible. Could you please share a copy with us? Thanks! We will do our best to move forward and appreciate your kind help!