salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Results worse than those in the paper #71

Closed: liuuzexiang closed this issue 2 years ago

liuuzexiang commented 2 years ago

I used the 4M dataset to pretrain the ALBEF model and fine-tuned it on the image retrieval task. On the Flickr dataset I get TR R@1 of 82.14, but the result in your paper is 94.3, and my other results are also worse than those reported. I did not change any settings in the code. Do you know where the problem is? Are there any tricks in the training?
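For readers unfamiliar with the metric: TR R@1 (text-retrieval recall at 1) is the percentage of images for which the top-ranked caption is a ground-truth match. Below is a minimal sketch of the computation, assuming a precomputed image-to-text similarity matrix; the names `sims` and `img2txt` are illustrative, not taken from the ALBEF codebase:

```python
import numpy as np

def text_retrieval_r_at_1(sims: np.ndarray, img2txt: list) -> float:
    """TR R@1: percentage of images whose top-ranked caption is a correct match.

    sims    -- (num_images, num_texts) image-to-text similarity matrix
    img2txt -- img2txt[i] lists the caption indices belonging to image i
               (Flickr30K has five captions per image)
    """
    hits = 0
    for i in range(sims.shape[0]):
        top1 = int(np.argmax(sims[i]))  # highest-scoring caption for image i
        if top1 in img2txt[i]:
            hits += 1
    return 100.0 * hits / sims.shape[0]
```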

LiJunnan1992 commented 2 years ago

Hi, do you use the same batch size as in the paper?

liuuzexiang commented 2 years ago

Hi, I used the same config as in Pretrain.yaml, trained on 8 A100 GPUs with 40GB memory, and used the JSON file you provided. When I checked the data count, I found that 0.36M images are missing from the CC3M dataset because some URLs have expired or for other reasons. Would 0.36M missing images decrease the performance that much?
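As an aside, one way to quantify this gap is to filter the released annotation JSON down to the images that actually downloaded. A minimal sketch, assuming the annotations are a list of entries with "image" and "caption" fields; the file paths here are hypothetical:

```python
import json
import os

# Hypothetical paths; the real annotation file comes with the ALBEF release.
ann_path = "cc3m_train.json"
image_root = "/data/cc3m"

with open(ann_path) as f:
    ann = json.load(f)

# Keep only entries whose image file exists on disk.
present = [a for a in ann if os.path.exists(os.path.join(image_root, a["image"]))]
print(f"kept {len(present)}/{len(ann)} entries "
      f"({len(ann) - len(present)} missing images)")

with open("cc3m_train_filtered.json", "w") as f:
    json.dump(present, f)
```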

LiJunnan1992 commented 2 years ago

Can you provide your pre-training and fine-tuning log?

liuuzexiang commented 2 years ago

finetune.log pretrain.log

Here are the pretraining and fine-tuning logs.

LiJunnan1992 commented 2 years ago

Both your pretraining loss and fine-tuning loss are higher than what I have. You might want to fine-tune from the released checkpoint to see if you can get the reported results.
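For anyone repeating this check, loading a released checkpoint before fine-tuning can be done with a non-strict load that reports key mismatches instead of failing. A minimal sketch; the 'model' key convention follows the ALBEF training scripts, and the rest is illustrative:

```python
import torch
from torch import nn

def load_released_checkpoint(model: nn.Module, ckpt_path: str) -> None:
    """Load an ALBEF-style checkpoint and report any key mismatches."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # The training scripts save weights under a 'model' key; fall back otherwise.
    state_dict = checkpoint.get("model", checkpoint)
    # strict=False reports mismatched keys instead of raising,
    # which helps spot weights that silently fail to load.
    msg = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", msg.missing_keys)
    print("unexpected keys:", msg.unexpected_keys)
```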

liuuzexiang commented 2 years ago

I fine-tuned from the released checkpoint and was able to get the reported results. Can you provide your pretraining log?

LiJunnan1992 commented 2 years ago

Here is the log:


{"train_lr": "0.000", "train_loss_mlm": "2.418", "train_loss_ita": "0.777", "train_loss_itm": "0.414", "epoch": 0}
{"train_lr": "0.000", "train_loss_mlm": "2.410", "train_loss_ita": "4.443", "train_loss_itm": "0.401", "epoch": 0}
{"train_lr": "0.000", "train_loss_mlm": "2.141", "train_loss_ita": "4.086", "train_loss_itm": "0.284", "epoch": 1}
{"train_lr": "0.000", "train_loss_mlm": "2.047", "train_loss_ita": "3.845", "train_loss_itm": "0.252", "epoch": 2}
{"train_lr": "0.000", "train_loss_mlm": "1.988", "train_loss_ita": "3.676", "train_loss_itm": "0.233", "epoch": 3}
{"train_lr": "0.000", "train_loss_mlm": "1.943", "train_loss_ita": "3.550", "train_loss_itm": "0.221", "epoch": 4}
{"train_lr": "0.000", "train_loss_mlm": "1.910", "train_loss_ita": "3.501", "train_loss_itm": "0.210", "epoch": 5}
{"train_lr": "0.000", "train_loss_mlm": "1.880", "train_loss_ita": "3.421", "train_loss_itm": "0.201", "epoch": 6}
{"train_lr": "0.000", "train_loss_mlm": "1.855", "train_loss_ita": "3.366", "train_loss_itm": "0.194", "epoch": 7}
{"train_lr": "0.000", "train_loss_mlm": "1.829", "train_loss_ita": "3.272", "train_loss_itm": "0.187", "epoch": 8}
{"train_lr": "0.000", "train_loss_mlm": "1.805", "train_loss_ita": "3.240", "train_loss_itm": "0.180", "epoch": 9}
{"train_lr": "0.000", "train_loss_mlm": "1.783", "train_loss_ita": "3.164", "train_loss_itm": "0.174", "epoch": 10}
{"train_lr": "0.000", "train_loss_mlm": "1.762", "train_loss_ita": "3.156", "train_loss_itm": "0.168", "epoch": 11}
{"train_lr": "0.000", "train_loss_mlm": "1.743", "train_loss_ita": "3.133", "train_loss_itm": "0.162", "epoch": 12}
{"train_lr": "0.000", "train_loss_mlm": "1.718", "train_loss_ita": "3.111", "train_loss_itm": "0.157", "epoch": 13}
{"train_lr": "0.000", "train_loss_mlm": "1.701", "train_loss_ita": "3.060", "train_loss_itm": "0.152", "epoch": 14}
{"train_lr": "0.000", "train_loss_mlm": "1.681", "train_loss_ita": "2.963", "train_loss_itm": "0.147", "epoch": 15}
{"train_lr": "0.000", "train_loss_mlm": "1.661", "train_loss_ita": "2.934", "train_loss_itm": "0.142", "epoch": 16}
{"train_lr": "0.000", "train_loss_mlm": "1.646", "train_loss_ita": "2.921", "train_loss_itm": "0.137", "epoch": 17}
{"train_lr": "0.000", "train_loss_mlm": "1.624", "train_loss_ita": "2.862", "train_loss_itm": "0.133", "epoch": 18}
{"train_lr": "0.000", "train_loss_mlm": "1.610", "train_loss_ita": "2.839", "train_loss_itm": "0.128", "epoch": 19}
{"train_lr": "0.000", "train_loss_mlm": "1.593", "train_loss_ita": "2.738", "train_loss_itm": "0.125", "epoch": 20}
{"train_lr": "0.000", "train_loss_mlm": "1.579", "train_loss_ita": "2.696", "train_loss_itm": "0.121", "epoch": 21}
{"train_lr": "0.000", "train_loss_mlm": "1.566", "train_loss_ita": "2.676", "train_loss_itm": "0.118", "epoch": 22}
{"train_lr": "0.000", "train_loss_mlm": "1.551", "train_loss_ita": "2.618", "train_loss_itm": "0.115", "epoch": 23}
{"train_lr": "0.000", "train_loss_mlm": "1.541", "train_loss_ita": "2.609", "train_loss_itm": "0.112", "epoch": 24}
{"train_lr": "0.000", "train_loss_mlm": "1.532", "train_loss_ita": "2.578", "train_loss_itm": "0.109", "epoch": 25}
{"train_lr": "0.000", "train_loss_mlm": "1.522", "train_loss_ita": "2.560", "train_loss_itm": "0.107", "epoch": 26}
{"train_lr": "0.000", "train_loss_mlm": "1.516", "train_loss_ita": "2.529", "train_loss_itm": "0.106", "epoch": 27}
{"train_lr": "0.000", "train_loss_mlm": "1.510", "train_loss_ita": "2.501", "train_loss_itm": "0.104", "epoch": 28}
{"train_lr": "0.000", "train_loss_mlm": "1.506", "train_loss_ita": "2.506", "train_loss_itm": "0.103", "epoch": 29}
liuuzexiang commented 2 years ago

Thanks. I found that the model did not load the pretrained ViT weights (a sanity check for this is sketched after the stats below). Now my pretraining loss looks normal, but loss_itm is still higher than yours. I hope the final results are OK.

Averaged stats: lr: 0.0001 loss_mlm: 2.4164 loss_ita: 4.4564 loss_itm: 0.4566 epoch: 0
Averaged stats: lr: 0.0001 loss_mlm: 2.1419 loss_ita: 4.0798 loss_itm: 0.3376 epoch: 1
Averaged stats: lr: 0.0001 loss_mlm: 2.0455 loss_ita: 3.8254 loss_itm: 0.3027 epoch: 2
Averaged stats: lr: 0.0001 loss_mlm: 1.9860 loss_ita: 3.6622 loss_itm: 0.2826 epoch: 3
Averaged stats: lr: 0.0001 loss_mlm: 1.9417 loss_ita: 3.5292 loss_itm: 0.2677 epoch: 4
Averaged stats: lr: 0.0001 loss_mlm: 1.9064 loss_ita: 3.4484 loss_itm: 0.2563 epoch: 5
Averaged stats: lr: 0.0001 loss_mlm: 1.8782 loss_ita: 3.3979 loss_itm: 0.2461 epoch: 6
Averaged stats: lr: 0.0001 loss_mlm: 1.8499 loss_ita: 3.3249 loss_itm: 0.2375 epoch: 7
Averaged stats: lr: 0.0001 loss_mlm: 1.8278 loss_ita: 3.2866 loss_itm: 0.2293 epoch: 8
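A quick sanity check that the pretrained ViT weights were actually loaded is to compare a visual-encoder parameter before and after the load. A minimal sketch, assuming timm-style attribute names (visual_encoder.patch_embed.proj) as in ALBEF-like models; adjust the path for your model:

```python
import copy

import torch
from torch import nn

def vit_weights_loaded(model: nn.Module, state_dict: dict) -> bool:
    """Return True if loading state_dict actually changed the visual encoder."""
    # Attribute path assumed from a timm-style ViT; adjust for your model.
    before = copy.deepcopy(model.visual_encoder.patch_embed.proj.weight)
    msg = model.load_state_dict(state_dict, strict=False)
    if msg.missing_keys:
        print(f"{len(msg.missing_keys)} keys missing from the checkpoint")
    after = model.visual_encoder.patch_embed.proj.weight
    return not torch.equal(before, after)
```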

Fly2flies commented 2 years ago

Hi @DandelionYoungL, what were your final results from this pre-training run?

liuuzexiang commented 2 years ago

The results are close to those in the paper; the gap is within one point.