salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Training behavior on 16 gpus #92

Open zhaohengz opened 2 years ago

zhaohengz commented 2 years ago

Hi,

I was trying to accelerate the training by running on 16 GPUs, but I'm having some trouble reproducing similar numbers. In #39, you mentioned that setting lr to 2e-4 helps. May I ask if you changed the min_lr and warmup_lr as well? Would you mind sharing the training log of the 16-GPU run with 4M images, if available?

Thanks a lot!

LiJunnan1992 commented 2 years ago

Hi, min_lr and warmup_lr can remain the same. Can you reproduce the paper's numbers with 8 A100 GPUs?
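
For concreteness, here is a minimal sketch of the linear learning-rate scaling implied by this exchange. The 8-GPU defaults (lr = 1e-4, warmup_lr = min_lr = 1e-5) are assumed values, and the key names simply mirror the fields discussed in this thread; check configs/Pretrain.yaml in your checkout for the actual settings.

```python
# Illustrative only: key names mirror the fields discussed in this thread
# (lr, warmup_lr, min_lr); the 8-GPU defaults below are assumed values --
# check configs/Pretrain.yaml in your checkout.
config_8gpu = {
    "lr": 1e-4,         # assumed 8-GPU peak learning rate
    "warmup_lr": 1e-5,  # stays the same when scaling up (per the reply above)
    "min_lr": 1e-5,     # stays the same when scaling up
}

def scale_lr(cfg, old_gpus=8, new_gpus=16):
    """Linear scaling: doubling the GPUs (and hence the global batch)
    doubles the peak lr; warmup_lr and min_lr are left untouched."""
    scaled = dict(cfg)
    scaled["lr"] = cfg["lr"] * new_gpus / old_gpus  # 1e-4 -> 2e-4
    return scaled

print(scale_lr(config_8gpu))
# {'lr': 0.0002, 'warmup_lr': 1e-05, 'min_lr': 1e-05}
```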

zhaohengz commented 2 years ago

Hi,

Thanks for your kind reply! I haven't finished the run with 8 GPUs yet, but all loss terms are similar to those in #71, so I assume it will reproduce the results in the paper. It is just too slow, so I am trying to accelerate training by using more GPUs.

The issue I am having with 16 GPUs is that with 2x batch size, each epoch has only half as many iterations, so the momentum-based ITC learner does not seem to work as well as in the 8-GPU version. I am seeing a relatively higher ITA loss, and I am not sure whether that is normal behavior or whether I did something wrong.
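
For illustration, the back-of-the-envelope iteration count looks like this; the per-GPU batch size of 64 is an assumption, so substitute the batch_size from your config.

```python
# Back-of-the-envelope count of optimizer steps (and hence momentum/queue updates)
# per epoch. The 4M images comes from the thread; batch_per_gpu = 64 is an assumed
# value for illustration -- substitute the batch_size from your config.
num_images = 4_000_000
batch_per_gpu = 64

for num_gpus in (8, 16):
    global_batch = batch_per_gpu * num_gpus
    iters_per_epoch = num_images // global_batch
    print(f"{num_gpus:>2} GPUs: global batch {global_batch:>4}, "
          f"{iters_per_epoch} iterations per epoch")
#  8 GPUs: global batch  512, 7812 iterations per epoch
# 16 GPUs: global batch 1024, 3906 iterations per epoch (half as many ITC momentum updates)
```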

BTW, did you reduce warmup_epochs (the number of warmup steps, actually) for the 16-GPU version? The model will see more samples during warmup if it remains unchanged, as the quick check below shows.
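
A quick sanity check of that point; warmup_steps = 2500 and batch_per_gpu = 64 are assumed values for illustration only, not the repo's actual defaults.

```python
# samples_seen_during_warmup = warmup_steps * global_batch, so with the number of
# warmup steps held fixed, 2x GPUs -> 2x samples seen during warmup.
# warmup_steps = 2500 and batch_per_gpu = 64 are assumed values for illustration.
warmup_steps = 2500
batch_per_gpu = 64

for num_gpus in (8, 16):
    global_batch = batch_per_gpu * num_gpus
    print(f"{num_gpus:>2} GPUs: {warmup_steps * global_batch:,} samples seen during warmup")
#  8 GPUs: 1,280,000 samples seen during warmup
# 16 GPUs: 2,560,000 samples seen during warmup -> roughly halve the warmup steps to match
```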