salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

About dropout and no_grad. #124

Open · boyaom opened this issue 1 year ago

boyaom commented 1 year ago

Hi, I see that your BERT model config has a non-zero dropout value, while the momentum operations only use torch.no_grad. As far as I know, torch.no_grad does not stop dropout from being applied; only switching the module to eval mode does. Could this affect the results?
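The claim in the question can be checked with a small PyTorch sketch. The toy module, its layer sizes, and the dropout rate below are illustrative assumptions, not ALBEF's actual momentum model. It runs the same input through the module twice under torch.no_grad, first in train mode and then in eval mode, and compares the outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a momentum encoder: a linear layer followed by
# dropout (p=0.1 is an illustrative value, not taken from ALBEF's config).
toy_momentum_model = nn.Sequential(nn.Linear(64, 64), nn.Dropout(p=0.1))
x = torch.randn(16, 64)

def two_forwards(module, inputs):
    # no_grad only disables gradient tracking; it does not change
    # whether dropout layers are active.
    with torch.no_grad():
        return module(inputs), module(inputs)

# Train mode: dropout samples a fresh random mask on every forward pass,
# so two passes over the same input give different outputs.
toy_momentum_model.train()
a, b = two_forwards(toy_momentum_model, x)
train_mode_differs = not torch.equal(a, b)

# Eval mode: dropout is the identity, so the outputs are deterministic.
toy_momentum_model.eval()
c, d = two_forwards(toy_momentum_model, x)
eval_mode_identical = torch.equal(c, d)

print(train_mode_differs, eval_mode_identical)
```

Both flags should come out True: the train-mode outputs differ because dropout is still active under torch.no_grad, while the eval-mode outputs match. Calling .eval() on a momentum model is one possible way to make its targets deterministic, but whether that changes ALBEF's results is exactly what this issue asks.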