mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License

Call model.eval() when computing scores, otherwise results are non-deterministic (torch.no_grad() is not enough) #17

Closed. DianeBouchacourt closed this issue 1 year ago.

DianeBouchacourt commented 1 year ago

It seems you do not call model.eval() before computing the scores (at least in the notebook example). Indeed, torch.no_grad() does not disable modules like Dropout. This causes problems, as the scores are non-deterministic.

It might also cause problems for other models, but I haven't checked. Shouldn't we add a call to model.eval()?
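
For reference, a minimal sketch (using a hypothetical toy scorer, not this repo's model code) of why torch.no_grad() alone is not enough: Dropout keeps sampling masks until the module is put in eval mode.

```python
import torch
import torch.nn as nn

# Toy scorer with Dropout, standing in for a VLM's scoring path
# (a hypothetical example, not this repo's model code).
torch.manual_seed(0)
scorer = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(1, 8)

# no_grad() only disables gradient tracking; Dropout still samples random
# masks, so repeated calls give different scores.
with torch.no_grad():
    print(scorer(x).item(), scorer(x).item())  # two different values

# eval() switches Dropout (and BatchNorm, etc.) to inference behavior,
# so the same input now always gives the same score.
scorer.eval()
with torch.no_grad():
    print(scorer(x).item(), scorer(x).item())  # identical values
```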

mertyg commented 1 year ago

Yes, I think you are right, thanks for bringing this up. We definitely should have added `.eval()`. Just pushed an update adding it. Sorry for missing this.
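
For anyone hitting this, the fix is a one-liner before scoring. A minimal sketch, assuming the OpenAI `clip` package (illustrative only, not the exact notebook code; the other backbones are handled the same way):

```python
import torch
import clip  # assumption: OpenAI CLIP package as the example backbone

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()  # the added call: put Dropout/BatchNorm into inference mode

texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)
with torch.no_grad():  # still useful to avoid building autograd graphs
    text_features = model.encode_text(texts)  # scores from these are now deterministic
```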

I quickly re-ran NegCLIP, CLIP, BLIP, and XVLM after the change, getting 0.804, 0.590, 0.585, and 0.736 on VG-R, respectively.