mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License

Models are not in eval() mode. #7

Closed: linzhiqiu closed this issue 1 year ago

linzhiqiu commented 1 year ago

It seems that you never call model.eval() on the models in your repo. Is there a specific reason?

vinid commented 1 year ago

Hello!

Flava and XVLM are loaded in eval mode.

I checked, but I couldn't find any components affected by .eval() in the ViT-based CLIP models (although patch dropout was recently introduced in OpenCLIP).

It is probably better to add it anyway, since without it the ResNet-based CLIP models will give unstable results because of batch norm.

Nope, this was partially wrong: see https://github.com/mertyg/vision-language-models-are-bows/issues/17
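
For readers following the batch norm point in this thread, here is a minimal PyTorch sketch of the general mechanism. It is not code from this repo. The helper `mode_sensitive_modules` and the `toy` network are hypothetical names introduced only for illustration, and the list of module types is an assumption that is not exhaustive: custom modules that branch on `self.training` (patch dropout implementations, for example) would not be detected by it.

```python
import torch
import torch.nn as nn

# Built-in module types whose forward pass depends on train/eval mode.
# Illustrative only: custom modules that check `self.training` are not covered.
MODE_SENSITIVE = (
    nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
    nn.Dropout, nn.Dropout2d, nn.Dropout3d,
)


def mode_sensitive_modules(model: nn.Module) -> list:
    """Return the names of submodules that behave differently under .eval()."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, MODE_SENSITIVE)
    ]


torch.manual_seed(0)

# Hypothetical stand-in for a ResNet-style block (conv + batch norm).
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
found = mode_sensitive_modules(toy)

x = torch.randn(4, 3, 16, 16)

with torch.no_grad():
    # Train mode: batch norm normalizes with the statistics of the current
    # batch, so the same two images give different outputs depending on
    # which other images share the batch.
    toy.train()
    train_full = toy(x)[:2]
    train_sub = toy(x[:2])

    # Eval mode: batch norm uses its stored running statistics, so the
    # output for an image no longer depends on the rest of the batch.
    toy.eval()
    eval_full = toy(x)[:2]
    eval_sub = toy(x[:2])

train_matches = torch.allclose(train_full, train_sub, atol=1e-5)
eval_matches = torch.allclose(eval_full, eval_sub, atol=1e-5)
print(found, train_matches, eval_matches)
```

Running a check like `mode_sensitive_modules` on a loaded model is one way to see whether calling `.eval()` could change its outputs, though an empty result does not prove the model is unaffected.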