mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License

How do you run experiments with batch size 1024 on a single RTX-2080Ti #31

Closed wujianP closed 11 months ago

wujianP commented 11 months ago

Hello, I'm curious how to train with a batch size of 1024 on a single 2080Ti, because when I use a batch size of 256 on a 32GB V100, it consumes 28GB of GPU memory. Am I missing any details?

vinid commented 11 months ago

Hello!

This could be a typo left over from when we switched to A100s for the camera-ready version of the paper. You should be able to reproduce most of the results even with a smaller batch size; what you might lose is a bit of generalization power. If possible, use a larger GPU.
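
As a rough sanity check on the numbers in this thread, the sketch below extrapolates the reported 28GB at batch size 256 to other batch sizes. It assumes memory grows linearly with batch size and that the fixed overhead (weights, optimizer state) is zero unless given, which is a simplification and not something measured on this repository. The function names are hypothetical, and the 11GB figure is the memory of an RTX 2080 Ti.

```python
def estimate_memory_gb(measured_gb, measured_batch, target_batch, fixed_gb=0.0):
    """Linearly extrapolate GPU memory use to a different batch size.

    Assumption: memory = fixed_gb + per_sample * batch_size.
    """
    per_sample = (measured_gb - fixed_gb) / measured_batch
    return fixed_gb + per_sample * target_batch


def max_batch_size(measured_gb, measured_batch, capacity_gb, fixed_gb=0.0):
    """Largest batch size that fits in capacity_gb under the same linear model."""
    per_sample = (measured_gb - fixed_gb) / measured_batch
    return int((capacity_gb - fixed_gb) // per_sample)


# Numbers reported in the question: 28GB at batch size 256.
print(estimate_memory_gb(28, 256, 1024))  # 112.0
print(max_batch_size(28, 256, 11))        # 100
```

Under these assumptions a batch size of 1024 would need roughly four times the memory observed at 256, so the estimate is consistent with the batch size in the paper referring to larger GPUs.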