mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License
261 stars · 15 forks

I cannot run on RTX 3060 with batch-size=256! #30

Closed shuguang99 closed 1 year ago

shuguang99 commented 1 year ago

Section C.3 of your paper's appendix says: "We use 50 steps of warmup and AdamW optimizer with a cosine-annealing learning rate schedule with N = 1024 batch size using a single NVIDIA RTX 2080 Ti GPU." I understand that in the actual configuration the batch size should be 256.

Although I used the same configuration and commands from your repository, I could only run with batch-size=32 (using 9 GB of memory).

My GPU is an RTX 3060 with 12 GB of memory.

This is my command (note: the original paste had `---report-to` with three dashes, which should be `--report-to`):

```
CUDA_VISIBLE_DEVICES=0 python -m training.main \
    --train-data=train_neg_clip.tsv \
    --batch-size=256 \
    --epochs=5 \
    --name=negclip_256_1e-6 \
    --lr=1e-6 \
    --val-data=valid_neg_clip.tsv \
    --logs="./logs/negCLIP/" \
    --pretrained="openai" \
    --model="ViT-B-32" \
    --workers 8 \
    --warmup 50 \
    --report-to wandb,tensorboard
```

I wonder why this is: is it the GPU, or did you apply any memory-saving measures (such as cropping or masking images)?

vinid commented 1 year ago

Hello!

We didn't do anything too sophisticated. Try using a larger GPU if possible!

wujianP commented 1 year ago

Hello, I'm also curious how a batch size of 1024 could fit on a single 2080 Ti: when I use a batch size of 256 on a 32 GB V100, it already consumes 28 GB of GPU memory.
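For readers hitting the same memory wall: a common workaround (not something the authors describe using, and not equivalent for CLIP's contrastive loss, since in-batch negatives are then limited to each micro-batch) is gradient accumulation. A minimal sketch with a toy linear model and MSE loss, purely illustrative:

```python
# Hedged sketch: emulate a large batch by accumulating gradients over
# smaller micro-batches before one optimizer step. This is NOT the repo's
# training code; model, data, and sizes here are hypothetical toys.
# Caveat: for contrastive objectives like CLIP's, accumulation changes the
# loss, because each micro-batch only sees its own in-batch negatives.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)   # pretend this is one "large" batch of 8 samples
y = torch.randn(8, 1)

accum_steps = 4          # split into 4 micro-batches of 2
opt.zero_grad()
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    # Scale each micro-batch loss so the summed gradients match the
    # mean-reduced gradient of the full batch (equal-sized chunks assumed).
    (loss / accum_steps).backward()
opt.step()               # one update, as if the full batch had been used
```

For losses where this equivalence holds (per-sample losses with mean reduction), peak activation memory scales with the micro-batch size rather than the logical batch size, which is why this trick is often suggested for small GPUs.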