Hello!
We didn't do anything too sophisticated. Try using a larger GPU if possible!
Hello, I'm also curious how to train with a batch size of 1024 on a single 2080 Ti, because when I use a batch size of 256 on a 32GB V100 it already consumes 28GB of GPU memory.
Appendix C.3 of your paper says: "We use 50 steps of warmup and AdamW optimizer with a cosine-annealing learning rate schedule with N = 1024 batch size using a single NVIDIA RTX 2080 Ti GPU." I understand that in the actual configuration it should be batch-size=256.
Although I used the same configuration and commands from your repository, I could only run with batch-size=32 (using 9GB of memory). My GPU is an RTX 3060 with 12GB of memory.
This is my command:
```
CUDA_VISIBLE_DEVICES=0 python -m training.main \
    --train-data=train_neg_clip.tsv \
    --batch-size=256 \
    --epochs=5 \
    --name=negclip_256_1e-6 \
    --lr=1e-6 \
    --val-data=valid_neg_clip.tsv \
    --logs="./logs/negCLIP/" \
    --pretrained="openai" \
    --model="ViT-B-32" \
    --workers 8 \
    --warmup 50 \
    --report-to wandb,tensorboard
```
I wonder why this is: is it the GPU itself, or did you apply any memory-saving measures (such as cropping or masking images)?
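For what it's worth, one common workaround on a memory-limited GPU is gradient accumulation: sum gradients over several small micro-batches and only then take an optimizer step, so the effective batch size is larger than what fits in memory. The sketch below is just an illustration of the idea in plain PyTorch, not this repository's training script; `model`, `loss_fn`, and `loader` are toy stand-ins. Also note that for a contrastive loss like CLIP's, accumulation is not exactly equivalent to a true large batch, because in-batch negatives only come from each micro-batch.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs on its own; in practice these would be the
# CLIP model, contrastive loss, and data loader used for training.
model = nn.Linear(512, 512).cuda()
loss_fn = lambda out: out.pow(2).mean()
loader = [(torch.randn(32, 512),) for _ in range(32)]  # micro-batches of 32

accum_steps = 8  # 8 micro-batches of 32 -> effective batch size 256
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

model.train()
optimizer.zero_grad()
for step, (batch,) in enumerate(loader):
    batch = batch.cuda()
    loss = loss_fn(model(batch)) / accum_steps  # scale so accumulated grads average out
    loss.backward()                             # gradients add up in param.grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```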