In your paper, you state that you train the CAVP model with a batch size of 720 across 8 A100 GPUs, i.e., 90 video samples per GPU. I am trying to reproduce your CAVP training pipeline, also on A100s, but I am struggling to scale the batch size beyond even 12 per GPU due to GPU memory limits (CUDA OOM errors). Would you be able to share your data loading code, or provide any tips for increasing the batch size within an A100's memory limits in this kind of framework? Thank you very much!