wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

About the batch size of pre-training #19

Closed · LeeSureman closed this issue 3 years ago

LeeSureman commented 3 years ago

You said you use a batch size of 2048 in pre-training, but in `pretrain/pretrain.sh` (or `absolute.sh` in the older version), the effective batch size appears to be `max-sentences` × `update-freq` × num_of_gpus = 32 × 60 × 8 = 15360?

wasiahmad commented 3 years ago

`--max-tokens` is set to 2048, which accommodates 4-5 examples in each mini-batch on our GPUs (with 11 GB memory). So the effective batch size is (4-5) × 60 × 8 ≈ 2048. I believe the `--max-tokens` flag supersedes the `--max-sentences` flag.
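
To make the arithmetic above concrete, here is a minimal sketch in plain Python that contrasts the two readings of the config. The values (8 GPUs, update frequency 60, `--max-sentences 32`, and roughly 4-5 examples fitting under `--max-tokens 2048`) are taken from this thread, not read out of the script itself:

```python
# Effective batch size under the two readings of the pre-training config.
# Assumed values (quoted in the thread above, not pulled from pretrain.sh):
#   - 8 GPUs, --update-freq 60
#   - --max-sentences 32 (the questioner's reading)
#   - --max-tokens 2048, which fits roughly 4-5 examples per GPU per step
#     (the maintainer's reading, since --max-tokens caps the mini-batch first)

num_gpus = 8
update_freq = 60

# Reading 1: --max-sentences bounds the per-GPU mini-batch.
max_sentences = 32
nominal_batch = max_sentences * update_freq * num_gpus
print(f"nominal (max-sentences) batch size: {nominal_batch}")  # 15360

# Reading 2: --max-tokens (2048) is the binding limit, so only ~4-5
# examples actually fit in each per-GPU mini-batch.
for sents_per_gpu in (4, 5):
    effective_batch = sents_per_gpu * update_freq * num_gpus
    print(f"effective batch size with {sents_per_gpu} sents/GPU: {effective_batch}")
# -> 1920 and 2400, i.e. roughly the ~2048 reported in the paper
```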