wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

Confused about the "max-sentences" in pretraining #48

Closed: freexxxyyy closed this issue 1 year ago

freexxxyyy commented 1 year ago

Hi,

In the pretraining script, you set max-sentences to 32. Since max-sentences is a per-GPU limit, that would make PER_GPU_TRAIN_BATCH_SIZE equal to 32. But max-tokens is 2048 and tokens-per-sample is 512, which gives a PER_GPU_TRAIN_BATCH_SIZE of 2048 / 512 = 4. Why do these two settings conflict?

Thanks

wasiahmad commented 1 year ago

We used an older version of fairseq where max_tokens has higher priority, so you can ignore max-sentences.
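For concreteness, here is a minimal sketch of the batch-size arithmetic being discussed. The flag values are the ones from the pretraining script; the calculation is illustrative and is not fairseq's actual batching code.

```python
# Sketch of the per-GPU batch-size arithmetic (not fairseq internals).

MAX_TOKENS = 2048        # --max-tokens: token budget per GPU batch
TOKENS_PER_SAMPLE = 512  # --tokens-per-sample: length of each packed sample
MAX_SENTENCES = 32       # --max-sentences: sentence cap per GPU batch

# With the older fairseq used here, the token budget takes priority,
# so the sentence cap of 32 is never reached.
per_gpu_batch = MAX_TOKENS // TOKENS_PER_SAMPLE
print(per_gpu_batch)  # 4 samples per GPU
```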

freexxxyyy commented 1 year ago

Does that mean the latest fairseq always uses "max-sentences" as the batch size and ignores "max-tokens"?

wasiahmad commented 1 year ago

Yes.
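So, under the behavior described in this thread, the same flags would give different per-GPU batch sizes depending on the fairseq version. A hedged sketch follows; the version-dependent semantics are taken from the replies above, not verified against fairseq's source.

```python
# Assumed behavior per this thread, not a statement about fairseq's code.
MAX_TOKENS = 2048
TOKENS_PER_SAMPLE = 512
MAX_SENTENCES = 32

per_gpu_batch_older = MAX_TOKENS // TOKENS_PER_SAMPLE  # 4 (PLBART pretraining setup)
per_gpu_batch_newer = MAX_SENTENCES                    # 32 (if max-sentences governs)
```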