wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

The parameter setting when increasing "tokens_per_sample" #47

Closed freexxxyyy closed 1 year ago

freexxxyyy commented 1 year ago

The default setting for tokens_per_sample in PLBART is 512. When I increase it to 1024, an "index out of bounds" error is reported by the NVIDIA/CUDA runtime, but when I set max-source-positions and max-target-positions to 2048 (the default is 1024), the error goes away. Even though there is no error, I am not sure whether this setting is correct, and I would like to know the meaning of these two parameters. There is also max-positions, which should apparently be set to the same value as tokens-per-sample and seems to be related to the positional embeddings. What are the differences between these three positional parameters?
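For intuition, here is a toy PyTorch sketch (not PLBART or fairseq internals) of how such an index-out-of-bounds can arise: if the learned positional-embedding table covers exactly tokens_per_sample positions but the batched sequences end up slightly longer (for example, because special tokens are appended), the lookup overflows. On a GPU this typically surfaces as a CUDA device-side assert, which matches the reported NVIDIA error. All sizes below are illustrative.

```python
import torch
import torch.nn as nn

# Toy illustration (not fairseq/PLBART code): a learned positional embedding
# table sized exactly to tokens_per_sample.
tokens_per_sample = 1024
embed_dim = 768
pos_embedding = nn.Embedding(tokens_per_sample, embed_dim)  # valid positions: 0..1023

# Hypothetical batch whose sequences grew past 1024 (e.g. appended special tokens).
seq_len = tokens_per_sample + 2
positions = torch.arange(seq_len).unsqueeze(0)  # shape: (1, 1026)

try:
    pos_embedding(positions)
except IndexError as err:
    # On a GPU the same lookup typically shows up as a CUDA device-side assert.
    print("index out of bounds:", err)
```

Raising max-source-positions and max-target-positions enlarges the corresponding positional tables, which would explain why the error disappears at 2048.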

Also, is max-sentences the batch size across all GPUs? If max_tokens is 2048 and tokens_per_sample is 1024, and training runs on 8 GPUs, should max-sentences be 8 * (2048 / 1024) = 16?
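As a rough check of the arithmetic in this question, here is a sketch under the assumption (worth verifying in the Fairseq documentation) that --max-tokens is a per-GPU cap on tokens per batch and that update_freq is 1:

```python
# Back-of-the-envelope batch-size check, assuming --max-tokens is a per-GPU
# limit on tokens per batch (an assumption to confirm against the Fairseq docs).
max_tokens = 2048          # per GPU
tokens_per_sample = 1024   # length of each (padded) training sample
num_gpus = 8
update_freq = 1            # gradient accumulation steps (assumed 1 here)

sentences_per_gpu = max_tokens // tokens_per_sample           # 2
effective_batch = sentences_per_gpu * num_gpus * update_freq  # 16
print(sentences_per_gpu, effective_batch)
```

Note that, as I understand the Fairseq options, --max-sentences (now --batch-size) also caps sentences per GPU rather than across all GPUs, so 16 here is the effective batch size across the 8 devices, not the value one would pass to --max-sentences.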

Thanks

wasiahmad commented 1 year ago

I am not sure, but one possible reason is that fairseq appends a special token to the sequences, which makes a 1024-length sequence longer than 1024 and causes an out-of-index issue in the positional representations.

freexxxyyy commented 1 year ago

I considered this; that is why the length limit during preprocessing is 1020. Do you know the meaning of max-source-positions and max-target-positions?
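For illustration, a minimal sketch of the truncation rule being described, assuming a small budget of special-token positions is reserved out of the 1024 maximum so that 1020 content tokens remain; the constant names and the exact reserved count are assumptions, not taken from the PLBART preprocessing scripts.

```python
# Illustrative truncation rule (not the actual PLBART preprocessing code):
# reserve room for special tokens so that content + specials never exceed
# the model's maximum positions.
MAX_POSITIONS = 1024
RESERVED_SPECIALS = 4   # assumption: e.g. <s>, </s>, language-id token, offset
MAX_CONTENT_TOKENS = MAX_POSITIONS - RESERVED_SPECIALS  # 1020, as in the thread


def truncate_tokens(tokens: list[str]) -> list[str]:
    """Clip a tokenized sample so special tokens can be appended without overflow."""
    return tokens[:MAX_CONTENT_TOKENS]


sample = ["tok"] * 3000
print(len(truncate_tokens(sample)))  # 1020
```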

wasiahmad commented 1 year ago

Please check the Fairseq documentation to learn about max-sentences, max_tokens, and tokens_per_sample.

freexxxyyy commented 1 year ago

Thanks. I will look at the documentation.