wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

about lang token position in the target #20

Closed LeeSureman closed 3 years ago

LeeSureman commented 3 years ago

I find that in the multilingual denoising task of fairseq, the target's language token is at the end, which differs from Table 3 in your paper, where the language token is at the start of the target. Am I wrong?

LeeSureman commented 3 years ago

I almost figured it out: fairseq sets the language token as the BOS of the decoder input in the collator.

wasiahmad commented 3 years ago

You are right, the language token is appended at the end of the target sequence. But fairseq then right-shifts the target sequence by one token, which moves the language token to the beginning of the decoder input. So the paper provides the correct information.
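The mechanics of the right shift can be illustrated with a minimal sketch (this is not fairseq's actual implementation, and the token strings below are made up for illustration): the target keeps the language token at the end, and the decoder input is the target rotated right by one position, so the trailing language token becomes the first token the decoder sees.

```python
def shift_right(target):
    """Rotate the target one position to the right: the last token
    (the language token) becomes the first decoder-input token,
    mirroring how fairseq builds prev_output_tokens from the target."""
    return [target[-1]] + target[:-1]

# Hypothetical tokenized target ending with a language token, e.g. [java].
target = ["public", "void", "f", "(", ")", "</s>", "[java]"]

decoder_input = shift_right(target)
print(decoder_input)
# ['[java]', 'public', 'void', 'f', '(', ')', '</s>']
```

So both views are consistent: the stored target ends with the language token, while the decoder is conditioned on a sequence that starts with it, as shown in Table 3 of the paper.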