microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

How does MASS supervised machine translation perform preprocessing? #176

Open IdaBetsy opened 3 years ago

IdaBetsy commented 3 years ago

Hello, I want to use the MASS model for a supervised machine translation task (EN-DE). How should I prepare the data before binarization? For example, what counts as monolingual data here? How do I learn and apply BPE? How do I tokenize the text? You only provide instructions for EN-ZH translation. Could you provide a preprocessing script?

Looking forward to your reply, thank you very much! @StillKeepTry