microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

How to preprocess data and use the fine-tuned model? #147

Closed 15091444119 closed 4 years ago

15091444119 commented 4 years ago

I tried to use the en-ro fine-tuned model, but get-data-nmt.sh applies BPE using src_vocab and tgt_vocab, which are not provided. How can I preprocess the data and reproduce the en-ro results with the fine-tuned model?

StillKeepTry commented 4 years ago

You just need to pass --reload_vocab and --reload_codes, and the script will auto-generate src_vocab and tgt_vocab. The vocabulary is at https://dl.fbaipublicfiles.com/XLM/vocab_enro and the BPE codes are at https://dl.fbaipublicfiles.com/XLM/codes_enro
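A minimal sketch of the suggested workflow, assuming get-data-nmt.sh follows the XLM convention of taking --src/--tgt language arguments alongside the reload flags (check the script in your checkout, as exact flag names may differ). This is a command fragment that downloads data, not a self-contained test:

```shell
# Fetch the shared en-ro vocabulary and BPE codes (URLs from the reply above)
wget -c https://dl.fbaipublicfiles.com/XLM/vocab_enro
wget -c https://dl.fbaipublicfiles.com/XLM/codes_enro

# Preprocess en-ro data, reusing the downloaded vocab/codes instead of
# learning new ones; src_vocab and tgt_vocab are then generated automatically
./get-data-nmt.sh --src en --tgt ro \
  --reload_vocab vocab_enro \
  --reload_codes codes_enro
```

The key point is that BPE codes and the vocabulary must match the checkpoint you are fine-tuning from, which is why they are reloaded rather than re-learned from your data.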