microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

ZhEn pretraining model #41

Open Bournet opened 4 years ago

Bournet commented 4 years ago

How do we process our parallel data with the provided BPE codes? I ran the fastBPE tool and hit a problem: the provided BPE codes file has two columns, but fastBPE expects three. Could you give some advice?
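One workaround that is sometimes used for this mismatch (a sketch, assuming the released codes are in subword-nmt's two-column `pair_left pair_right` format while fastBPE's `applybpe` expects `pair_left pair_right count`): append a dummy frequency column so fastBPE will accept the file. The exact format of this repo's released codes is not confirmed here, and the sample pairs below are illustrative only.

```python
# Convert two-column BPE codes (subword-nmt style) into the three-column
# format fastBPE expects, by appending a dummy count. Illustrative sketch;
# the dummy count value does not affect merge order, which fastBPE takes
# from line order.

def convert_codes(lines, dummy_count="1"):
    """Yield three-column code lines; skip a '#version:' header if present."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#version:"):
            continue  # subword-nmt writes a version header fastBPE cannot parse
        parts = line.split()
        if len(parts) == 2:
            yield f"{parts[0]} {parts[1]} {dummy_count}"
        else:
            yield line  # already three columns (or unexpected); keep as-is

two_col = ["#version: 0.2", "e n</w>", "t h"]
print(list(convert_codes(two_col)))  # → ['e n</w> 1', 't h 1']
```

To rewrite a codes file on disk, read it line by line, pass the lines through `convert_codes`, and write the result to a new file before calling fastBPE on it.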

StillKeepTry commented 4 years ago

Sorry about that. Could you use subword to generate the BPE data instead?

pamin2222 commented 4 years ago

I am also facing the same problem. Do you have any suggestions on how to solve this issue so the pre-trained model can be used?