microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Will BERT + transformer-decoder be better than tensor2tensor for text generation? #155

Closed guotong1988 closed 4 years ago

guotong1988 commented 4 years ago

Thank you very much.

StillKeepTry commented 4 years ago

I guess by tensor2tensor you mean standard models trained from scratch, without pre-training. The advantage of pre-trained models usually shows up on low-resource tasks, so I think it depends on the task.
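
For reference (this is not from the MASS codebase, just an illustration of the setup being asked about): one common way to pair a pre-trained BERT encoder with a transformer decoder for generation is Hugging Face's `EncoderDecoderModel`. The sketch below is a minimal example with assumed model names and settings; the decoder would still need fine-tuning on the target generation task.

```python
# Minimal sketch: warm-start an encoder-decoder from BERT checkpoints and run
# generation. Model names, lengths, and the example sentence are illustrative
# assumptions, not part of this repository.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Both encoder and decoder are initialized from BERT; cross-attention layers
# are added to the decoder and trained during fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("An example source sentence.", return_tensors="pt")
generated = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

Whether this beats a comparable model trained from scratch tends to depend on how much parallel data the task has, which is the point made above.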