zomux / lanmt

LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference

Using existing NLP pre-trained encoders like BERT, RoBERTa #2

Open · BogdanDidenko opened this issue 4 years ago

BogdanDidenko commented 4 years ago

What do you think about combining your architecture with existing pre-trained encoders? Could BERT, used as the prior_encoder, help achieve better results?
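To make the idea concrete, here is a minimal sketch of what "BERT as the prior_encoder" could mean: the pre-trained encoder reads the source sentence and a small head maps its hidden states to the mean and log-variance of the Gaussian prior p(z|x). The class name, `latent_dim`, and checkpoint are hypothetical choices for illustration, and this uses the HuggingFace `transformers` API rather than anything from this repo.

```python
# Hypothetical sketch: a pre-trained BERT model wrapped as a prior encoder
# that parameterizes a diagonal Gaussian p(z|x) over the latent variable z.
import torch.nn as nn
from transformers import BertModel  # assumes HuggingFace transformers v4+


class BertPriorEncoder(nn.Module):  # hypothetical name, not part of lanmt
    def __init__(self, latent_dim=8, bert_name="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # one head for the mean and one for the log-variance of p(z|x)
        self.to_mean = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        # token-level contextual states from the pre-trained encoder
        states = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.to_mean(states), self.to_logvar(states)
```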

zomux commented 4 years ago

@BogdanDidenko Improving the prior is a promising approach. Here is a figure showing that the BLEU score goes up monotonically as the quality of the prior improves. (It shows the interpolation between p(z|x) and q(z|x,y).)

[Figure: BLEU score plotted against the interpolation weight between the prior p(z|x) and the posterior q(z|x,y)]

I'm not sure whether BERT is able to do the job, but it is a promising thing to investigate. If it works in autoregressive models, it should also work in non-autoregressive models to some extent.
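For readers reproducing the figure above, a rough sketch of the interpolation experiment (the helper and decoder call below are hypothetical, not identifiers from this repo): decode from a convex combination of the latent predicted by the prior p(z|x) and the one inferred by the posterior q(z|x,y), sweeping the weight from 0 (prior only) to 1 (posterior only), and measure BLEU at each point.

```python
# Hypothetical helper: linearly interpolate between prior and posterior latents.
def interpolate_latents(z_prior, z_posterior, alpha):
    """alpha = 0.0 uses the prior alone; alpha = 1.0 uses the posterior alone."""
    return (1.0 - alpha) * z_prior + alpha * z_posterior


# Usage sketch: sweep alpha and score the decoded translations with BLEU.
# for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
#     z = interpolate_latents(z_prior, z_posterior, alpha)
#     translation = decoder(z, source_states)  # hypothetical decoder call
```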

BogdanDidenko commented 4 years ago

Yes, it's an interesting research area. In my experience with BERT and an autoregressive Transformer decoder, I achieved a ~10% quality improvement on my seq2seq task (with RoBERTa the result was even better). But I use some tricks, so it's hard to say how well this would work with the proposed approach.
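For reference, a minimal sketch of that kind of baseline (my own hedged illustration, not BogdanDidenko's exact setup or tricks): a pre-trained BERT encoder paired with an autoregressive decoder via HuggingFace's `EncoderDecoderModel`, which warm-starts both sides from BERT and adds cross-attention to the decoder. It still needs seq2seq fine-tuning before the generations are useful.

```python
# Sketch of a BERT encoder + autoregressive Transformer decoder seq2seq model
# using HuggingFace's EncoderDecoderModel (checkpoint names are placeholders).
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
# generation needs a start token and pad token configured explicitly
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# runs as-is, but produces meaningful output only after fine-tuning on pairs
inputs = tokenizer("an example source sentence", return_tensors="pt")
generated = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```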