Closed hansd410 closed 2 years ago

Hi, I have a question.
From the example code of EncoderDecoderModel, the pre-trained model 'bert-base-uncased' looks like it was trained on EN-FR sentence pairs. Is the pre-trained model for the decoder trained on French? How can I load pre-trained models for an EN-EN pair? Am I confused?
Hi,
Not sure what you mean; `bert-base-uncased` is the English pretrained BERT checkpoint. So in the code example above, we initialize the weights of the encoder and the decoder with the weights of BERT, and the weights of the cross-attention layers of the decoder are randomly initialized. One should then fine-tune this warm-started model on an English downstream dataset, such as summarization.
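For reference, here is a minimal sketch of that warm-starting on an English-to-English pair (this is not the docs example; the sentences below are assumed placeholders, and the training loop/dataset is omitted):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encoder and decoder are both warm-started from English BERT;
# the decoder's cross-attention weights are randomly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# These ids must be set before computing a seq2seq loss.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder English-to-English pair (e.g. a summarization-style target),
# not a real dataset.
input_ids = tokenizer("The weather in Paris was sunny all week.", return_tensors="pt").input_ids
labels = tokenizer("Sunny week in Paris.", return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss
```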
Thank you for the reply @NielsRogge. Then I want to ask: why is the decoder output French in the example code?
labels = tokenizer("Salut, mon chien est mignon", return_tensors="pt").input_ids
Oh, I get your confusion.
The EncoderDecoderModel framework is meant to be fine-tuned on text-to-text datasets, such as machine translation. Of course, if you want to fine-tune an EncoderDecoderModel to perform translation from English to French, it makes sense to warm-start the decoder with a pre-trained French checkpoint, e.g.:
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'camembert-base')
I'll update the code example. Also note that you should then use CamemBERT's tokenizer to create the labels.
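For concreteness, a minimal sketch of that setup (not the official docs example; the French sentence is the one from the snippet above, and the English source sentence is an assumed placeholder):

```python
from transformers import BertTokenizer, CamembertTokenizer, EncoderDecoderModel

# English BERT as encoder, French CamemBERT as decoder;
# the decoder's cross-attention layers are randomly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "camembert-base"
)

encoder_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
decoder_tokenizer = CamembertTokenizer.from_pretrained("camembert-base")

# Required before computing a seq2seq loss.
model.config.decoder_start_token_id = decoder_tokenizer.cls_token_id
model.config.pad_token_id = decoder_tokenizer.pad_token_id

# English inputs go through BERT's tokenizer, French labels through CamemBERT's.
input_ids = encoder_tokenizer("Hello, my dog is cute", return_tensors="pt").input_ids
labels = decoder_tokenizer("Salut, mon chien est mignon", return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss
```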
@hansd410 do you mind opening a PR to fix this in the docs?
@NielsRogge Not at all. Please do so.