huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Can I train an Rnd2GPT model through HuggingFace "Encoder-Decoder Model"? #13861

Closed · soda-lsq closed 3 years ago

soda-lsq commented 3 years ago

Hi,

I would like to train an Rnd2GPT model, whose encoder is a randomly initialized Transformer encoder and whose decoder is the pre-trained GPT-2 model. I found that Hugging Face's EncoderDecoderModel can implement architectures such as Bert2Bert and Bert2GPT. However, my source input is not a sentence that can be represented directly by a BERT model; it would be better encoded by a randomly initialized Transformer encoder.

So, I would like to know: how can I build an Rnd2GPT model with the Hugging Face EncoderDecoderModel?

I'd be very grateful for your help. Thanks!

NielsRogge commented 3 years ago

You can initialize an EncoderDecoderModel with any autoencoding text encoder and any autoregressive text decoder. These can be randomly initialized, or you can start from pre-trained checkpoints.
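
For example, when both sides are warm-started from pre-trained checkpoints, the from_encoder_decoder_pretrained convenience constructor handles the wiring (it also sets is_decoder and add_cross_attention on the decoder config for you). A minimal sketch:

from transformers import EncoderDecoderModel

# warm-start both encoder and decoder from pre-trained checkpoints
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")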

So yes, it's totally possible to instantiate an EncoderDecoderModel with a randomly initialized BERT and a pre-trained GPT-2 model. The one catch is that GPT-2 must be loaded with cross-attention enabled, since a vanilla GPT-2 block has no way to attend to the encoder's outputs:

from transformers import EncoderDecoderModel, BertConfig, BertModel, GPT2LMHeadModel

# randomly initialized BERT encoder
encoder_config = BertConfig()
encoder = BertModel(encoder_config)

# pre-trained GPT-2 decoder; add_cross_attention inserts (randomly initialized)
# cross-attention layers so the decoder can attend to the encoder's outputs
decoder = GPT2LMHeadModel.from_pretrained("gpt2", is_decoder=True, add_cross_attention=True)

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
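
To then train this model, here is a rough sketch of a single forward/backward pass; the tokenizer choices and the toy source/target strings are assumptions for illustration, not part of the original reply. The model needs decoder_start_token_id and pad_token_id set so it can build decoder inputs from the labels, and GPT-2's tokenizer ships without a pad token, so a common workaround is to reuse its EOS token:

from transformers import BertTokenizerFast, GPT2TokenizerFast

encoder_tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
decoder_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
decoder_tokenizer.pad_token = decoder_tokenizer.eos_token  # GPT-2 has no pad token

# required so the model can shift the labels into decoder_input_ids
model.config.decoder_start_token_id = decoder_tokenizer.bos_token_id
model.config.pad_token_id = decoder_tokenizer.pad_token_id

# hypothetical toy pair; real training would batch a dataset the same way
input_ids = encoder_tokenizer("a source sequence", return_tensors="pt").input_ids
labels = decoder_tokenizer("a target sequence", return_tensors="pt").input_ids

outputs = model(input_ids=input_ids, labels=labels)  # returns a cross-entropy loss
outputs.loss.backward()
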
soda-lsq commented 3 years ago

I see! Thank you so much for your kind reply! It means a lot to me!