huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Cannot instantiate BertGenerationEncoder or BertGenerationDecoder from bert model #11184

Closed ken-arf closed 3 years ago

ken-arf commented 3 years ago

Environment info

Who can help

@LysandreJik @sgugger, @patil-suraj

Information

Model I am using: BertGeneration

The problem arises when using:

To reproduce

Steps to reproduce the behavior:

  1. Instantiate the encoder as shown in the BertGeneration docs (https://huggingface.co/transformers/model_doc/bertgeneration.html?highlight=bertgeneration):

```python
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
```

  2. The call fails with the following error:

```
File "python3.6/site-packages/transformers/modeling_utils.py", line 988, in from_pretrained
    **kwargs,
File "python3.6/site-packages/transformers/configuration_utils.py", line 405, in from_pretrained
    ), f"You tried to initiate a model of type '{cls.model_type}' with a pretrained model of type '{config_dict['model_type']}'"
AssertionError: You tried to initiate a model of type 'bert-generation' with a pretrained model of type 'bert'
```

Expected behavior

The same script works with the previous version, 4.4.2.

StevenTang1998 commented 3 years ago

I ran into the same issue.

LysandreJik commented 3 years ago

This was fixed by #11207 and was released in patch 4.5.1. Please install the latest version and let us know if it works for you!
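As a quick sanity check after upgrading, something like the following works (a minimal sketch; it assumes the `packaging` helper is available, which ships as a transformers dependency):

```python
# Sketch: verify the installed transformers includes the 4.5.1 patch
# release, which contains the fix from #11207.
from packaging import version

import transformers

assert version.parse(transformers.__version__) >= version.parse("4.5.1"), (
    f"transformers {transformers.__version__} predates the 4.5.1 fix"
)
```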

ken-arf commented 3 years ago

It works in patch 4.5.1, thanks. I still wonder what the difference is between the following two ways of instantiating the BERT encoder-decoder model; perhaps I should ask in another thread.

```python
model_name = 'bert-base-multilingual-cased'
encoder = BertGenerationEncoder.from_pretrained(model_name, bos_token_id=bos_token_id, eos_token_id=eos_token_id)
decoder = BertGenerationDecoder.from_pretrained(
    model_name,
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=bos_token_id,
    eos_token_id=eos_token_id,
)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```

vs.

```python
model_name = 'bert-base-multilingual-cased'
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(model_name, model_name)
bert2bert.config.decoder.decoder_start_token_id = bos_token_id
bert2bert.config.encoder.bos_token_id = bos_token_id
bert2bert.config.encoder.eos_token_id = eos_token_id
bert2bert.config.encoder.pad_token_id = pad_token_id
```

patil-suraj commented 3 years ago

Hi @ken-arf

Both methods do the same thing.

The difference is that with the second method you don't need to initialize the encoder and decoder yourself: you can just pass the model name to the `from_encoder_decoder_pretrained` method, and it takes care of initializing the encoder and decoder, adding cross-attention to the decoder, and so on.
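To make the equivalence concrete, here is a minimal sketch of the two paths side by side (assuming `bert-base-multilingual-cased` and BERT's usual [CLS]/[SEP] ids, 101 and 102, as the bos/eos token ids):

```python
from transformers import (
    BertGenerationDecoder,
    BertGenerationEncoder,
    EncoderDecoderModel,
)

model_name = "bert-base-multilingual-cased"

# Path 1: build the two halves explicitly, then wrap them.
encoder = BertGenerationEncoder.from_pretrained(model_name, bos_token_id=101, eos_token_id=102)
decoder = BertGenerationDecoder.from_pretrained(
    model_name, add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102
)
explicit = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Path 2: one call; the helper loads both halves and wires up the decoder.
auto = EncoderDecoderModel.from_encoder_decoder_pretrained(model_name, model_name)

# The helper sets the same decoder flags that path 1 passes by hand.
assert auto.decoder.config.is_decoder
assert auto.decoder.config.add_cross_attention
```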

ken-arf commented 3 years ago

Hi Suraj

Thank you for your answer; that makes sense. I have been using both methods interchangeably, so it's good to know they are equivalent.