huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Cannot instantiate BertGenerationEncoder or BertGenerationDecoder from bert model #11184

Closed ken-arf closed 3 years ago

ken-arf commented 3 years ago

Environment info

Who can help

@LysandreJik @sgugger, @patil-suraj

Information

Model I am using: BertGeneration

The problem arises when using:

To reproduce

Steps to reproduce the behavior:

  1. Instantiate the encoder as shown in the BertGeneration docs (https://huggingface.co/transformers/model_doc/bertgeneration.html?highlight=bertgeneration):

```python
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
```

  2. The call fails with the following error:

```
File "python3.6/site-packages/transformers/modeling_utils.py", line 988, in from_pretrained
    **kwargs,
File "python3.6/site-packages/transformers/configuration_utils.py", line 405, in from_pretrained
    ), f"You tried to initiate a model of type '{cls.model_type}' with a pretrained model of type '{config_dict['model_type']}'"
AssertionError: You tried to initiate a model of type 'bert-generation' with a pretrained model of type 'bert'
```

Expected behavior

The same script works with the previous version, 4.4.2.

StevenTang1998 commented 3 years ago

I ran into the same issue.

LysandreJik commented 3 years ago

This was fixed by #11207 and was released in patch 4.5.1. Please install the latest version and let us know if it works for you!
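As a quick sanity check after upgrading, something like the following works (a minimal sketch; it assumes the `packaging` helper is available, which ships as a transformers dependency):

```python
# Sketch: verify the installed transformers includes the 4.5.1 patch
# release, which contains the fix from #11207.
from packaging import version

import transformers

assert version.parse(transformers.__version__) >= version.parse("4.5.1"), (
    f"transformers {transformers.__version__} predates the 4.5.1 fix"
)
```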

ken-arf commented 3 years ago

It works in patch 4.5.1, thanks. I still wonder what the difference is between the following two ways of instantiating the BERT encoder-decoder model; perhaps I should ask in another thread.

```python
model_name = 'bert-base-multilingual-cased'
encoder = BertGenerationEncoder.from_pretrained(model_name, bos_token_id=bos_token_id, eos_token_id=eos_token_id)
decoder = BertGenerationDecoder.from_pretrained(
    model_name,
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=bos_token_id,
    eos_token_id=eos_token_id,
)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```

vs.

```python
model_name = 'bert-base-multilingual-cased'
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(model_name, model_name)
bert2bert.config.decoder.decoder_start_token_id = bos_token_id
bert2bert.config.encoder.bos_token_id = bos_token_id
bert2bert.config.encoder.eos_token_id = eos_token_id
bert2bert.config.encoder.pad_token_id = pad_token_id
```

patil-suraj commented 3 years ago

Hi @ken-arf

Both methods do the same thing.

The difference is that with the second method you don't need to initialize the encoder and decoder yourself: you can just pass the model name to the `from_encoder_decoder_pretrained` method, and it takes care of initializing the encoder and decoder, adding cross-attention to the decoder, and so on.
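To make the equivalence concrete, here is a minimal sketch of the two paths side by side (assuming `bert-base-multilingual-cased` and BERT's usual [CLS]/[SEP] ids, 101 and 102, as the bos/eos token ids):

```python
from transformers import (
    BertGenerationDecoder,
    BertGenerationEncoder,
    EncoderDecoderModel,
)

model_name = "bert-base-multilingual-cased"

# Path 1: build the two halves explicitly, then wrap them.
encoder = BertGenerationEncoder.from_pretrained(model_name, bos_token_id=101, eos_token_id=102)
decoder = BertGenerationDecoder.from_pretrained(
    model_name, add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102
)
explicit = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Path 2: one call; the helper loads both halves and wires up the decoder.
auto = EncoderDecoderModel.from_encoder_decoder_pretrained(model_name, model_name)

# The helper sets the same decoder flags that path 1 passes by hand.
assert auto.decoder.config.is_decoder
assert auto.decoder.config.add_cross_attention
```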

ken-arf commented 3 years ago

Hi Suraj

Thank you for your answer; that makes sense. I have been using both methods interchangeably, so it's good to know they are equivalent.