huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

mbart encoder decoder model #11495

Closed md975 closed 3 years ago

md975 commented 3 years ago

Hi,

I've been following this to implement a bert2bert seq2seq model which works pretty well. Now I would like to change this to mbart (facebook/mbart-large-50) instead of bert.
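
For reference, the bert2bert part boils down to something like this (from memory; the checkpoint name is just the one the notebook uses):

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")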

I'm very new to this, but my assumption was that the same script should work for other transformers, so I didn't change much: I just initialized the tokenizer and the model's encoder and decoder with mbart. However, I get the following error when passing the data to the bart2bart model during training:

File "/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/python3.7/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 442, in forward encoder_hidden_states=encoder_outputs.hidden_states, AttributeError: 'Seq2SeqModelOutput' object has no attribute 'hidden_states'

I'm probably making an obvious mistake but I'm not sure if I understand what the problem is and how I can fix it.

Thanks

beelzmon commented 3 years ago

Hey,

Could you share some more details?

Looking here: https://huggingface.co/transformers/model_doc/mbart.html?highlight=config#transformers.MBartConfig

It doesn't seem to be using hidden_states. Depending on how you use the model, you may be grabbing its output incorrectly.
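
One quick way to check (a rough sketch, untested on my side since I'm not on torch): load the checkpoint the way I believe EncoderDecoderModel loads its encoder (via AutoModel) and look at what its forward actually returns:

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
# AutoModel gives an MBartModel here, which is already a full encoder-decoder
mbart = AutoModel.from_pretrained("facebook/mbart-large-50")

enc = tok("a test sentence", return_tensors="pt")
with torch.no_grad():
    out = mbart(**enc)
print(type(out))  # Seq2SeqModelOutput, which has no `hidden_states` attribute -> the error above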

md975 commented 3 years ago

Thanks. I'm using Python 3.7, torch 1.7.1, and transformers installed from source (4.6.0.dev0). I'm following the exact implementation from here, with minor edits:

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="cs_CZ", tgt_lang="cs_CZ")

I changed the function process_data_to_model_inputs:

import torch
from torch.utils.data import DataLoader, TensorDataset

def process_data_to_model_inputs(batch):
    # tokenize the inputs and labels
    inputs = tokenizer(batch['src'], padding=True, truncation=True, return_tensors="pt")
    with tokenizer.as_target_tokenizer():
        outputs = tokenizer(batch['tgt'], return_tensors="pt", padding=True, truncation=True)
    labels = outputs.input_ids.clone()
    # the tokenizer already returns torch tensors (return_tensors="pt"), so they can go
    # straight into TensorDataset instead of being re-wrapped with torch.tensor(...)
    data = TensorDataset(inputs['input_ids'], inputs['attention_mask'],
                         outputs['input_ids'], outputs['attention_mask'],
                         labels)

    dataloader = DataLoader(data, batch_size=batch_size)
    return dataloader
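
(Side note: I haven't masked the padding in the labels here. As far as I understand, the usual convention is to replace pad token ids with -100 so the loss ignores the padded positions, roughly:)

labels = outputs.input_ids.clone()
# pad token ids are set to -100 so that CrossEntropyLoss skips padded positions
labels[labels == tokenizer.pad_token_id] = -100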

and then training:

# EPOCH, device, optimizer and train_data are defined earlier in my script
bart2bart = EncoderDecoderModel.from_encoder_decoder_pretrained("facebook/mbart-large-50", "facebook/mbart-large-50")
bart2bart.to(device)

for i in range(EPOCH):
    bart2bart.train()
    for step, batch in enumerate(train_data):
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_attention_masks_enc, b_input_ids_de, b_attention_masks_de, b_labels = batch
        outputs = bart2bart(input_ids=b_input_ids, attention_mask=b_attention_masks_enc,
                            labels=b_labels, decoder_input_ids=b_input_ids_de, decoder_attention_mask=b_attention_masks_de)
        loss, logits = outputs.loss, outputs.logits
        optimizer.zero_grad()
        bart2bart.zero_grad()
        loss.backward()
        optimizer.step()

I'm very new to this, so I'm probably not using the model correctly as you mentioned. But I'm not sure how to fix it.

beelzmon commented 3 years ago

Hey,

Unfortunately I don't use torch, just the TensorFlow functional API. However, I did note that for EncoderDecoder there can be a special configuration procedure. See here: https://huggingface.co/transformers/model_doc/encoderdecoder.html#transformers.EncoderDecoderConfig

It is possible that the default config doesn't behave as well with MBart as it does with BERT (they are significantly different).

Try passing in the configs for your encoder and decoder (both MBart), or load the config from pretrained; there is example code at the link above. It's certainly an error in what the decoder expects.
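
Roughly along these lines (untested on my side, adapted from that docs page):

from transformers import EncoderDecoderConfig, EncoderDecoderModel, MBartConfig

# pull the pretrained configs for both halves (both MBart here)
config_encoder = MBartConfig.from_pretrained("facebook/mbart-large-50")
config_decoder = MBartConfig.from_pretrained("facebook/mbart-large-50")

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
# note: building the model from a config alone gives randomly initialized weights,
# so for fine-tuning you'd presumably still want to start from from_encoder_decoder_pretrained
model = EncoderDecoderModel(config=config)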

md975 commented 3 years ago

I tried this, thanks! The issue still remains, though; it's not working. @patrickvonplaten, any tips for using mbart in an EncoderDecoderModel, based on your example notebook for bert?

beelzmon commented 3 years ago

Hey,

Fair enough. One last thing I'd note: "The EncoderDecoderModel can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder." (from https://huggingface.co/transformers/model_doc/encoderdecoder.html)

I am not sure BART can be used for this.
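
If the goal is just seq2seq fine-tuning, it might be simpler to skip the EncoderDecoder wrapper entirely, since mbart-large-50 is already a full encoder-decoder with a generation head. Very rough sketch (untested on my side):

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="cs_CZ", tgt_lang="cs_CZ")

inputs = tokenizer("source text here", return_tensors="pt")
with tokenizer.as_target_tokenizer():
    labels = tokenizer("target text here", return_tensors="pt").input_ids

# the model derives decoder_input_ids from the labels internally
outputs = model(**inputs, labels=labels)
loss = outputs.loss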

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.