Hey,
Could you share a few more details?
Looking at https://huggingface.co/transformers/model_doc/mbart.html?highlight=config#transformers.MBartConfig, it doesn't seem to be using hidden_states. Depending on how you use the model, you may be grabbing its output incorrectly.
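For reference, a rough sketch (untested on my side, I'm on TF) of how the outputs usually come back from MBartForConditionalGeneration; the hidden states only show up when you explicitly ask for them, and the example sentence and dummy labels below are made up:

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")

batch = tokenizer(["Ahoj svete."], return_tensors="pt")
# reusing the inputs as dummy labels, purely for illustration
out = model(**batch, labels=batch["input_ids"], output_hidden_states=True)
# out is a Seq2SeqLMOutput: out.loss and out.logits are the usual fields;
# out.encoder_hidden_states / out.decoder_hidden_states only exist because
# output_hidden_states=True was passed above
print(out.loss, out.logits.shape)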
Thanks. I'm using Python 3.7, torch 1.7.1, and transformers installed from source (4.6.0.dev0). I'm following the exact implementation from here, with minor edits:
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="cs_CZ", tgt_lang="cs_CZ")
I changed the function process_data_to_model_inputs:
from torch.utils.data import DataLoader, TensorDataset

def process_data_to_model_inputs(batch):
    # tokenize the inputs and labels
    inputs = tokenizer(batch['src'], padding=True, truncation=True, return_tensors="pt")
    with tokenizer.as_target_tokenizer():
        outputs = tokenizer(batch['tgt'], return_tensors="pt", padding=True, truncation=True)
    labels = outputs.input_ids.clone()
    # return_tensors="pt" already gives tensors, so no extra torch.tensor() calls are needed
    data = TensorDataset(inputs['input_ids'], inputs['attention_mask'],
                         outputs['input_ids'], outputs['attention_mask'],
                         labels)
    dataloader = DataLoader(data, batch_size=batch_size)
    return dataloader
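(Side note: I noticed the docs usually mask the padded label positions with -100 so the loss ignores them. I'm not sure whether that's related to my problem, but it would be roughly:

labels = outputs.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100  # padded positions are ignored by the loss
)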
and then training:
bart2bart = EncoderDecoderModel.from_encoder_decoder_pretrained("facebook/mbart-large-50", "facebook/mbart-large-50")

for i in range(EPOCH):
    bart2bart.train()
    for step, batch in enumerate(train_data):
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_attention_masks_enc, b_input_ids_de, b_attention_masks_de, b_labels = batch
        outputs = bart2bart(input_ids=b_input_ids, attention_mask=b_attention_masks_enc,
                            labels=b_labels, decoder_input_ids=b_input_ids_de,
                            decoder_attention_mask=b_attention_masks_de)
        loss, logits = outputs.loss, outputs.logits
        optimizer.zero_grad()
        bart2bart.zero_grad()
        loss.backward()
        optimizer.step()
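(I also saw that the bert2bert notebook copies the special token ids onto the EncoderDecoder config before training. I haven't tried the equivalent for mBART yet, and I'm only guessing that the target language code is the right decoder start token, but it would look something like:

# guess: mBART-50 starts decoding with the target language code token
bart2bart.config.decoder_start_token_id = tokenizer.lang_code_to_id["cs_CZ"]
bart2bart.config.pad_token_id = tokenizer.pad_token_id
bart2bart.config.eos_token_id = tokenizer.eos_token_id
bart2bart.config.vocab_size = bart2bart.config.encoder.vocab_size
)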
I'm very new to this, so I'm probably not using the model correctly as you mentioned. But I'm not sure how to fix it.
Hey,
Unfortunately I don't use torch, just the TensorFlow functional API. However, I did note that there can be a special configuration procedure for EncoderDecoder. See here: https://huggingface.co/transformers/model_doc/encoderdecoder.html#transformers.EncoderDecoderConfig
It is possible that the default config doesn't behave as well with MBart as it does with BERT (they are significantly different).
Try passing in the configs for your encoder and decoder (both MBart), or load the config from pretrained; there is example code at the link above, and a rough sketch below. It's almost certainly an error in what the decoder expects.
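Something along these lines (I haven't run it myself since I'm on TF, and setting the decoder flags by hand is my assumption, so treat it purely as a sketch):

from transformers import EncoderDecoderConfig, EncoderDecoderModel, MBartConfig

# load the real MBart hyperparameters instead of relying on defaults
config_encoder = MBartConfig.from_pretrained("facebook/mbart-large-50")
config_decoder = MBartConfig.from_pretrained("facebook/mbart-large-50")
config_decoder.is_decoder = True
config_decoder.add_cross_attention = True

# option from the docs: build the joint config explicitly
# (note: EncoderDecoderModel(config=config) would give randomly initialized weights)
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# to keep the pretrained weights, pass the explicit configs through from_encoder_decoder_pretrained
bart2bart = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/mbart-large-50", "facebook/mbart-large-50",
    encoder_config=config_encoder, decoder_config=config_decoder,
)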
I tried this, thanks! The issue still remains, though; it's not working. @patrickvonplaten, any tips for using mBART in an EncoderDecoderModel, based on your example notebook for BERT?
Hey,
Fair enough. One last thing I'd note: "The EncoderDecoderModel can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder." (from https://huggingface.co/transformers/model_doc/encoderdecoder.html)
I am not sure BART can be used for this, since it is already a full encoder-decoder model rather than a pure autoencoder or a pure autoregressive decoder.
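If that turns out to be the blocker, mBART itself is already a sequence-to-sequence model, so fine-tuning MBartForConditionalGeneration directly (without the EncoderDecoder wrapper) might be the simpler route. A rough, untested sketch with made-up placeholder sentences:

import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="cs_CZ", tgt_lang="cs_CZ")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

inputs = tokenizer(["source sentence"], return_tensors="pt", padding=True, truncation=True)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(["target sentence"], return_tensors="pt", padding=True, truncation=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

model.train()
loss = model(**inputs, labels=labels).loss
optimizer.zero_grad()
loss.backward()
optimizer.step()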
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi,
I've been following this to implement a bert2bert seq2seq model, which works pretty well. Now I would like to switch to mBART (facebook/mbart-large-50) instead of BERT.
I'm very new to this, but my assumption was that the same script should probably work for other transformers. So I didn't change much: I just initialized the tokenizer and the model's encoder and decoder with mBART. However, I get the following error when passing the data to the bart2bart model during training:
I'm probably making an obvious mistake, but I'm not sure I understand what the problem is or how I can fix it.
Thanks