huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

BartTokenizer prepare_seq2seq_batch() does not return decoder_input_ids, decoder_attention_mask as documented after passing tgt_texts #7846

Closed · MojammelHossain closed this issue 4 years ago

MojammelHossain commented 4 years ago

I am trying to train a seq2seq model using BartModel. As per the BartTokenizer documentation, if I pass tgt_texts then it should return decoder_attention_mask and decoder_input_ids (see the attached screenshot of the docs). But I am only getting input_ids, attention_mask, and labels (screenshot attached).
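Roughly what I am running (simplified sketch; facebook/bart-base stands in here for my actual checkpoint):

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["A source sentence to encode."],
    tgt_texts=["A target sentence to decode."],
    return_tensors="pt",
)

# The docs suggest decoder_input_ids / decoder_attention_mask should be here,
# but only three keys come back:
print(list(batch.keys()))  # ['input_ids', 'attention_mask', 'labels']
```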

freespirit commented 4 years ago

I am facing the same issue, and I noticed that the method indeed returns the ["input_ids"] of tgt_texts as labels. I think I could easily fix this to return both the input_ids and attention_mask of tgt_texts (as decoder_input_ids and decoder_attention_mask), but I noticed the same pattern in other seq2seq models, like T5. I am not sure what the proper solution is, but if it is similar to what I suggest, then I'd be happy to make a pull request.

@LysandreJik I'd be happy to hear an opinion and start working on this.

freespirit commented 4 years ago

I think https://github.com/huggingface/transformers/pull/6654/ and https://github.com/huggingface/transformers/issues/6624 are related - the PR changed decoder_input_ids to labels. The documentation should probably be updated accordingly, but I have to get more familiar with the respective issue and PR to be sure.

MojammelHossain commented 4 years ago

Thanks for the feedback @freespirit. Hopefully they will update the documentation, as it is a little bit confusing. But what I found is that modeling_bart.py already handles the problem: _prepare_bart_decoder_inputs() and shift_tokens_right() take care of it, if I am not wrong. I think I have to dig deeper to understand this, which I am trying to do (screenshots of the relevant source attached).
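For reference, shift_tokens_right in modeling_bart.py looks roughly like this (paraphrased, not copied verbatim from the pinned version): it builds decoder inputs by rotating the target ids one position to the right, so the decoder is fed the previous target token at each step.

```python
import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Shift input ids one token to the right, wrapping the last non-pad
    token (usually </s>) around to position 0."""
    prev_output_tokens = input_ids.clone()
    # Index of the last non-padding token in each row (normally the EOS token).
    index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze()
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens
```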

LysandreJik commented 4 years ago

Pinging @sshleifer for advice

sshleifer commented 4 years ago

@MojammelHossain is correct, the docs are wrong. The correct usage is to let _prepare_bart_decoder_inputs make decoder_input_ids and decoder_attention_mask for you. For training, you only need to pass the three keys returned by prepare_seq2seq_batch: input_ids, attention_mask, and labels.
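In code, the recommended training usage looks roughly like this (a sketch under the same version assumptions as the snippet above; facebook/bart-base is an illustrative checkpoint):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["A source sentence."],
    tgt_texts=["A target sentence."],
    return_tensors="pt",
)

# Pass only input_ids, attention_mask, and labels; the model builds
# decoder_input_ids (and the decoder attention mask) internally.
outputs = model(**batch)
loss = outputs[0]  # with labels provided, the first element of the 3.x-style tuple output is the LM loss
loss.backward()
```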