Closed. anicolson closed this issue 3 years ago.
Hi @anicolson,
We would love to help, but sadly when you post such a long script it is very hard and time-consuming for us to take a look. We're happy to assist if you can provide a short, precise, and complete code snippet based on the Transformers Seq2SeqTrainer only. Here's our guide on how to request support.
Also, from what I can see, it seems you are initializing the BERT encoder and BERT decoder separately; you could instead get a seq2seq model directly by instantiating the EncoderDecoder model class, as sketched after the notebook links below. Here are two Colab notebooks that show how to train EncoderDecoder models using Seq2SeqTrainer. The notebooks show how to fine-tune for the summarization task, but they can easily be adapted for translation as well.
Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail
Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum
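For reference, here is a minimal sketch of that instantiation (the bert-base-uncased checkpoints and the token-id wiring are illustrative assumptions, not something taken from your script):

```python
# Minimal sketch: build a BERT-to-BERT seq2seq model with EncoderDecoderModel
# instead of wiring the encoder and decoder together manually.
from transformers import BertTokenizerFast, EncoderDecoderModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Ties a BERT encoder and a BERT decoder (with cross-attention added)
# into a single seq2seq model in one call.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# Generation-related settings that Seq2SeqTrainer and .generate() rely on.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```

The resulting model can then be passed to Seq2SeqTrainer like any other seq2seq model, as the two notebooks above do.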
Thanks for your reply,
I am attempting to create a shorter version that is not so time-consuming.
Certainly, the EncoderDecoder model is an attractive option if one is using natural language, but I would like to highlight that BertGenerationDecoder allows the user to provide any sequence for cross-attention, even sequences derived from encoders that operate on modalities other than natural language, which I think is powerful.
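For illustration, a minimal sketch of what I mean (the randomly initialized config, the shapes, and the "image encoder" framing are assumptions for demonstration only):

```python
# Sketch: BertGenerationDecoder can cross-attend to any hidden-state sequence,
# e.g. features produced by a non-text encoder, as long as the last dimension
# matches the decoder's hidden size.
import torch
from transformers import BertGenerationConfig, BertGenerationDecoder

config = BertGenerationConfig(is_decoder=True, add_cross_attention=True)
decoder = BertGenerationDecoder(config)

batch_size, src_len, tgt_len = 2, 49, 10
# Stand-in for features from, say, an image encoder (hypothetical here).
encoder_hidden_states = torch.randn(batch_size, src_len, config.hidden_size)
decoder_input_ids = torch.randint(0, config.vocab_size, (batch_size, tgt_len))

outputs = decoder(
    input_ids=decoder_input_ids,
    encoder_hidden_states=encoder_hidden_states,
)
print(outputs.logits.shape)  # (batch_size, tgt_len, vocab_size)
```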
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, have you solved the problem? I am encountering exactly the same problem. Any clues?
Environment info
transformers version: 4.2.1
Who can help
Text Generation: @TevenLeScao, @patrickvonplaten
examples/seq2seq: @patil-suraj
Information
I am using BertGenerationEncoder and BertGenerationDecoder, together with transformers and PyTorch Lightning. At inference, .generate() outputs the same thing for every input. I am unsure why this is occurring; my only hunch is that PyTorch Lightning is somehow blocking the outputs of the encoder from reaching the decoder for cross-attention, as the outputs look as though the decoder is given only the [BOS] token for each input during inference.
The task that I am demonstrating this issue on is WMT'14 English to German translation, although I have had this problem occur on other tasks as well.
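To make the setup concrete, here is a minimal, self-contained sketch of the kind of model and .generate() call I am describing (this is not my actual script; the checkpoints, token ids, and example sentences are placeholders):

```python
# Sketch: wrap BertGenerationEncoder/Decoder in EncoderDecoderModel and call
# .generate() with the *encoder* input_ids, so each example's encoder states
# are available to the decoder's cross-attention during decoding.
import torch
from transformers import (
    BertGenerationDecoder,
    BertGenerationEncoder,
    BertTokenizerFast,
    EncoderDecoderModel,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertGenerationEncoder.from_pretrained(
    "bert-base-uncased",
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)
decoder = BertGenerationDecoder.from_pretrained(
    "bert-base-uncased",
    add_cross_attention=True,
    is_decoder=True,
    bos_token_id=tokenizer.cls_token_id,
    eos_token_id=tokenizer.sep_token_id,
)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer(
    ["A first sentence.", "A completely different one."],
    return_tensors="pt",
    padding=True,
)
# Passing input_ids/attention_mask here is what should make the generated
# text differ per input.
generated = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=20,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```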
To reproduce
I have tried to simplify this down, but unfortunately, the example is still long. Sorry about that. Please let me know if something does not work.
If torchnlp is not installed: pip install pytorch-nlp
If pytorch_lightning is not installed: pip install pytorch-lightning
Outputs of script demonstrating the issue
During training:
Output of encoder (to demonstrate that there is a difference per input):
Training reference labels:
Training predictions after .generate() and .batch_decode() (garbage, but different per input):
During validation:
Input IDs to encoder:
Output of encoder (to demonstrate that there is a difference per input):
Validation reference labels:
Validation predictions after .generate() and .batch_decode() (garbage, but the same per input):
Expected behavior
I would expect the model to generate a different output for each input, as it does during training.
Thank you for your help!
Hopefully, it is something simple that I am missing.