Hmm, this will be hard to debug here. I'm currently working on getting a working example of a Bert2Bert model, so I will keep an eye out for encoder_outputs bugs!
See conversation here: https://github.com/huggingface/transformers/issues/4443#issuecomment-656691026
Thank you for your reply. I am looking forward to your Bert2Bert example, and I hope we can solve this problem.
Hey @bobshih,
Training a Bert2Bert model worked out fine for me - I did not experience any bugs related to encoder_outputs.
You can check out the model and all the code to reproduce the results here:
https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16
Maybe you can take a look, adapt your code and see whether the error persists :-)
OK, thanks for your attention. I will adapt my code after finishing the work at hand.
Hi @patrickvonplaten, I have trained an EncoderDecoderModel with your example training script. I noticed that if there are too many padding tokens in the training data, the trained model produces the same vectors for different inputs. I wonder why the attention mask does not prevent this. In my original training setting, 93% of the tokens were padding. After I reduced the max length so that padding dropped to 21% of the tokens, the EncoderDecoderModel works without problems.
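A minimal sketch, assuming a bert2bert EncoderDecoderModel and the transformers 2.x-era encode_plus API, of passing the attention_mask along with the input_ids so that PAD positions are masked out of the encoder self-attention (the checkpoint names and input sentence are placeholders):

```python
import torch
from transformers import BertTokenizer, EncoderDecoderModel

# Placeholder checkpoint names.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.eval()

# encode_plus returns an attention_mask that is 0 at padded positions.
enc = tokenizer.encode_plus(
    "a short question?",
    max_length=128,
    pad_to_max_length=True,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],  # masks PAD in encoder self-attention
        decoder_input_ids=enc["input_ids"],
    )
```

If the attention_mask is passed like this, a heavily padded batch should not change what the encoder computes for the non-PAD positions.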
This line in the preprocessing (see https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16#training-script):

```python
batch["labels"] = [
    [-100 if token == tokenizer.pad_token_id else token for token in labels]
    for labels in batch["labels"]
]
```

should make sure that the PAD token influences neither the loss nor the model.
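As background, the -100 value works because it is the default ignore_index of PyTorch's CrossEntropyLoss, which the language-modeling loss uses internally, so masked label positions contribute nothing to the gradient. A minimal illustration:

```python
import torch

loss_fct = torch.nn.CrossEntropyLoss()  # ignore_index defaults to -100

logits = torch.randn(4, 10)                # 4 positions, vocabulary of 10
labels = torch.tensor([3, -100, 7, -100])  # -100 marks (former) PAD positions

# Only positions 0 and 2 contribute; the -100 entries are skipped entirely.
print(loss_fct(logits, labels))
```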
Yes, I understand what you mean, and I also used this setting after adapting my script, but the problem showed up again. I will retrain the model with this setting over the weekend and hope for a different result. Again, thank you very much for working on the problem and for your patience.
❓ Questions & Help
Details
Hi,
I've trained a bert2bert model to generate answers to different questions. But after training, the bert2bert model always produces the same encoder_outputs for different inputs. Does anyone know how to fix or avoid this problem? If I don't resize BERT's embedding size, will that solve the problem?
Thanks in advance.
Environment info
transformers version: 2.11.0

Below is my training code. The inputs are turned into indices by tokenizer.encode_plus.
Besides, the encoder_outputs are the same at every time step, as shown in the picture below. I find this very strange, and I am not sure whether it is the same problem.
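A hypothetical way to verify whether the encoder has really collapsed to identical outputs (the checkpoint path and the example sentences are placeholders, not from the thread):

```python
import torch
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_pretrained("./my-bert2bert-checkpoint")  # placeholder path
model.eval()

def encode(text):
    enc = tokenizer.encode_plus(
        text, max_length=128, pad_to_max_length=True, return_tensors="pt"
    )
    with torch.no_grad():
        # The encoder's last hidden state is what feeds the decoder's cross-attention.
        return model.encoder(
            input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]
        )[0]

a = encode("What is the capital of France?")
b = encode("How do transformers work?")

# For clearly different inputs this should be far from zero; if it is ~0,
# the encoder maps everything to the same vectors.
print((a - b).abs().max())
```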