I guess it's quite normal that the quality degenerates the longer the sequence gets, especially since the output comes from generate(). I don't really think that this poses a problem here.
The model for some reason does not want to generate anything after the 27th masked span.
The reason you're seeing gibberish after 27 is that the model has already generated an EOS token (id == 1). At this point the model has said "I'm done generating. I think the sequence has ended". However, since you told it to use a different EOS token, generation continues past the point where the model wanted to stop, and everything after that is gibberish.
If you don't tell it to use a different EOS token, then it will simply stop generating after hitting token id 1.
I tried specifying that token id == 1 is a bad word so that the model won't generate it, but that also doesn't fix the problem.
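For context, that attempt looks roughly like this (a sketch, not the original code; t5-base and the input string are placeholders):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

input_ids = tokenizer(
    "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
).input_ids

# Default behavior: generation stops as soon as the model emits EOS (id 1 for T5).
default_out = model.generate(input_ids, max_length=200)

# Attempted workaround: forbid EOS via bad_words_ids so it can never be emitted.
# Generation then continues, but the extra tokens are still gibberish, because
# the model had already "decided" the sequence was over.
banned_out = model.generate(
    input_ids, max_length=200, bad_words_ids=[[tokenizer.eos_token_id]]
)
```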
Still, I do think it is quite odd that the model cannot generate more than 27 masked tokens.
Is it possible that this sort of task was only ever done as pretraining? So then the model would always have had teacher forcing, which means that it would never have to "predict" so many tokens into the future for this task.
If your goal is to fill in many blanks, then you could adapt in one of the following ways:

1. Split the input into smaller chunks so that each one contains fewer masked spans, and fill in each chunk separately (a sketch of this follows the list).
2. Fill in the blanks a few at a time: splice the spans generated so far back into the input, and then the model will generate the next tokens. At each point you would be using the next group of sentinel tokens, renumbered to start from <extra_id_0> again.
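A rough sketch of the first option (split_into_chunks is a hypothetical helper, not anything from the library):

```python
import re

def split_into_chunks(masked_text, max_spans=10):
    """Split a sentinel-masked string into chunks with at most max_spans
    sentinels each, renumbered so every chunk starts at <extra_id_0>."""
    pieces = re.split(r"(<extra_id_\d+>)", masked_text)
    chunks, current, n_spans = [], [], 0
    for piece in pieces:
        if re.fullmatch(r"<extra_id_\d+>", piece):
            if n_spans == max_spans:  # close the current chunk, start a new one
                chunks.append("".join(current))
                current, n_spans = [], 0
            piece = f"<extra_id_{n_spans}>"
            n_spans += 1
        current.append(piece)
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk can then be passed to model.generate separately, keeping the number of spans per call well below the point where generations fall apart.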
-- edit -- I tried the second method and it does not seem to work. It might be because this is an encoder-decoder model, so we need to seed the decoder with each additional generation rather than extending the encoder's input sequence. This is possible in the model's forward method, but I don't know how to do it with generate.
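One way around that is to bypass generate() entirely and run a manual greedy loop over the forward method, seeding decoder_input_ids directly. A minimal sketch, assuming greedy decoding and batch size 1 (greedy_decode_with_seed is a made-up helper):

```python
import torch

@torch.no_grad()
def greedy_decode_with_seed(model, input_ids, decoder_seed_ids, max_new_tokens=50):
    """Greedy decoding that starts from an already-populated decoder sequence,
    which is hard to do through generate() in this version of transformers."""
    decoder_input_ids = decoder_seed_ids
    for _ in range(max_new_tokens):
        outputs = model(
            input_ids=input_ids,
            decoder_input_ids=decoder_input_ids,
            return_dict=True,
        )
        # Greedily pick the most likely next token and append it to the decoder side.
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == model.config.eos_token_id:
            break
    return decoder_input_ids
```

Here decoder_seed_ids should begin with model.config.decoder_start_token_id (the pad token, id 0, for T5), followed by the tokens generated so far.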
This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.
If you think this still needs to be addressed please comment on this thread.
The issue
I am using a pretrained T5 model to generate missing spans as in the pretraining objective. However, I'm finding that these generations deteriorate for longer sequences (usually after around the 25th span or so). Below is an example of this deterioration on a sequence (from the IMDB dataset) where 15% of the tokens have been randomly masked with sentinel tokens. Given that the T5 model was pretrained using sequences of up to 512 tokens with 15% of tokens masked, shouldn't it be possible to obtain good generations on sequences like the one below? Why are generations like this one deteriorating? Thank you!
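For reference, such inputs can be built along these lines (a simplified sketch that masks individual tokens, whereas T5's actual span-corruption objective masks contiguous spans with a mean length of 3):

```python
import random

def mask_tokens(tokens, mask_prob=0.15):
    """Replace ~15% of tokens with consecutive T5 sentinel tokens
    (<extra_id_0> ... <extra_id_99>; T5 only has 100 of them)."""
    masked, sentinel = [], 0
    for tok in tokens:
        if random.random() < mask_prob and sentinel < 100:
            masked.append(f"<extra_id_{sentinel}>")
            sentinel += 1
        else:
            masked.append(tok)
    return " ".join(masked)
```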
Environment info
transformers version: 3.5.0
Who can help
@patrickvonplaten
To reproduce
Steps to reproduce the behavior:
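A minimal stand-in for the reproduction script (t5-base is a placeholder checkpoint; masked_text stands in for the sentinel-masked IMDB review):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# An IMDB review with ~15% of its tokens replaced by <extra_id_N> sentinels.
masked_text = "..."
input_ids = tokenizer(masked_text, return_tensors="pt").input_ids

output_ids = model.generate(input_ids, max_length=512)
print(tokenizer.decode(output_ids[0]))
```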
output: