T5 generations for pretraining objective degenerate

alexisjihyeross commented 3 years ago

The issue

I am using a pretrained T5 model to generate missing spans as in the pretraining objective. However, I'm finding that these generations deteriorate for longer sequences (usually after around the 25th span or so). Below is an example of this deterioration on a sequence (from the IMDB dataset) where 15% of the tokens have been randomly masked with sentinel tokens. Given that the T5 model was pretrained using sequences of up to 512 tokens with 15% of tokens masked, shouldn't it be possible to obtain good generations on sequences like the one below? Why are generations like this one deteriorating? Thank you!

Environment info

transformers version: 3.5.0
Platform: Linux-4.15.0-45-generic-x86_64-with-debian-buster-sid
Python version: 3.7.7
PyTorch version (GPU?): 1.7.0 (True)
Tensorflow version (GPU?): not installed (NA)
Using GPU in script?: True

Who can help

@patrickvonplaten

To reproduce

Steps to reproduce the behavior:

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_name = "t5-base"
t5_tokenizer = T5Tokenizer.from_pretrained(model_name)
t5_model = T5ForConditionalGeneration.from_pretrained(model_name)
t5_model = t5_model.to(device)

original_sentence = "ROCK STAR is a well-told Hollywood-style rendition of the tale based on fact actually on how Ripper became Rob Halford's replacement for Judas Priest. Mark Wahlberg poured on his likable boy-ish charm and performed with believable admirably, something he has been known to do since the release of BOOGIE NIGHTS.  Stephen Herek, no stranger to musically-themed movies, takes the audience through the wonders of the breakneck lifestyle of an extinct species, the Hair-Metal Rock God. Wahlberg's 'Izzy' acts as the film's host plays the everyman who gets to see his wish come true. His likable character quickly wins over the heart of the viewer, who wants to see him succeed and gets the chance to give him the Metal 'goat horn' hand-sign several times over.  The only real complaint with the story is that the supporting cast, namely the other members of the band, were not fleshed out, or even introduced, properly. More interaction with these life-long Rock musicians would have amplified and solidified Izzy's new surroundings.   Naturally, ROCK STAR is filled with great music. Rabin's score, the Steel Dragon's original work and plenty of 80's-style Metal hits makes this soundtrack a must-have! Let's all hope that films like ROCK STAR not only give a credibility to a style of music that helped define a generation but also spark a very-needed revival.</s>"
sentence = "ROCK STAR is a well-told Hollywood-style rendition<extra_id_0> the tale <extra_id_1> on<extra_id_2> on<extra_id_3> Ri<extra_id_4> became Rob Hal<extra_id_5>'s replacement for Ju<extra_id_6> Priest.<extra_id_7> Wahlberg poured on his likable boy-ish charm and performed with believable admirably, something<extra_id_8>he has been known to do<extra_id_9> the release of BOOGIE NIGHTS. Stephen Herek<extra_id_10> no stranger to musically-themed<extra_id_11>, takes the<extra_id_12> through the<extra_id_13>s<extra_id_14> break<extra_id_15> of<extra_id_16> extinct<extra_id_17>, the<extra_id_18>-Metal Rock<extra_id_19> Wahlberg's 'Izzy' acts as the film's host plays the everyman who gets to see his wish come true. His<extra_id_20>likable character quickly<extra_id_21> the heart of the viewer, who<extra_id_22> to see him succeed and gets the chance to give him the Metal 'goat horn' hand-sign several times over<extra_id_23> The only real complaint with the<extra_id_24> is that the supporting<extra_id_25>,<extra_id_26>namely the other members of<extra_id_27> band, were not fleshed out, or even introduced, properly<extra_id_28> More interaction with these life-long<extra_id_29> musicians would have amplified and solidified Izzy's new surroundings<extra_id_30> Naturally,<extra_id_31>CK STAR is filled<extra_id_32> great music. Rabin's score, the Steel Dragon<extra_id_33>s original work<extra_id_34> of 80's-style Metal<extra_id_35> makes this soundtrack<extra_id_36>a must-have<extra_id_37> all hope that films like ROCK STAR not only give a credibility<extra_id_38> a style of music that helped define a generation but also spark a very-needed revival<extra_id_39></s>"
encoded = t5_tokenizer.encode(sentence)
print("original sentence: ", original_sentence)
print("\nmasked sentence: ", sentence)
print("\nnum tokens masked sentence: ", len(encoded))
encoded_tensor = torch.LongTensor(encoded).unsqueeze(0).to(device)
eos_token_id = t5_tokenizer.encode("<extra_id_40>")[0]
batch = t5_model.generate(encoded_tensor, early_stopping = True, max_length = 300, eos_token_id = eos_token_id, no_repeat_ngram_size = 2, num_beams = 1, num_return_sequences = 1)

for b in batch:
    print("\noutput: ")
    print(t5_tokenizer.decode(b, skip_special_tokens = False))

output:

original sentence: ROCK STAR is a well-told Hollywood-style rendition of the tale based on fact actually on how Ripper became Rob Halford's replacement for Judas Priest. Mark Wahlberg poured on his likable boy-ish charm and performed with believable admirably, something he has been known to do since the release of BOOGIE NIGHTS. Stephen Herek, no stranger to musically-themed movies, takes the audience through the wonders of the breakneck lifestyle of an extinct species, the Hair-Metal Rock God. Wahlberg's 'Izzy' acts as the film's host plays the everyman who gets to see his wish come true. His likable character quickly wins over the heart of the viewer, who wants to see him succeed and gets the chance to give him the Metal 'goat horn' hand-sign several times over. The only real complaint with the story is that the supporting cast, namely the other members of the band, were not fleshed out, or even introduced, properly. More interaction with these life-long Rock musicians would have amplified and solidified Izzy's new surroundings. Naturally, ROCK STAR is filled with great music. Rabin's score, the Steel Dragon's original work and plenty of 80's-style Metal hits makes this soundtrack a must-have! Let's all hope that films like ROCK STAR not only give a credibility to a style of music that helped define a generation but also spark a very-needed revival.

masked sentence: ROCK STAR is a well-told Hollywood-style rendition the tale on on Ri became Rob Hal's replacement for Ju Priest. Wahlberg poured on his likable boy-ish charm and performed with believable admirably, somethinghe has been known to do the release of BOOGIE NIGHTS. Stephen Herek no stranger to musically-themed, takes the through thes break of extinct, the-Metal Rock Wahlberg's 'Izzy' acts as the film's host plays the everyman who gets to see his wish come true. Hislikable character quickly the heart of the viewer, who to see him succeed and gets the chance to give him the Metal 'goat horn' hand-sign several times over The only real complaint with the is that the supporting,namely the other members of band, were not fleshed out, or even introduced, properly More interaction with these life-long musicians would have amplified and solidified Izzy's new surroundings Naturally,CK STAR is filled great music. Rabin's score, the Steel Dragons original work of 80's-style Metal makes this soundtracka must-have all hope that films like ROCK STAR not only give a credibility a style of music that helped define a generation but also spark a very-needed revival

num tokens masked sentence: 337

output:
of of the man who the day of his death themedas Steven that sinceo, films audience 'rockaga' of a-up the CK STAR band newest band. Steven incredibly captures is eager. film cast and the -Metal Rock, the band's sailor, and the members of Izzy' emcees, who were .-n " de an (s, in thetgrae and pro also le of'lr to ex si not on I" ensemble ⁇ » be last for fiia =/ den? pour as --) 2 $: + 1 S un former dis spa tub root will at both second is no bout muscle hard desre baseball facialw mi& * [...;

patrickvonplaten commented 3 years ago

I guess it's quite normal that the quality degenerates the longer the sequence gets, especially since the output comes from generate()...I don't really think that this poses a problem here

jsrozner commented 3 years ago

The model for some reason does not want to generate anything after , and the same behavior (i.e. problems after 27) occurs for t5-small.

The reason you're seeing gibberish after 27 is that the model has already generated an EOS token (id == 1). At this point the model has said "I'm done generating. I think the sequences has ended". However, since you told it to use as EOS, it continues to try to produce tokens even after producing token id == 1. But the model doesn't know what to do after creating an EOS so you get gibberish.

If you don't tell it to use a different EOS token, then it will simply stop generating after hitting .

I tried specifying that token id == 1 is a bad word so that the model won't generate it, but that also doesn't fix the problem.

Still, I do think it is quite odd that the model cannot generate more than 27 masked tokens.

Is it possible that this sort of task was only ever done as pretraining? So then the model would always have had teacher forcing, which means that it would never have to "predict" so many tokens into the future for this task.

If your goal is to fill in many blanks, then you could adapt in one of the following ways:

masking one token at a time in the full sentence
starting with the long input sentence and then appending the correct outputs up to the current extra_id. So e.g. for extra_id_2 you would have (+/- the trailing token) : long_input_sentence <\s> <eid 0> of <eid 1> based <eid 2>

and then the model will generate the next tokens. at each point you would be using the next as the stopping token.

-- edit -- I tried the second method and it does not seem to work. It might be because this is an encoder-decoder model, so we need to be seeding the decoder with each additional generation, rather than extending the input sequence. This is possible in the model's forward method but I don't know how to do it with generate.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

huggingface / transformers