huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

T5 model seq2seq text generation using word embeddings instead of token_ids does not work #12218

Closed jerry3chen closed 3 years ago

jerry3chen commented 3 years ago

Hi there,

I trained an MT5ForConditionalGeneration model. During training, I used my own embeddings for encoding (but the default embeddings for decoding). However, when I try to produce output with the generate function, I get an error. The code and error message follow:

Here is the code for the model training step:

outputs = self.encoder2(inputs_embeds=context, attention_mask=input_mask, labels=padded_labels)

Here context is like a batch of token_ids, except the tokens are replaced by embeddings; labels are the target-sequence token_ids. Training works fine without any issues.

And here is the line I use to generate with the model:

outputs = self.encoder2.generate(input_ids=None, inputs_embeds=context, attention_mask=input_mask, bos_token_id=0, pad_token_id=0, eos_token_id=1)

And once the program hits the above line, I will get the following error message:

outputs = self.encoder2.generate(input_ids=None, inputs_embeds=context, attention_mask=input_mask, bos_token_id=0, pad_token_id=0, eos_token_id=1)
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/transformers/generation_utils.py", line 913, in generate
    input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/transformers/generation_utils.py", line 422, in _prepare_decoder_input_ids_for_generation
    torch.ones((input_ids.shape[0], 1), dtype=torch.long, device=input_ids.device) * decoder_start_token_id
AttributeError: 'NoneType' object has no attribute 'shape'

It seems the model is not handling this case properly. Any help would be appreciated. Thanks.
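A minimal sketch of the pattern described above (the random tensors are illustrative stand-ins for the real upstream encoder output, not the actual downstream code); on the affected transformers versions the generate call raises the AttributeError shown above:

import torch
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# stand-in for the upstream encoder output: [batch, seq_len, d_model]
context = torch.randn(8, 50, model.config.d_model)
input_mask = torch.ones(8, 50, dtype=torch.long)
padded_labels = torch.randint(0, model.config.vocab_size, (8, 20))

# training-style forward pass with inputs_embeds and labels: works fine
outputs = model(inputs_embeds=context, attention_mask=input_mask, labels=padded_labels)

# generation from inputs_embeds only: fails in _prepare_decoder_input_ids_for_generation
generated = model.generate(inputs_embeds=context, attention_mask=input_mask,
                           bos_token_id=0, pad_token_id=0, eos_token_id=1)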

patrickvonplaten commented 3 years ago

Hey @jerry3chen,

Can you post a fully reproducible code snippet so that I can take a look? :-)

jerry3chen commented 3 years ago

Hi @patrickvonplaten,

I will post some more detailed code. This is a downstream task, so it is probably not practical to include everything; I will just post the parts that involve the T5 model.

Here is where I initialize the T5 model:

enc2 = MT5ForConditionalGeneration.from_pretrained('google/mt5-small')

Then it is passed to a bigger model:

model = Gat2Seq(enc, enc2, vocab.word2id('<pad>'), vocab.word2id('</s>'))

class Gat2Seq(nn.Module):
    def __init__(self, encoder, encoder2, pad_idx, eos_idx, teacher_forcing=0.5):
        super().__init__()
        self.encoder = encoder
        self.encoder2 = encoder2

During training, I have:

context = self.encoder(graph, art_lengths)
outputs = self.encoder2(inputs_embeds=context, attention_mask=input_mask, labels=padded_labels)

Here context has shape [8, 50, 512] and comes from the previous encoder (8 is the batch size, 50 is the maximum sentence length, 512 is the default embedding size of the MT5 model). padded_labels has shape [8, 20] (8 is the batch size, 20 is the maximum target sequence length); it is a batch of target-sentence token_ids that I want the model to generate. I want the T5 model to treat context as embedded tokens and do its own encoding/decoding for text generation. The training step works fine and I see a reasonable decrease in outputs.loss.

Finally, once I have some trained models, I run this line to generate text:

outputs = self.encoder2.generate(input_ids=None, inputs_embeds=context, attention_mask=input_mask, bos_token_id=0, pad_token_id=0, eos_token_id=1)

Here context is exactly the same as the one used in training.

However, I get the following error when the program hits the generation line:

File "pred.py", line 452, in <module>
    main()
  File "pred.py", line 448, in main
    setup_predicting(model, data_loader, hps, vocab, f.split('/')[-1] + '_model_output.txt')
  File "pred.py", line 64, in setup_predicting
    run_predicting(model, data_loader, hps, vocab, save_f)
  File "pred.py", line 118, in run_predicting
    raise e
  File "pred.py", line 106, in run_predicting
    outputs = model.forward(G, lengths, labels, predicting=True)  # [n_snodes, 2]
  File "/scratch/jerryc/jerryc/gat2seq/HeterSumGraph-master-mod-att-TV-char/HiGraphMod.py", line 470, in forward
    outputs = self.encoder2.generate(input_ids=None, inputs_embeds=context, attention_mask=input_mask, bos_token_id=0, pad_token_id=0, eos_token_id=1)
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/transformers/generation_utils.py", line 913, in generate
    input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id
  File "/scratch/jerryc/jerryc/venv_py3.7/lib/python3.7/site-packages/transformers/generation_utils.py", line 422, in _prepare_decoder_input_ids_for_generation
    torch.ones((input_ids.shape[0], 1), dtype=torch.long, device=input_ids.device) * decoder_start_token_id
AttributeError: 'NoneType' object has no attribute 'shape'

Hope this is enough for you to diagnose the issue. Thanks, Jerry

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yuto3o commented 3 years ago

Hello, I am facing the same problem. Could you give me any suggestions?

patrickvonplaten commented 3 years ago

Hey @jerry3chen, @yuto3o,

Could you please provide a complete, but minimal reproducible code snippet, so that I can easily reproduce the bug?

Small non-executable code snippets are not enough to efficiently debug the problem.

Thanks!

ichiroex commented 3 years ago

@patrickvonplaten @yuto3o @jerry3chen

Hello, I am also facing the same problem. However, I found that the error does not occur if I pass decoder_input_ids consisting of pad_token_id to generate. Minimal reproducible code snippets are below.

My environment

transformers                  4.12.0
torch                         1.8.0

Reproducible code for the error

from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
)
model = T5ForConditionalGeneration.from_pretrained("sonoisa/t5-base-japanese")
tokenizer = T5Tokenizer.from_pretrained("sonoisa/t5-base-japanese", is_fast=True)

# the example sentence is "It's sunny today" in English
tokenized_inputs = tokenizer(["今日は良い天気です"], return_tensors='pt')  

# create input embedding instead of passing input_ids
inputs_embeds = model.get_input_embeddings()(tokenized_inputs["input_ids"])

output_ids = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=tokenized_inputs["attention_mask"]
)

AttributeError Traceback (most recent call last)

<ipython-input> in <module>
      1 inputs_embeds = model.get_input_embeddings()(tokenized_inputs["input_ids"])
----> 2 output_ids = model.generate(
      3     inputs_embeds=inputs_embeds,
      4     attention_mask=tokenized_inputs["attention_mask"]
      5 )

~/anaconda3/envs/aitd/lib/python3.8/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     25     def decorate_context(*args, **kwargs):
     26         with self.__class__():
---> 27             return func(*args, **kwargs)
     28     return cast(F, decorate_context)
     29

~/anaconda3/envs/aitd/lib/python3.8/site-packages/transformers/generation_utils.py in generate(self, input_ids, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, repetition_penalty, bad_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, **model_kwargs)
    911             input_ids = model_kwargs.pop("decoder_input_ids")
    912         else:
--> 913             input_ids = self._prepare_decoder_input_ids_for_generation(
    914                 input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id
    915             )

~/anaconda3/envs/aitd/lib/python3.8/site-packages/transformers/generation_utils.py in _prepare_decoder_input_ids_for_generation(self, input_ids, decoder_start_token_id, bos_token_id)
    422         decoder_start_token_id = self._get_decoder_start_token_id(decoder_start_token_id, bos_token_id)
    423         decoder_input_ids = (
--> 424             torch.ones((input_ids.shape[0], 1), dtype=torch.long, device=input_ids.device) * decoder_start_token_id
    425         )
    426         return decoder_input_ids

AttributeError: 'NoneType' object has no attribute 'shape'

How to fix it

import torch
from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
)
model = T5ForConditionalGeneration.from_pretrained("sonoisa/t5-base-japanese")
tokenizer = T5Tokenizer.from_pretrained("sonoisa/t5-base-japanese", is_fast=True)

tokenized_inputs = tokenizer(["今日は良い天気です"], return_tensors='pt') # It's sunny today
inputs_embeds = model.get_input_embeddings()(tokenized_inputs["input_ids"])

# **NOTE**: pad_token_id is used as decoder_start_token_id
dummy_decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]]) 

output_ids = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=tokenized_inputs["attention_mask"],
    decoder_input_ids=dummy_decoder_input_ids
)

output_ids

tensor([[ 0, 32099, 876, 4, 5, 2262, 32098, 876, 4, 2262, 1]])
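For a batch with more than one example, a hedged generalization of the same workaround (assuming model.config.decoder_start_token_id is set, which for T5 equals pad_token_id):

import torch

# one decoder start token per example in the batch
batch_size = inputs_embeds.shape[0]
decoder_start = model.config.decoder_start_token_id  # equals tokenizer.pad_token_id for T5
dummy_decoder_input_ids = torch.full((batch_size, 1), decoder_start, dtype=torch.long)

output_ids = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=tokenized_inputs["attention_mask"],
    decoder_input_ids=dummy_decoder_input_ids
)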

When I pass input_ids to generate

I can get the same result when I pass input_ids.

from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
)
model = T5ForConditionalGeneration.from_pretrained("sonoisa/t5-base-japanese")
tokenizer = T5Tokenizer.from_pretrained("sonoisa/t5-base-japanese", is_fast=True)

tokenized_inputs = tokenizer(["今日は良い天気です"], return_tensors='pt') # It's sunny today

output_ids = model.generate(
    input_ids=tokenized_inputs["input_ids"],
    attention_mask=tokenized_inputs["attention_mask"]
)

output_ids

tensor([[ 0, 32099, 876, 4, 5, 2262, 32098, 876, 4, 2262, 1]])

patrickvonplaten commented 2 years ago

@ichiroex,

Thanks for the nicely reproducible code snippet - this is indeed a bug and should be fixed.

patrickvonplaten commented 2 years ago

PR to fix this: #14443
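Once that fix lands, the originally reported call pattern should work without the dummy decoder_input_ids workaround (a sketch, assuming a transformers release that includes the fix):

output_ids = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=tokenized_inputs["attention_mask"]
)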

ichiroex commented 2 years ago

@patrickvonplaten Thank you!!