Wovchena closed this issue 11 months ago.
Hey, neither padding token is actually non-printable; you just decided to skip them by using skip_special_tokens=True.
You should try setting the padding side to the left, as is recommended for generation: a padding token on the right will always impact the model. More details here.
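For reference, a minimal sketch of that suggestion (the model name is taken from the reproducer below; reusing EOS as the pad token is an assumption for illustration only):
import transformers

tokenizer = transformers.LlamaTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v0.6')
tokenizer.pad_token = tokenizer.eos_token  # assumption: reuse EOS as the pad token for this sketch
tokenizer.padding_side = "left"            # left padding is recommended for decoder-only generation

batch = tokenizer(['Hi', 'A somewhat longer prompt'], return_tensors='pt', padding=True)
print(batch['input_ids'])       # pad tokens appear on the left of the shorter prompt
print(batch['attention_mask'])  # padded positions are marked with 0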
Hi. skip_special_tokens=True is intentional. How do I modify the reproducer to set the padding to the left?
tokenizer.padding_side = "left"
The results are still different:
import transformers
tokenizer = transformers.LlamaTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v0.6')
tokenizer.padding_side = "left"
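# note: only input_ids is kept here, so generate() below receives no attention_mask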
input_ids = tokenizer('Hi', return_tensors='pt')['input_ids']
model = transformers.LlamaForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v0.6')
print(model.generation_config.eos_token_id)
pad_token_id = 0
assert pad_token_id != model.generation_config.eos_token_id
zero = [tokenizer.decode(beam, skip_special_tokens=True) for beam in model.generate(input_ids, max_new_tokens=25, num_beam_groups=9, num_beams=99, num_return_sequences=99, diversity_penalty=1.0, no_repeat_ngram_size=3, do_sample=False, pad_token_id=pad_token_id)]
print(zero)
pad_token_id = 1
assert pad_token_id != model.generation_config.eos_token_id
one = [tokenizer.decode(beam, skip_special_tokens=True) for beam in model.generate(input_ids, max_new_tokens=25, num_beam_groups=9, num_beams=99, num_return_sequences=99, diversity_penalty=1.0, no_repeat_ngram_size=3, do_sample=False, pad_token_id=pad_token_id)]
print(one)
assert zero == one
My explanation for why your suggestion didn't help is that the problem I'm referring to lies inside model.generate(), which doesn't involve the tokenizer.
Hi @Wovchena
The issue with your script was that you were not passing the attention mask to generate :)
Throughout transformers, we make a best effort to infer the attention mask when it is not passed: if a token equals the pad token, its attention mask is 0 (and 1 otherwise). In your example, you were setting the pad token to the BOS token (both id 1), the token the tokenizer uses to signal the beginning of the sequence. As such, the inferred attention mask was different between the two runs, leading to different outputs.
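For illustration, a minimal sketch of such inference (not the exact transformers code; the token ids used are hypothetical):
import torch

def infer_attention_mask(input_ids, pad_token_id):
    # 1 where the token differs from the pad token, 0 where it matches.
    # If pad_token_id collides with a real token such as BOS (id 1),
    # that position gets masked out even though it carries information.
    return (input_ids != pad_token_id).long()

example = torch.tensor([[1, 12345]])  # [BOS, hypothetical id for 'Hi']
print(infer_attention_mask(example, pad_token_id=0))  # tensor([[1, 1]])
print(infer_attention_mask(example, pad_token_id=1))  # tensor([[0, 1]]) -- BOS gets masked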
We always recommend passing the attention mask.
Working example (passing the attention mask):
import transformers
tokenizer = transformers.LlamaTokenizer.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v0.6')
model = transformers.LlamaForCausalLM.from_pretrained('TinyLlama/TinyLlama-1.1B-Chat-v0.6')
tokenizer.padding_side = "left"
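# keep the full tokenizer output (a dict containing both input_ids and attention_mask)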
input_ids = tokenizer('Hi', return_tensors='pt')
print(model.generation_config.eos_token_id)
pad_token_id = 0
assert pad_token_id != model.generation_config.eos_token_id
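# ** unpacks the full encoding, so generate() receives the attention_mask as well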
gen_zero = model.generate(**input_ids, max_new_tokens=25, num_beam_groups=9, num_beams=99, num_return_sequences=99, diversity_penalty=1.0, no_repeat_ngram_size=3, do_sample=False, pad_token_id=pad_token_id)
zero = [tokenizer.decode(beam, skip_special_tokens=True) for beam in gen_zero]
pad_token_id = 1
assert pad_token_id != model.generation_config.eos_token_id
gen_one = model.generate(**input_ids, max_new_tokens=25, num_beam_groups=9, num_beams=99, num_return_sequences=99, diversity_penalty=1.0, no_repeat_ngram_size=3, do_sample=False, pad_token_id=pad_token_id)
one = [tokenizer.decode(beam, skip_special_tokens=True) for beam in gen_one]
assert zero == one
Thank you!
System Info
transformers version: 4.35.2
OS: Windows and Ubuntu 20
Python: 3.11.3
Who can help?
@gante
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
Using a non-printable pad_token_id should result in the same text generation.
When a group is completed, group beam search keeps padding the ongoing beams from that group. That in turn affects how the diversity penalty is applied to tokens in the other groups, and thus different tokens are chosen.
I believe a completed group should not affect the log probabilities of other groups.
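For context, a minimal sketch (not the transformers implementation) of a Hamming-style diversity penalty as used in group beam search; the shapes and names are assumptions. If a finished group keeps emitting pad_token_id, that id enters the counts and is penalized for the remaining groups, which is the interaction described above.
import torch

def apply_diversity_penalty(scores, prev_group_tokens, diversity_penalty=1.0):
    # Count how often each vocabulary id was already chosen by earlier groups
    # at this decoding step, then subtract a penalty proportional to that count.
    counts = torch.bincount(prev_group_tokens, minlength=scores.shape[-1]).to(scores.dtype)
    return scores - diversity_penalty * counts

scores = torch.zeros(3, 10)                   # 3 beams in the current group, toy vocab of 10
prev_group_tokens = torch.tensor([0, 0, 4])   # earlier groups picked id 0 twice (e.g. a pad id) and id 4 once
print(apply_diversity_penalty(scores, prev_group_tokens))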