ppo_trainer.generate with Bart returns only one token.

There seems to be a minor bug in ppo_trainer.generate.

I switched your ppo_trainer example to a different model, a BartForConditionalGeneration checkpoint, and it seemed to work fine until I looked at the results. When I checked the logs, I found that generation returned only one token per query. Code:

from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead
from trl import PPOConfig, PPOTrainer
from datasets import Dataset
# 1. load a pretrained model
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
model_ref = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

test = {"input": ["question: What is the capital of France?","question: Who is the president of the USA?"], 
        "target": ["Paris", "Joe Biden"]}

def tokenize(sample):
    sample["input_ids"] = tokenizer.encode(sample["input"])
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample

ds = Dataset.from_dict(test)
ds = ds.map(tokenize, batched=False)
ds.set_format(type="torch")
# 2. initialize the config and the trainer
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

ppo_config = {"batch_size": 1, "mini_batch_size": 1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, model_ref, tokenizer, dataset=ds, data_collator=collator)

generation_kwargs = {
    # "min_length": ,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    # "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 512,
}
# Set the
# ppo_trainer.generate([ds['input_ids']])
for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # Get summary from summarizer 
    summary_tensors, ref_summary_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, generate_ref_response=True, **generation_kwargs
    )
    print(summary_tensors)

Output:

Map: 100%|██████████| 2/2 [00:00<00:00, 521.55 examples/s]
You're using a BartTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[tensor([2], device='cuda:0')]
[tensor([2], device='cuda:0')]

The cause, as far as I can tell, is the conditional statement in the generation code that cuts off the rest of each output at the first pad_token-like special token. Since Bart's decoder output starts with such a token (id 2 in the output above), everything generated after it is removed and only that single token is returned.

https://github.com/huggingface/trl/blob/2a2676e7ecdb623d6748f8f77a91d519c3869d98/trl/trainer/ppo_trainer.py#L548
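To make the failure mode concrete, here is a minimal sketch in plain Python lists (no trl or torch). It assumes the trimming step keeps everything up to and including the first occurrence of a special token, and that this token has id 2, matching the tensors printed above. The function name and the other token ids are hypothetical and only for illustration.

```python
SPECIAL_ID = 2  # assumption: the special token id seen in the output above


def truncate_at_first_special(output_ids, special_id=SPECIAL_ID):
    """Keep tokens up to and including the first special_id (illustrative only)."""
    if special_id in output_ids:
        first = output_ids.index(special_id)
        return output_ids[: first + 1]
    return output_ids


# Hypothetical seq2seq output: the decoder start token (2) comes first.
bart_like = [2, 0, 2201, 4, 2]
# Hypothetical decoder-only response: the special token appears only at the end.
gpt_like = [2201, 4, 50, 2]

print(truncate_at_first_special(bart_like))  # [2]
print(truncate_at_first_special(gpt_like))   # [2201, 4, 50, 2]
```

When the sequence begins with the special token, the trim removes the whole generation, which matches the single-token tensors in the log.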

Therefore, I think the generation function needs to check whether pretrained_model is a Bart instance, so that generation does not return only one token. Please let me know if I missed anything.
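One possible shape for such a fix, sketched in plain Python lists rather than against the actual trl code: instead of checking the model class, skip a leading decoder start token before searching for the end of the sequence. This is a variation on the check proposed above, not the library's implementation. The function name, the default ids, and the sample sequences are assumptions for illustration.

```python
def truncate_after_decoder_start(output_ids, end_id=2, decoder_start_id=2):
    """Trim at the first end_id, ignoring a leading decoder start token (sketch)."""
    # Skip position 0 when it holds the decoder start token, so it is not
    # mistaken for the end of the generated sequence.
    offset = 1 if output_ids and output_ids[0] == decoder_start_id else 0
    tail = output_ids[offset:]
    if end_id in tail:
        end = offset + tail.index(end_id)
        return output_ids[: end + 1]  # keep the end token itself
    return output_ids


# Hypothetical seq2seq output followed by padding (id 1).
print(truncate_after_decoder_start([2, 0, 2201, 4, 2, 1, 1]))  # [2, 0, 2201, 4, 2]
# Hypothetical decoder-only output: behaviour is unchanged.
print(truncate_after_decoder_start([2201, 4, 2, 1]))           # [2201, 4, 2]
```

Keying on the first position rather than on the model class would also cover other encoder-decoder models whose decoder start token doubles as the end token, though whether that is desirable in trl is a design question for the maintainers.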

huggingface/trl issue #1362