There seems to be a minor error in generation in ppo_trainer
I switched the model in your ppo_trainer example to a BART conditional generation model (facebook/bart-large-cnn), and it seemed to work fine at first.
But when I checked the logs, I found that generation returned only one token per sample.
from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead
from trl import PPOConfig, PPOTrainer
from datasets import Dataset
# 1. load a pretrained model
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
model_ref = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
test = {"input": ["question: What is the capital of France?","question: Who is the president of the USA?"],
"target": ["Paris", "Joe Biden"]}
def tokenize(sample):
    sample["input_ids"] = tokenizer.encode(sample["input"])
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample
ds = Dataset.from_dict(test)
ds = ds.map(tokenize, batched=False)
# 2. initialize trainer
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])
ppo_config = {"batch_size": 1,"mini_batch_size":1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, model_ref, tokenizer,dataset = ds,data_collator=collator)
generation_kwargs = {
    # "min_length": ,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    # "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 512,
}
# Set the
# ppo_trainer.generate([ds['input_ids']])
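for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    # Get summary from summarizer
    summary_tensors, ref_summary_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, generate_ref_response=True, **generation_kwargs
    )
    # Print both outputs (these produce the two tensor lines in the log below)
    print(summary_tensors)
    print(ref_summary_tensors)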
Map: 100%|██████████| 2/2 [00:00<00:00, 521.55 examples/s]
You're using a BartTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[tensor([2], device='cuda:0')]
[tensor([2], device='cuda:0')]
The cause, as far as I can tell, is the conditional statement in generation that cuts off everything after the pad_token. Since the BART decoder starts its output with that token, everything generated after it is discarded and only the first token is returned.
Therefore, I think the generation function should check whether pretrained_model is a BART model, so that generation does not return only one token.
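To make the suspected failure concrete, here is a minimal sketch in plain Python. It is not the TRL code. The function names, the example token ids, and the choice of 2 as the truncation token (taken from the logged `tensor([2])`) are all assumptions made for illustration.

```python
def truncate_at_first(tokens, stop_id):
    """Keep everything up to and including the first stop_id.

    Mimics the truncation described above (hypothetical helper, not TRL code).
    """
    if stop_id in tokens:
        return tokens[: tokens.index(stop_id) + 1]
    return tokens


def truncate_skipping_start(tokens, stop_id, is_encoder_decoder=True):
    """Same truncation, but ignore a leading decoder start token equal to stop_id."""
    # Skip position 0 only when the sequence begins with the special token.
    offset = 1 if is_encoder_decoder and tokens and tokens[0] == stop_id else 0
    body = tokens[offset:]
    if stop_id in body:
        return tokens[: offset + body.index(stop_id) + 1]
    return tokens


# Hypothetical generation: start token 2, three content tokens, closing token 2, padding 1
generated = [2, 0, 713, 16, 2, 1, 1]
print(truncate_at_first(generated, stop_id=2))        # [2]
print(truncate_skipping_start(generated, stop_id=2))  # [2, 0, 713, 16, 2]
```

If the truncation in generation behaves like the first function, a guard along the lines of the second one might be enough, and it would not need a model-specific check for BART.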
Please let me know if I missed anything.
