There seems to be a minor error in the generation step of PPOTrainer
I tried switching from your ppo_trainer example to a different BartForConditionalGeneration model, and it seemed to work fine until I checked the results: the logs show that generation returns only a single token.
Code:
from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead
from trl import PPOConfig, PPOTrainer
from datasets import Dataset
# 1. load a pretrained model
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
model_ref = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
test = {"input": ["question: What is the capital of France?","question: Who is the president of the USA?"],
"target": ["Paris", "Joe Biden"]}
def tokenize(sample):
sample["input_ids"] = tokenizer.encode(sample["input"])
sample["query"] = tokenizer.decode(sample["input_ids"])
return sample
ds = Dataset.from_dict(test)
ds = ds.map(tokenize, batched=False)
ds.set_format(type="torch")
# 2. initialize config and trainer
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

ppo_config = {"batch_size": 1, "mini_batch_size": 1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, model_ref, tokenizer, dataset=ds, data_collator=collator)
generation_kwargs = {
    # "min_length": ,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    # "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 512,
}
# 3. generate responses
# ppo_trainer.generate([ds['input_ids']])
for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    # Get summary from summarizer
    summary_tensors, ref_summary_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, generate_ref_response=True, **generation_kwargs
    )
    print(summary_tensors)
Output:
Map: 100%|██████████| 2/2 [00:00<00:00, 521.55 examples/s]
You're using a BartTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[tensor([2], device='cuda:0')]
[tensor([2], device='cuda:0')]
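For reference, the single id that comes back is BART's special decoder-start token. A quick check (assuming the same facebook/bart-large-cnn checkpoint and the tokenizer/model objects from the script above):

# id 2 decodes to "</s>" for this tokenizer, and facebook/bart-large-cnn
# also uses it as decoder_start_token_id, i.e. the first token of every decoder output
print(tokenizer.decode([2]))                                  # "</s>"
print(model.pretrained_model.config.decoder_start_token_id)  # 2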
The reason I found is that the conditional statement in generation (https://github.com/huggingface/trl/blob/2a2676e7ecdb623d6748f8f77a91d519c3869d98/trl/trainer/ppo_trainer.py#L548) cuts the output off at the first pad token, and since BartModel starts its decoder output with that token, everything generated after it is deleted and only that single token is returned.
Therefore, I think the generation function needs to check whether pretrained_model is an instance of Bart, so that generation does not return only one token.
Please let me know if I missed anything.
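To make the suggestion concrete, below is a rough sketch of the kind of fix I have in mind. This is a hypothetical standalone helper, not a patch against the actual _generate_batched code; the idea is simply that for encoder-decoder models the search for the first end-of-sequence token should skip the special decoder-start token at position 0:

import torch

def truncate_after_eos(output, eos_token_id, is_encoder_decoder):
    # Skip the decoder-start token for encoder-decoder models such as BART,
    # then cut the 1-D sequence after the first eos token (keeping that token).
    start = 1 if is_encoder_decoder else 0
    if eos_token_id in output[start:]:
        eos_mask = output[start:] == eos_token_id
        eos_pos = torch.nonzero(eos_mask, as_tuple=False)[0, 0].item()
        output = output[: start + eos_pos + 1]
    return output

With is_encoder_decoder=False, a sequence that begins with id 2 is cut down to just [2], which matches the output shown above; with is_encoder_decoder=True, the rest of the generated sequence is kept.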