huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

ppo_trainer.generate with Bart returns only one token. #1362

BooMinSeong closed 7 months ago

BooMinSeong commented 8 months ago

There seems to be a minor error in the generation step of ppo_trainer.

I adapted your ppo_trainer example to use a Bart conditional generation model, and it seemed to work fine until I checked the results: according to the logs, generation returned only one token per query. Code:

from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead
from trl import PPOConfig, PPOTrainer
from datasets import Dataset
# 1. load a pretrained model
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
model_ref = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

test = {
    "input": ["question: What is the capital of France?", "question: Who is the president of the USA?"],
    "target": ["Paris", "Joe Biden"],
}

def tokenize(sample):
    sample["input_ids"] = tokenizer.encode(sample["input"])
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample

ds = Dataset.from_dict(test)
ds = ds.map(tokenize, batched=False)
ds.set_format(type="torch")
# 2. initialize the config and trainer
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])
ppo_config = {"batch_size": 1, "mini_batch_size": 1}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, model_ref, tokenizer, dataset=ds, data_collator=collator)

generation_kwargs = {
    # "min_length": ,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True,
    # "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 512,
}
# Set the
# ppo_trainer.generate([ds['input_ids']])
for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # Get summary from summarizer 
    summary_tensors, ref_summary_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, generate_ref_response=True, **generation_kwargs
    )
    print(summary_tensors)

Output:

Map: 100%|██████████| 2/2 [00:00<00:00, 521.55 examples/s]
You're using a BartTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[tensor([2], device='cuda:0')]
[tensor([2], device='cuda:0')]

The cause I found is that the conditional statement in generation truncates the output at the first padding/end-of-sequence token it finds (see the link below). Since Bart's decoder output starts with that special token (id 2 in the output above), everything generated after it is dropped and only that single token is returned.

https://github.com/huggingface/trl/blob/2a2676e7ecdb623d6748f8f77a91d519c3869d98/trl/trainer/ppo_trainer.py#L548
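To illustrate the effect, here is a minimal sketch in plain Python, using lists instead of tensors. The function `truncate_at_first_eos` is a hypothetical stand-in for the post-processing step, not TRL's actual code, and the token IDs are assumed values chosen to match the output above (2 as the special start/end token).

```python
EOS_ID = 2  # assumed: the ID that appears in the output above


def truncate_at_first_eos(tokens, eos_id=EOS_ID):
    """Keep everything up to and including the first eos_id (hypothetical helper)."""
    if eos_id in tokens:
        return tokens[: tokens.index(eos_id) + 1]
    return tokens


# Output where the special token appears only at the end: nothing useful is lost.
print(truncate_at_first_eos([100, 101, 102, 2, 1, 1]))  # [100, 101, 102, 2]

# Bart-style output that begins with the same ID: everything after position 0 is dropped.
print(truncate_at_first_eos([2, 0, 100, 101, 102, 2]))  # [2]
```

The second call mirrors the `tensor([2])` result: the search stops at the very first position, so the generated summary never reaches the caller.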

Therefore, I think the generation function needs to check whether pretrained_model is a Bart instance, so that generation does not return only one token. Please let me know if I missed anything.
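One possible shape for such a fix, sketched in plain Python with lists instead of tensors: rather than checking the model class, skip a leading decoder start token before searching for the end-of-sequence token. The names `truncate_seq2seq_output`, `start_id`, and `eos_id` are hypothetical, the IDs are assumed from the output above, and this is only an illustration, not a change that exists in TRL.

```python
def truncate_seq2seq_output(tokens, eos_id=2, start_id=2):
    """Truncate after the first EOS, ignoring a leading decoder start token.

    Hypothetical helper; eos_id and start_id are assumed values.
    """
    # If the sequence begins with the decoder start token, do not treat it as EOS.
    offset = 1 if tokens and tokens[0] == start_id else 0
    rest = tokens[offset:]
    if eos_id in rest:
        end = offset + rest.index(eos_id) + 1  # keep the EOS token at the end
        return tokens[:end]
    return tokens


# Bart-style output: the leading 2 is skipped, the trailing padding (1) is removed.
print(truncate_seq2seq_output([2, 0, 100, 101, 2, 1, 1]))  # [2, 0, 100, 101, 2]

# Output without a leading start token behaves as before.
print(truncate_seq2seq_output([100, 101, 2, 1]))  # [100, 101, 2]
```

Keying the check on the first token instead of the model class would also cover other encoder-decoder models whose decoder starts with the end-of-sequence ID, if any are used.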

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.