huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

DPOTrainer: AttributeError: 'list' object has no attribute 'numel' #1737

Closed by tahirahmad2030 2 months ago

tahirahmad2030 commented 3 months ago

Versions: transformers 4.41.2, trl 0.9.4, torch 2.3.0+cu121

I am training a simple translation model with DPOTrainer. The code is below:

from datasets import Dataset
from transformers import Trainer, TrainingArguments, AutoTokenizer, AutoModelForSeq2SeqLM
from torch.nn.functional import cross_entropy
import torch

# Load pre-trained T5 tokenizer and model
model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

from trl import DPOConfig, DPOTrainer
# Example tokenized dataset (replace with your actual dataset)
tokenized_dataset = Dataset.from_dict({
    'prompt': ["Translate to French: The house is beautiful.", "Translate to French: The cat sat on the mat."],
    'chosen': ["La maison est belle.", "Le chat était sur le tapis."],
    'rejected': ["La maison est laide.", "Le chat a mangé le tapis."],
})

# Preprocess function
def preprocess_function(examples):
    inputs = examples['prompt']
    chosen_targets = examples['chosen']
    rejected_targets = examples['rejected']

    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    model_inputs["labels"] = tokenizer(chosen_targets, max_length=512, truncation=True, padding="max_length").input_ids
    model_inputs["chosen_labels"] = tokenizer(chosen_targets, max_length=512, truncation=True, padding="max_length").input_ids
    model_inputs["rejected_labels"] = tokenizer(rejected_targets, max_length=512, truncation=True, padding="max_length").input_ids

    return model_inputs

# Tokenize the dataset
tokenized_dataset = tokenized_dataset.map(preprocess_function, batched=False)

# Training arguments
training_args = DPOConfig(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_dir="./logs",
)

# Initialize Trainer
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
)

# Train the model
trainer.train()

The error:

AttributeError                            Traceback (most recent call last)
<ipython-input-24-a566970b894e> in <cell line: 64>()
     62 
     63 # Train the model
---> 64 trainer.train()

4 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in estimate_tokens(self, input_dict)
   1186             self.warnings_issued = {}
   1187         if self.main_input_name in input_dict:
-> 1188             return input_dict[self.main_input_name].numel()
   1189         elif "estimate_tokens" not in self.warnings_issued:
   1190             logger.warning(

AttributeError: 'list' object has no attribute 'numel'

I tried different environments, such as SageMaker and Google Colab, but the error persists.

b11z commented 3 months ago

I don't know the proper fix, but I worked around it with:

model.floating_point_ops = lambda s: 0

More detail: I was passing data to DPOTrainer as plain Python objects rather than tensors, since that is what it expects. However, the floating_point_ops method of some models expects tensors. Because this method is only used for monitoring, I replaced it with a no-op.
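To illustrate the mechanism described above, here is a minimal sketch that runs without transformers or torch. ToyModel is a hypothetical stand-in, not the library's class. It only mirrors the call chain from the traceback (floating_point_ops calling estimate_tokens, which calls .numel() on the main input), and the constants in it are made up. It reproduces the same AttributeError with list inputs and shows that shadowing the method on the instance avoids it.

```python
# Hypothetical stand-in for the relevant part of a model class; it only
# mirrors the call chain seen in the traceback, not the real implementation.
class ToyModel:
    main_input_name = "input_ids"

    def estimate_tokens(self, input_dict):
        # Assumes the main input is a tensor exposing .numel().
        if self.main_input_name in input_dict:
            return input_dict[self.main_input_name].numel()
        return 0

    def floating_point_ops(self, input_dict):
        # Monitoring-only estimate; 6 and 1000 are illustrative constants.
        return 6 * self.estimate_tokens(input_dict) * 1000


model = ToyModel()
batch = {"input_ids": [[101, 102, 103]]}  # plain Python lists, not tensors

# Without the workaround, the list input triggers the reported error.
try:
    model.floating_point_ops(batch)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The workaround: shadow the method on this one instance with a no-op.
# An attribute set on the instance is not a bound method, so the lambda's
# single argument receives the batch, not the model.
model.floating_point_ops = lambda inputs: 0
flops_after_patch = model.floating_point_ops(batch)

print(error_message)
print(flops_after_patch)
```

The trade-off, assuming the method really is used only for monitoring as stated above, is that any reported FLOP count for the run will be zero. Training itself should be unaffected.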

tahirahmad2030 commented 3 months ago

Thanks for the workaround, @b11z.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.