huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Setting `dataset_num_proc > 1` on `DPOTrainer` seems to block. #1964

Closed zhangzef closed 2 weeks ago

zhangzef commented 3 weeks ago

When I set just `dataset_num_proc=2` in `DPOTrainer`, the trainer's dataset-map step during initialization seems to hang completely, even though my dataset has only two data points and my CPU and memory utilization don't increase at all. My code:

training_args = DPOConfig(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=5e-5,
    fp16=True,
    logging_steps=10,
    optim="adamw_torch",
    evaluation_strategy="steps",  # transformers warns this is deprecated in favor of eval_strategy
    save_strategy="steps",
    eval_steps=50,
    save_steps=50,
    warmup_steps=100,
    max_grad_norm=0.3,
    lr_scheduler_type="cosine",
    dataset_num_proc=2,  # the setting that triggers the hang
)

peft_model = get_peft_model(model, lora_config)

dpo_trainer = DPOTrainer(
    model=peft_model,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_prompt_length=512,
    max_length=1024
)

the log is:

/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:100: FutureWarning: The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument has no effect
  warnings.warn(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.44it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading dataset shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 9376.78it/s]
num_proc must be <= 2. Reducing num_proc to 2 for dataset of size 2.
/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
/root/miniconda3/envs/llava/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': max_prompt_length, max_length. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in DPOTrainer, please use the DPOConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/root/miniconda3/envs/llava/lib/python3.10/site-packages/trl/trainer/dpo_trainer.py:387: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/root/miniconda3/envs/llava/lib/python3.10/site-packages/trl/trainer/dpo_trainer.py:400: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/root/miniconda3/envs/llava/lib/python3.10/site-packages/trl/trainer/dpo_trainer.py:440: UserWarning: When using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your TrainingArguments we have set it for you, but you should do it yourself in the future.
  warnings.warn(
mapping dataset
Map (num_proc=2):   0%|                                                                                                                                                        | 0/2 [00:00<?, ? examples/s]

The map never seems to start, no matter how long I wait.
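
As an aside, the deprecation warnings in the log above say to set `max_prompt_length` and `max_length` on `DPOConfig` instead of passing them to `DPOTrainer`. A minimal sketch of that change, keeping the other arguments as before:

training_args = DPOConfig(
    output_dir="./results",
    # ... other arguments unchanged ...
    dataset_num_proc=2,
    max_prompt_length=512,  # moved here from the DPOTrainer call
    max_length=1024,
)

dpo_trainer = DPOTrainer(
    model=peft_model,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)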

zhangzef commented 3 weeks ago

Here are my package versions:

torch                     2.4.0
tqdm                      4.66.4
tokenizers                0.19.1
trl                       0.9.6.dev0
transformers              4.41.2
python                    3.10.14

zhangzef commented 2 weeks ago

I found that if the `map` function's processing involves a model processor, `dataset_num_proc>1` causes the program to hang.
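
A quick way to confirm this observation (a sketch; `preprocess_function` and `dataset` refer to the repro posted further down) is to run the identical map in the main process, where, per the report, the hang does not occur:

train_dataset = dataset.map(
    preprocess_function,
    batched=True,
    batch_size=128,
    num_proc=None,  # default: run in the main process, no worker pool
)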

kashif commented 2 weeks ago

see the fix in #1914

zhangzef commented 2 weeks ago

> see the fix in #1914

@kashif Thank you for your reply! Can you tell me why this happens and how to fix it? Thanks!

kashif commented 2 weeks ago

It should be fixed now... can you kindly test?

zhangzef commented 2 weeks ago

Should I update to the newest version?

kashif commented 2 weeks ago

yes
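
For reference, one way to pick up the latest development version (assuming the fix from #1914 had not been released to PyPI yet):

pip install -U git+https://github.com/huggingface/trl.git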

zhangzef commented 2 weeks ago

I updated the trl package to the latest version and the problem still occurs. It may have nothing to do with `DPOTrainer`: the program gets stuck whenever the `map` function includes preprocessing by the processor.

just like this:

import os

import torch
from datasets import load_dataset
from transformers import LlavaForConditionalGeneration, LlavaProcessor

model_name = "./model_para/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = LlavaProcessor.from_pretrained(model_name)  # a processor, despite the variable name

def preprocess_function(examples):
    images = [img.convert("RGB") for img in examples["image"]]
    prompts = [f"<image>\nQuestion: {q}\nAnswer:" for q in examples["question"]]
    chosen = examples["chosen"]
    rejected = examples["rejected"]

    inputs = tokenizer(prompts, images=images)  # this call never returns under num_proc > 1; result unused

    return {
        "images": images,
        "prompt": prompts,
        "chosen": chosen,
        "rejected": rejected
    }

dataset = load_dataset('./datasets/RLAIF-V-Dataset')["train"].select(range(256))
train_dataset = dataset.map(
    preprocess_function,
    batched=True,
    batch_size=128,
    num_proc=os.cpu_count(),
)
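
One pattern sometimes used to sidestep hangs like this, shown here as a sketch rather than a confirmed fix, is to create the processor lazily inside each worker instead of capturing the one built in the parent process (names follow the repro above):

_worker_processor = None  # one processor per worker process, created on first use

def preprocess_function(examples):
    global _worker_processor
    if _worker_processor is None:
        _worker_processor = LlavaProcessor.from_pretrained(model_name)
    images = [img.convert("RGB") for img in examples["image"]]
    prompts = [f"<image>\nQuestion: {q}\nAnswer:" for q in examples["question"]]
    _worker_processor(prompts, images=images)  # the call that hung in the repro
    return {
        "images": images,
        "prompt": prompts,
        "chosen": examples["chosen"],
        "rejected": examples["rejected"],
    }
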
zhangzef commented 2 weeks ago

I don't know if this happens only with the llava processor; I haven't tested any other model.

kashif commented 2 weeks ago

can you try:

model_name = "./model_para/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = LlavaProcessor.from_pretrained(model_name)

def preprocess_function(examples, tokenizer):
    images = [img.convert("RGB") for img in examples["image"]]
    prompts = [f"<image>\nQuestion: {q}\nAnswer:" for q in examples["question"]]
    chosen = examples["chosen"]
    rejected = examples["rejected"]

    inputs = tokenizer(prompts, images=images)

    return {
        "images": images,
        "prompt": prompts,
        "chosen": chosen,
        "rejected": rejected
    }

dataset = load_dataset('./datasets/RLAIF-V-Dataset')["train"].select(range(256))
train_dataset = dataset.map(
    preprocess_function,
    fn_kwargs={"tokenizer": tokenizer},
    batched=True,
    batch_size=128,
    num_proc=os.cpu_count(),
)
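
The difference from the earlier snippet is that the processor is handed to the workers explicitly through `fn_kwargs`, which `datasets` pickles and sends to each worker process, rather than being captured implicitly from the parent's globals.
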
kashif commented 2 weeks ago

Also note that some visual-LM processors do not support batch processing.
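
If batch support were the problem, a quick check (a sketch reusing the repro's `tokenizer` and `dataset`; `preprocess_single` is a name introduced here) would be to map one example at a time:

def preprocess_single(example):
    image = example["image"].convert("RGB")
    prompt = f"<image>\nQuestion: {example['question']}\nAnswer:"
    tokenizer(prompt, images=image)  # the processor sees a single prompt/image pair
    return {
        "images": image,
        "prompt": prompt,
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

train_dataset = dataset.map(preprocess_single, num_proc=os.cpu_count())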

zhangzef commented 2 weeks ago

It didn't seem to work, but thanks to your reminder I found the real cause: the llava processor. Everything works fine when I replace it with the CLIP processor.
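
For reference, a minimal sketch of the substitution described above; the CLIP checkpoint name is an assumption, and `prompts` and `images` are the lists from the earlier repro:

from transformers import CLIPProcessor

# hypothetical checkpoint; substitute whichever CLIP processor you use
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
inputs = processor(text=prompts, images=images)  # completes under num_proc > 1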