huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl

[StackLLaMA] Problems running reward_modeling.py using gpt2 as base for reward model #356

Closed · samuelhoglund closed this issue 1 year ago

samuelhoglund commented 1 year ago

Hello!

I am trying to get the reward_modeling.py script to work on a smaller scale by using gpt2 as the base for the reward model.
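
For context, this is roughly how the script sets up the reward model when pointed at gpt2 (the LoRA hyperparameters below are illustrative, not necessarily the script's exact values):

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# The reward model is the gpt2 backbone with a single-logit classification head.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA adapter for sequence classification; r/alpha/dropout are illustrative values.
peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)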

The only changes I made to the file from its current version in the repo were to make the data subsets smaller, setting these values instead:

train_subset: Optional[int] = field(
    default=1000,
    metadata={"help": "The size of the subset of the training data to use"},
)
eval_subset: Optional[int] = field(
    default=500,
    metadata={"help": "The size of the subset of the eval data to use"},
)

(By default these are set to 100K and 50K, respectively.)
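
These fields live in the script's ScriptArguments dataclass and are consumed through HfArgumentParser; roughly like this, trimmed to just the fields I touched (the help strings here are shortened):

from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class ScriptArguments:
    # Only the two subset-size fields changed above are shown here.
    train_subset: Optional[int] = field(default=1000, metadata={"help": "Training subset size"})
    eval_subset: Optional[int] = field(default=500, metadata={"help": "Eval subset size"})


parser = HfArgumentParser(ScriptArguments)
script_args = parser.parse_args_into_dataclasses()[0]

Since HfArgumentParser turns each field into a CLI flag, the same values can also be passed as --train_subset 1000 --eval_subset 500 without editing the file.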

The other change was to load a modified, smaller sample of the stack-exchange dataset, consisting of one file instead of 12 or 20:

train_dataset = load_dataset("samhog/stack-exchange-mini", data_dir="data/reward", split="train[:1%]")
eval_dataset = load_dataset("samhog/stack-exchange-mini", data_dir="data/evaluation", split="train[:1%]")
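
If I read the script correctly, it then trims each split to the subset sizes above using datasets' select, roughly:

# Roughly how the script applies the subset sizes (Dataset.select from the datasets library).
if script_args.train_subset > 0:
    train_dataset = train_dataset.select(range(script_args.train_subset))
if script_args.eval_subset > 0:
    eval_dataset = eval_dataset.select(range(script_args.eval_subset))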

However, when I run the script, training fails. This is the error, with the full traceback included:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/trl/examples/stack_llama/scripts/reward_modeling.py:285 in <module> │
│                                                                              │
│   282 │   data_collator=RewardDataCollatorWithPadding(tokenizer=tokenizer, m │
│   283 )                                                                      │
│   284                                                                        │
│ ❱ 285 trainer.train(script_args.resume_from_checkpoint)                      │
│   286                                                                        │
│   287 print("Saving last checkpoint of the model")                           │
│   288 model.save_pretrained(output_name + "_peft_last_checkpoint")           │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1662 in      │
│ train                                                                        │
│                                                                              │
│   1659 │   │   inner_training_loop = find_executable_batch_size(             │
│   1660 │   │   │   self._inner_training_loop, self._train_batch_size, args.a │
│   1661 │   │   )                                                             │
│ ❱ 1662 │   │   return inner_training_loop(                                   │
│   1663 │   │   │   args=args,                                                │
│   1664 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1665 │   │   │   trial=trial,                                              │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1929 in      │
│ _inner_training_loop                                                         │
│                                                                              │
│   1926 │   │   │   │   │   with model.no_sync():                             │
│   1927 │   │   │   │   │   │   tr_loss_step = self.training_step(model, inpu │
│   1928 │   │   │   │   else:                                                 │
│ ❱ 1929 │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)  │
│   1930 │   │   │   │                                                         │
│   1931 │   │   │   │   if (                                                  │
│   1932 │   │   │   │   │   args.logging_nan_inf_filter                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2717 in      │
│ training_step                                                                │
│                                                                              │
│   2714 │   │   │   # loss gets scaled under gradient_accumulation_steps in d │
│   2715 │   │   │   loss = self.deepspeed.backward(loss)                      │
│   2716 │   │   else:                                                         │
│ ❱ 2717 │   │   │   loss.backward()                                           │
│   2718 │   │                                                                 │
│   2719 │   │   return loss.detach()                                          │
│   2720                                                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/torch/_tensor.py:487 in backward     │
│                                                                              │
│    484 │   │   │   │   create_graph=create_graph,                            │
│    485 │   │   │   │   inputs=inputs,                                        │
│    486 │   │   │   )                                                         │
│ ❱  487 │   │   torch.autograd.backward(                                      │
│    488 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs │
│    489 │   │   )                                                             │
│    490                                                                       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200 in    │
│ backward                                                                     │
│                                                                              │
│   197 │   # The reason we repeat same the comment below is that              │
│   198 │   # some Python versions print out the first line of a multi-line fu │
│   199 │   # calls in the traceback and some print out the last line          │
│ ❱ 200 │   Variable._execution_engine.run_backward(  # Calls into the C++ eng │
│   201 │   │   tensors, grad_tensors_, retain_graph, create_graph, inputs,    │
│   202 │   │   allow_unreachable=True, accumulate_grad=True)  # Calls into th │
│   203                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: one of the variables needed for gradient computation has been 
modified by an inplace operation: [CUDABoolType [1, 1, 264, 264]] is at version 
3; expected version 2 instead. Hint: the backtrace further above shows the 
operation that failed to compute its gradient. The variable in question was 
changed in there or anywhere later. Good luck!

Does anyone have any tips on how to proceed?

Thanks in advance!

mnoukhov commented 1 year ago

Could be related to the comment in https://github.com/lvwerra/trl/blob/main/examples/stack_llama/scripts/rl_training.py#L43

Have you tried GPT-Neo models?
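
Swapping the backbone should only require pointing the script at a GPT-Neo checkpoint; a rough sketch (the checkpoint name and pad-token handling here are just an example):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # example checkpoint; other GPT-Neo sizes work the same way

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo, like gpt2, ships without a pad token

# Single-logit classification head on top of the backbone, used as the reward model.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id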

dayL-W commented 1 year ago

same error

oliu-io commented 1 year ago

Here's a potential workaround: https://github.com/lvwerra/trl/issues/274#issuecomment-1562135869
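
In case it helps, a common first thing to try for this class of in-place autograd error is making sure gradient checkpointing and the KV cache are off before building the trainer. This is only a sketch of a general mitigation, not necessarily the workaround described in the linked comment:

# `model` is the sequence-classification backbone built by reward_modeling.py.
# Sketch of a general mitigation, not necessarily the linked workaround:
model.gradient_checkpointing_disable()  # recomputed forward passes from checkpointing can trip this error
model.config.use_cache = False          # avoid reusing cached attention buffers across passes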

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.