huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

"Cannot handle batch sizes > 1 if no padding token is defined." when running StackLLAMA example with Pythia-based reward model #422

Closed (jvhoffbauer closed this issue 1 year ago)

jvhoffbauer commented 1 year ago

When I use EleutherAI/pythia-160M as the base model for my reward model, running the actual training script fails with the error below. This seems to be caused by GPTNeoXForSequenceClassification requiring a pad token when running in batched mode. Note that I have a reward model (reward_model) that was trained using the example script and merged with its PEFT adapter.
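
For context, the check that raises this error sits in transformers' modeling_gpt_neox.py (visible at the bottom of the trace below); paraphrased rather than quoted verbatim, it is roughly:

# Paraphrased from GPTNeoXForSequenceClassification.forward: without a
# pad_token_id the model cannot tell where each padded sequence ends,
# so it refuses to classify batches of more than one sample.
if self.config.pad_token_id is None and batch_size != 1:
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")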

There is a quick fix: run the following code after initialising the sentiment prediction pipeline to ensure the pad token is set.

# Fall back to the EOS token as the padding token, on both the tokenizer and the model config
sentiment_pipe.tokenizer.pad_token = sentiment_pipe.tokenizer.eos_token
sentiment_pipe.model.config.pad_token_id = sentiment_pipe.model.config.eos_token_id
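
With both fields set, the pipeline can pad batched inputs, and the batched call from the trace below runs without raising. A minimal sanity check (reusing sent_kwargs from the example script; the strings here are placeholders):

texts = ["first sample response", "second sample response"]  # any two strings
pipe_outputs = sentiment_pipe(texts, **sent_kwargs)  # batch size > 1 now works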

However, I wanted to cross-check whether this is expected behaviour and whether the fix is correct.

The full stack trace:

Some weights of the model checkpoint at /scratch1/jhoff/checkpoints/reward_model were not used when initializing GPTNeoXForSequenceClassification: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of GPTNeoXForSequenceClassification were not initialized from the model checkpoint at /scratch1/jhoff/checkpoints/reward_model and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
0it [00:00, ?it/s]You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/generation/utils.py:1255: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/text_classification.py:104: UserWarning: `return_all_scores` is now deprecated,  if want a similar funcionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.
  warnings.warn(
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
0it [00:09, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/jhoffbauer/.vscode-server/extensions/ms-python.python-2023.8.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/jhoffbauer/project/rl_training.py", line 262, in <module>
    pipe_outputs = sentiment_pipe(texts, **sent_kwargs)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/text_classification.py", line 155, in __call__
    result = super().__call__(*args, **kwargs)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1099, in __call__
    outputs = list(final_iterator)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1024, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/pipelines/text_classification.py", line 182, in _forward
    return self.model(**model_inputs)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jhoffbauer/project/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 832, in forward
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
lvwerra commented 1 year ago

That makes sense, since batches need to be padded. However, it would be worth verifying that the pad tokens are indeed ignored and that you are padding on the correct side, e.g. by running two samples through the pipeline both separately and in a batch and checking that the scores match.
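
For illustration, a minimal version of that check might look like the following (texts holding two sample strings, sent_kwargs as in the example script):

# Score two samples individually, then together in one batch.
single_scores = [sentiment_pipe([t], **sent_kwargs)[0] for t in texts]
batched_scores = sentiment_pipe(texts, **sent_kwargs)

# If padding is masked out correctly, the per-sample and batched scores
# should agree up to numerical noise.
for single, batched in zip(single_scores, batched_scores):
    print(single, batched)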