Why is num_labels=1 in the reward_modeling.py script?

System Info

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder
[ ] My own task or dataset (give details below)

Reproduction

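from transformers import AutoModelForSequenceClassification

# model_config and model_kwargs are defined earlier in reward_modeling.py
model = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    num_labels=1,
    trust_remote_code=model_config.trust_remote_code,
    **model_kwargs,
)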
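For context, a single output is what a pairwise reward model needs: each prompt+response sequence gets one scalar score, and training only compares the score of the chosen response with the score of the rejected one. The sketch below assumes the Bradley-Terry-style loss -log(sigmoid(r_chosen - r_rejected)), which is how TRL's RewardTrainer is commonly described. `pairwise_reward_loss` and `mean_pairwise_loss` are hypothetical plain-Python helpers for illustration, not TRL's actual code, and the reward values are made up.

```python
import math
from typing import Sequence


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Each reward is the single logit (num_labels=1) produced for one
    prompt+response sequence. The loss is small when the chosen response
    scores higher than the rejected one and grows when the order is wrong.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of -log(sigmoid(diff)):
    # never call exp() on a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_pairwise_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Average the pairwise loss over a batch of (chosen, rejected) reward pairs."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("chosen and rejected must be non-empty and equally long")
    losses = [pairwise_reward_loss(c, r) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)


# Hypothetical scalar rewards for two preference pairs:
# the first pair is ranked correctly, the second is ranked the wrong way round.
print(mean_pairwise_loss([2.0, -1.0], [-1.0, 2.0]))
```

Under this loss only the difference between the two scores matters, so one logit per sequence carries all the information the objective can use.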

I trained a reward model with this script, but its output logits contain only one element, so I cannot use them effectively for the subsequent PPO training.
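Assuming the PPO step expects one reward per generated sequence, the single logit can serve directly as that reward: with num_labels=1 the logits have shape (batch_size, 1), and in PyTorch `outputs.logits[:, 0]` or `outputs.logits.squeeze(-1)` gives a 1-D tensor of rewards. The plain-Python sketch below mirrors that step with a hypothetical helper, `logits_to_rewards`; the logit values are made up.

```python
from typing import List


def logits_to_rewards(logits: List[List[float]]) -> List[float]:
    """Turn reward-model logits of shape (batch, 1) into one scalar reward per sequence.

    With num_labels=1 each row holds exactly one value, and that value is the
    reward score itself; there are no class probabilities to softmax over.
    """
    rewards = []
    for row in logits:
        if len(row) != 1:
            raise ValueError(f"expected one logit per sequence, got {len(row)}")
        rewards.append(row[0])
    return rewards


batch_logits = [[1.7], [-0.3], [0.9]]  # hypothetical outputs for three responses
print(logits_to_rewards(batch_logits))  # [1.7, -0.3, 0.9]
```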

Expected behavior

.

huggingface / trl, issue #1993