tlc4418 / llm_optimization

A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
https://arxiv.org/abs/2310.02743
MIT License

Why is reward model training not logged to W&B? #6

Open RylanSchaeffer opened 1 month ago

RylanSchaeffer commented 1 month ago

I'm now training reward models using your code, and I noticed that reward model training runs are not logged to W&B.

https://github.com/tlc4418/llm_optimization/blob/main/configs/config_rm.yaml#L37

Why is this?

tlc4418 commented 1 month ago

Yeah, I think I removed this during the refactor to simplify things for new users (so they can set up their own logging or log the way they want).

Feel free to set this back to true and set a `wandb_entity` in the RM training config here. I think you will also need to add the following lines before creating the Trainer:

import wandb

# Initialize the W&B run before constructing the Trainer
wandb.init(
    project="reward-model",
    entity=training_conf.wandb_entity,
    resume=training_conf.resume_from_checkpoint,
    name=f"{training_conf.model_name}-{training_conf.log_dir}-rm",
    config=training_conf,
)

I will try to add this back as an option in an upcoming commit, so that changing the flag and adding an entity is enough.