tlc4418 / llm_optimization

A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
https://arxiv.org/abs/2310.02743
MIT License

Why is reward model training not logged to W&B? #6

Open RylanSchaeffer opened 1 month ago

RylanSchaeffer commented 1 month ago

I'm now training reward models using your code, and I noticed that reward model training runs are not logged to W&B.

https://github.com/tlc4418/llm_optimization/blob/main/configs/config_rm.yaml#L37

Why is this?

tlc4418 commented 1 month ago

Yeah, I think I removed this during the refactor to simplify things for new users (so they can set up their own logging or log the way they want).

Feel free to set this back to true and set a `wandb_entity` in the RM training config here. I think you will also need to add the following lines before creating the Trainer:

import wandb

# Initialize the W&B run before constructing the Trainer
wandb.init(
    project="reward-model",
    entity=training_conf.wandb_entity,
    resume=training_conf.resume_from_checkpoint,
    name=f"{training_conf.model_name}-{training_conf.log_dir}-rm",
    config=training_conf,
)

I will try to add this back as an option in an upcoming commit, so that changing the flag and adding an entity is enough.