1.Total reward/1.Total reward is illegal; using 1.Total_reward/1.Total_reward instead.

Binbose commented 3 years ago

When trying to train the model on Lunar Lander with the default config file, I get a bunch of log messages like:

INFO:root:Summary name 1.Total reward/3.Episode length is illegal; using 1.Total_reward/3.Episode_length instead.
INFO:root:Summary name 1.Total reward/4.MuZero reward is illegal; using 1.Total_reward/4.MuZero_reward instead.
INFO:root:Summary name 1.Total reward/5.Opponent reward is illegal; using 1.Total_reward/5.Opponent_reward instead.
INFO:root:Summary name 2.Workers/1.Self played games is illegal; using 2.Workers/1.Self_played_games instead.
INFO:root:Summary name 2.Workers/2.Training steps is illegal; using 2.Workers/2.Training_steps instead.
INFO:root:Summary name 2.Workers/3.Self played steps is illegal; using 2.Workers/3.Self_played_steps instead.
INFO:root:Summary name 2.Workers/4.Reanalysed games is illegal; using 2.Workers/4.Reanalysed_games instead.
INFO:root:Summary name 2.Workers/5.Training steps per self played step ratio is illegal; using 2.Workers/5.Training_steps_per_self_played_step_ratio instead.
INFO:root:Summary name 2.Workers/6.Learning rate is illegal; using 2.Workers/6.Learning_rate instead.
INFO:root:Summary name 3.Loss/1.Total weighted loss is illegal; using 3.Loss/1.Total_weighted_loss instead.
INFO:root:Summary name 3.Loss/Value loss is illegal; using 3.Loss/Value_loss instead.
INFO:root:Summary name 3.Loss/Reward loss is illegal; using 3.Loss/Reward_loss instead.
INFO:root:Summary name 3.Loss/Policy loss is illegal; using 3.Loss/Policy_loss instead.

However, the model is training (though without improving much). What do these messages mean?

werner-duvaud commented 3 years ago

Hi, can you please make sure you are using the latest Python packages, especially the TensorboardX package? If the problem persists, can you provide additional information about your system?

Either way, replacing the spaces with underscores in the TensorBoard variable names should solve the problem.
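
For illustration, here is a minimal sketch of the renaming the log messages describe. It assumes that only letters, digits, underscores, hyphens, dots and slashes are legal in a summary name, which matches the messages above but is not taken from the library source. The helper `sanitize_tag` is a hypothetical name, not part of muzero-general or TensorBoard.

```python
import re

# Assumption: a legal tag contains only word characters (letters, digits, "_"),
# plus "-", "." and "/". Anything else, such as a space, counts as illegal.
_ILLEGAL_TAG_CHARS = re.compile(r"[^-/\w.]")


def sanitize_tag(tag: str) -> str:
    """Hypothetical helper: replace each illegal character with an underscore."""
    return _ILLEGAL_TAG_CHARS.sub("_", tag)


# Tag names copied from the log messages in this issue.
tags = [
    "1.Total reward/3.Episode length",
    "2.Workers/1.Self played games",
    "3.Loss/Value loss",
]

for tag in tags:
    print(f"{tag} -> {sanitize_tag(tag)}")
```

Passing names that are already in this form when logging should leave nothing to rename, so the messages should no longer appear.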

dribnet commented 3 years ago

I am getting the same behaviour on this and other training attempts (e.g. cartpole, connect4) and am using TensorBoard 2.4.0.

werner-duvaud / muzero-general

1.Total reward/1.Total reward is illegal; using 1.Total_reward/1.Total_reward instead. #92