Closed Elijas closed 4 years ago
Hi,
The loss in reinforcement learning is quite different from that in classical machine learning, so I don't see anything alarming in the losses above.
As for the hyperparameters to tune, you can take inspiration from this article, and also see the MuZero paper if you want to understand the influence of each parameter and get an idea of how long it will take to train a Connect4 agent with your computational resources (see the Medium article cited).
To measure the performance of your agent on Connect4, you can visualize the graphs named MuZero reward and Opponent reward, which simulate games between MuZero and a deterministic agent. There are currently two possible choices for the Connect4 opponent: a random agent, or an "expert" (the expert will take a win, or block the opponent if three pieces are aligned). But you can also implement your own agent to measure the performance of MuZero!
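For reference, the "expert" rule described above (take an immediate win, otherwise block the opponent's win, otherwise play randomly) can be sketched as a standalone helper. The board representation and function names below are illustrative assumptions, not the repository's actual implementation:

```python
import random

ROWS, COLS = 6, 7  # standard Connect4 board; cells are 0 (empty), 1, or 2


def legal_moves(board):
    # A column is playable if its top cell is still empty.
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board, col, player):
    # Return a new board with `player`'s piece dropped into `col`.
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column is full")


def is_win(board, player):
    # Scan every horizontal, vertical and diagonal run of four cells.
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(
                    0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == player
                    for rr, cc in cells
                ):
                    return True
    return False


def expert_move(board, player, rng=random):
    # 1) take an immediate win, 2) block the opponent's win, 3) play randomly.
    opponent = 3 - player
    for col in legal_moves(board):
        if is_win(drop(board, col, player), player):
            return col
    for col in legal_moves(board):
        if is_win(drop(board, col, opponent), opponent):
            return col
    return rng.choice(legal_moves(board))
```

For example, with three player-1 pieces on the bottom row in columns 0-2, `expert_move(board, 1)` completes the win in column 3, and `expert_move(board, 2)` blocks there.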
Thanks!
Training the model for 10h (RTX6000) on Connect4.
Is it OK that only the policy loss goes down over time while the others go up? If I understand correctly, lowering the learning rate might help? Which other hyperparameters would be worth tuning? What would be a quicker way to select hyperparameters?
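On lowering the learning rate: the MuZero paper anneals it exponentially over training rather than keeping it fixed. A minimal sketch of such a schedule (the default values here are assumptions, roughly matching the paper's setup; the parameter names are illustrative):

```python
def exponential_lr(step, lr_init=0.05, decay_rate=0.1, decay_steps=350_000):
    # Exponential learning-rate annealing as used in the MuZero paper:
    #   lr = lr_init * decay_rate ** (step / decay_steps)
    # so the rate shrinks by a factor of `decay_rate` every `decay_steps` steps.
    return lr_init * decay_rate ** (step / decay_steps)
```

With these defaults the rate starts at 0.05 and drops by 10x every 350k training steps; shortening `decay_steps` is one way to experiment with a faster-decaying rate on a small game like Connect4.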
P.S. A related question: is there any rough idea of how long it would take to get somewhat good? The performance is not good at the moment. Maybe I should embed a performance test (say, 20 matches against a normal algorithmic opponent such as negamax) every several epochs or so?
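A periodic test against a fixed negamax opponent is straightforward to wire up. A minimal generic negamax sketch, kept game-agnostic so it could be plugged into a Connect4 state (all callables and names here are illustrative assumptions, not part of the repository):

```python
def negamax(state, depth, moves_fn, apply_fn, terminal_fn, eval_fn):
    # Fixed-depth negamax. Returns (score, best_move) from the perspective
    # of the player to move. `terminal_fn` returns (done, score-for-mover);
    # `eval_fn` is a heuristic used when the depth budget runs out.
    done, score = terminal_fn(state)
    if done:
        return score, None
    if depth == 0:
        return eval_fn(state), None
    best_score, best_move = float("-inf"), None
    for move in moves_fn(state):
        child = apply_fn(state, move)
        child_score, _ = negamax(child, depth - 1, moves_fn, apply_fn,
                                 terminal_fn, eval_fn)
        child_score = -child_score  # opponent's gain is our loss
        if child_score > best_score:
            best_score, best_move = child_score, move
    return best_score, best_move
```

As a sanity check on a toy game (take-1-or-2 Nim, last stone wins): from 2 stones the mover wins by taking both, from 3 stones every move loses, which the search recovers exactly.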