werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
MIT License
2.49k stars, 611 forks

Performance evaluation during training #94

Closed DoxakisCh closed 3 years ago

DoxakisCh commented 3 years ago

Hi,

I have created a poker environment and want to train an agent with this implementation. In cases like this, training can take a very long time, and self-play alone gives no clear sign of the agent's performance. For these reasons, I was considering testing the agent at regular intervals against a random agent or a differently trained agent and showing the results in TensorBoard for better monitoring. Since training in your implementation runs continuously, is there a way to apply this kind of evaluation?

Thank you in advance.

werner-duvaud commented 3 years ago

In the current version, you can already display MuZero's performance in real time against a random agent, itself, or a hard-coded expert agent by setting the opponent parameter. The results are shown in TensorBoard under the names 'MuZero reward' and 'opponent reward'.
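To make the idea of periodic evaluation concrete, here is a minimal, self-contained sketch of the pattern discussed in this thread: every few training steps, play a batch of games against a fixed opponent and log the mean rewards under the two names mentioned above. This is not the repository's code. The toy game, the policies, and the names `evaluate`, `training_loop`, `eval_every`, and `log` are all hypothetical and chosen only for illustration. In muzero-general itself, the opponent parameter is the supported way to get this.

```python
import random
from statistics import mean

# Hypothetical toy game: each player picks a number, the higher one wins.
ACTIONS = [0, 1, 2]


def random_policy(rng):
    """Baseline opponent: picks a uniformly random action."""
    return rng.choice(ACTIONS)


def greedy_policy(rng):
    """Stand-in for the trained agent (assumption): always picks the best action."""
    return max(ACTIONS)


def play_episode(agent, opponent, rng):
    """Play one game and return (agent_reward, opponent_reward)."""
    a, b = agent(rng), opponent(rng)
    if a == b:
        return 0.0, 0.0
    return (1.0, -1.0) if a > b else (-1.0, 1.0)


def evaluate(agent, opponent, n_games=100, seed=0):
    """Average both players' rewards over n_games evaluation games."""
    rng = random.Random(seed)
    results = [play_episode(agent, opponent, rng) for _ in range(n_games)]
    return {
        "MuZero reward": mean(r[0] for r in results),
        "opponent reward": mean(r[1] for r in results),
    }


def training_loop(total_steps, eval_every, log):
    """Run a placeholder training loop, evaluating every eval_every steps."""
    for step in range(1, total_steps + 1):
        # A real training update would happen here.
        if step % eval_every == 0:
            log(step, evaluate(greedy_policy, random_policy, seed=step))


if __name__ == "__main__":
    training_loop(30, 10, lambda step, metrics: print(step, metrics))
```

In a real setup, the `log` callback could forward each metric to a TensorBoard writer instead of printing it, and the evaluation games would not be added to the replay buffer.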

If you want more help with adding custom games, you can join the Discord.