jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

Success report and request for help #201


bwanab commented 11 months ago

As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had some pretty decent success using AlphaZero.jl, much more than I was able to achieve with PyTorch, TensorFlow, or Flux.jl. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great: it easily beats really bad players (like me) and plays about 50/50 against a basic MinMax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).

In my training, I've done around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to have flatlined at around iteration 10 and slopes very gradually upward after that.

Are there any particular hyper-parameters that I should look at? One thing I tried that didn't seem to make much difference was making the net a little bigger by changing the number of blocks from 5 to 8.
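For context, the change was along these lines, following the structure of the official connect-four example's network definition (the `NetLib.ResNetHP` field names come from that example; the values other than `num_blocks` are the example's defaults, not necessarily what my repo uses):

```julia
using AlphaZero

# Network architecture hyperparameters, modeled on the connect-four example.
# Only `num_blocks` is changed here (5 -> 8); the remaining values are the
# example's defaults, shown for completeness.
Network = NetLib.ResNet

netparams = NetLib.ResNetHP(
  num_filters=128,
  num_blocks=8,                 # was 5; made little difference in practice
  conv_kernel_size=(3, 3),
  num_policy_head_filters=32,
  num_value_head_filters=32,
  batch_norm_momentum=0.1)
```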

bwanab commented 11 months ago

Replying to myself, but I've found that by increasing the timeout when creating the AlphaZeroPlayer, the level of play gets much better. For example, in the case I gave above of playing 50/50 against the MinMax heuristic, using a 5-second timeout instead of the default 2 seconds raises the level to more like 80/20. At 10 seconds, MinMax can't beat it.
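Concretely, the change looks something like the sketch below (exact loading of the session/environment depends on the experiment setup, so treat the constructor arguments other than `timeout` as assumptions):

```julia
using AlphaZero

# Sketch, assuming `session` is an already-loaded training session for the
# Reversi experiment (loaded as in the official examples, e.g. via `Session`).
# `timeout` bounds MCTS search time per move; the default is about 2 seconds.
az_default = AlphaZeroPlayer(session.env)                # ~50/50 vs. the MinMax baseline
az_5s      = AlphaZeroPlayer(session.env; timeout=5.0)   # roughly 80/20 vs. MinMax
az_10s     = AlphaZeroPlayer(session.env; timeout=10.0)  # MinMax no longer wins
```

The trade-off, of course, is that each move takes proportionally longer, since the extra budget just buys more MCTS simulations at play time.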

If anybody has insight into this I'd love to hear it.

jonathan-laurent commented 11 months ago

Thanks for reporting on your experience! Tuning AlphaZero can indeed be pretty hard. Could you share some of the automatically generated metrics and plots from your experiment?

bwanab commented 11 months ago

[Attached plots: benchmark_reward and loss]

These are the ones that seem the most informative to me, but that may just be my ignorance.