Adaption for imperfect games

Hey guys!

Is anyone interested in adapting the AlphaZero method to imperfect-information games like Hearthstone? I have been working on this for some time now, but with only minor success so far. I more or less started from https://github.com/sirmammingtonham/alphastone (unfortunately it was neither runnable nor bug-free, but it is basically based on this project here) and rewrote some of the code to support multiprocessing. In the near future I would like to integrate distributed computing, as DeepMind and other projects did. The groundwork for that is done: self-play, training and pitting are now separate steps. I also added validation of the neural network training in TensorBoard, with early stopping to prevent overfitting, and some logging code that exports games as CSV so they can be visualized in tools like Excel or Power Pivot. I also improved MCTS to handle multiple consecutive moves by the same player and to predict the opponent's behaviour more realistically. Finally, I implemented a much deeper ResNet and ran some experiments with it.
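When one player can make several moves in a row, the value backup can no longer simply negate the value at every tree level, as a plain alternating two-player MCTS does. The following is a minimal sketch, not the code from the repository mentioned above. It assumes a two-player zero-sum game, players encoded as 1 and -1, and a hypothetical `EdgeStats` record for each visited edge. The sign is decided by comparing the acting player with the leaf player instead of by tree depth.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Visit statistics for one (node, action) edge (hypothetical structure)."""
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self) -> float:
        # Mean value from the perspective of the player who chose this action.
        return self.value_sum / self.visits if self.visits else 0.0


def backup(path, leaf_player, leaf_value):
    """Propagate a leaf evaluation up the search path.

    path: list of (acting_player, EdgeStats) pairs from root to leaf, where
          acting_player is the player who chose that edge's action.
    leaf_player: player to move at the evaluated leaf.
    leaf_value: value in [-1, 1] from leaf_player's perspective.

    The sign depends on who acted, not on the depth, so consecutive actions
    by the same player (e.g. playing several cards before ending the turn)
    are credited correctly in a two-player zero-sum game.
    """
    for acting_player, stats in path:
        v = leaf_value if acting_player == leaf_player else -leaf_value
        stats.visits += 1
        stats.value_sum += v


# Usage: player 1 acts twice, player -1 once, then player 1 again.
path = [(1, EdgeStats()), (1, EdgeStats()), (-1, EdgeStats()), (1, EdgeStats())]
backup(path, leaf_player=-1, leaf_value=0.5)
print([s.q for _, s in path])  # [-0.5, -0.5, 0.5, -0.5]
```

A depth-based sign flip would have credited the second move of player 1 with +0.5, even though it leads to the same outcome as the first move.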

We have some additional interesting challenges in that domain:

hidden information
much higher complexity
uncertainty
multiple moves by one player before ending the turn
partially observable moves (like secrets)
information sets instead of game states
random events
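A common way to handle hidden information is determinization, which ISMCTS also builds on. The idea is to sample a concrete game state that is consistent with what the searching player has observed. The following is a minimal sketch for the simplified setting where both players use the same known deck list. In that setting, the opponent's hidden hand and remaining deck must come from the opponent's deck list minus the opponent's cards that have already been revealed. The functions `unseen_cards` and `determinize` are hypothetical. Cards are plain strings, and the card names are only illustrative.

```python
import random
from collections import Counter


def unseen_cards(deck_list, revealed):
    """Cards of a known deck list that have not been revealed yet.

    Uses multisets, because a Hearthstone deck can hold two copies of a card.
    """
    remaining = Counter(deck_list)
    remaining.subtract(Counter(revealed))
    if any(n < 0 for n in remaining.values()):
        raise ValueError("revealed cards are not consistent with the deck list")
    return list(remaining.elements())


def determinize(deck_list, revealed, opp_hand_size, rng):
    """Sample one opponent hand and deck order consistent with the observations."""
    pool = unseen_cards(deck_list, revealed)
    if opp_hand_size > len(pool):
        raise ValueError("not enough unseen cards for the opponent's hand")
    rng.shuffle(pool)
    return pool[:opp_hand_size], pool[opp_hand_size:]


opp_deck = ["Wisp"] * 2 + ["Fireball"] * 2 + ["Frostbolt"] * 2
revealed = ["Fireball", "Frostbolt"]  # opponent cards already played
hand, rest = determinize(opp_deck, revealed, opp_hand_size=3, rng=random.Random(0))
print(len(hand), len(rest))  # 3 1
```

Each search iteration can draw a fresh sample. Statistics are then aggregated over many determinizations instead of being tied to one guessed state.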

Next steps I am planning:

switch from MCTS to information set MCTS (ISMCTS) to address hidden information and uncertainty (see: http://eprints.whiterose.ac.uk/75048/1/CowlingPowleyWhitehouse2012.pdf)
implement distributed computing
improve modeling of game states (like using one-hot encoding)
improve modeling of valid actions
switch from Fireplace to a more up-to-date and possibly faster game engine like Spellsource
visualize games via HSReplay or something similar
create a bot and make it possible to play against via some client
evaluate learning and generalization performance with different neural network depths in this domain
port the code to Keras to improve readability and potentially performance
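In the approach of Cowling, Powley and Whitehouse, the main change from MCTS to ISMCTS happens in the selection step. Each iteration samples a determinization, and only actions that are legal in that determinization are considered. The UCB exploration term also changes: it uses an availability count (how often an action was legal when its parent was visited) instead of the parent's visit count. The following is a minimal sketch of that step only. It assumes a hypothetical `ISNode` structure and an illustrative exploration constant `c`. It uses plain UCB1 as in the paper, not AlphaZero's PUCT rule with network priors. Determinization, expansion and backup are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ISNode:
    """Node of an information-set search tree (hypothetical structure)."""
    visits: int = 0
    value_sum: float = 0.0
    avail: int = 0  # times this node's action was legal when its parent was visited
    children: dict = field(default_factory=dict)  # action -> ISNode


def select_action(node, legal_actions, c=0.7):
    """SO-ISMCTS style selection among actions legal in the current determinization.

    Returns (action, child). Expands the first untried legal action if one exists.
    """
    legal = list(legal_actions)
    # Every expanded child that is legal in this determinization was "available".
    for a in legal:
        if a in node.children:
            node.children[a].avail += 1
    untried = [a for a in legal if a not in node.children]
    if untried:
        child = ISNode(avail=1)
        node.children[untried[0]] = child
        return untried[0], child

    def ucb(a):
        ch = node.children[a]
        if ch.visits == 0:
            return math.inf  # expanded but not yet backed up: try it first
        return ch.value_sum / ch.visits + c * math.sqrt(math.log(ch.avail) / ch.visits)

    best = max(legal, key=ucb)
    return best, node.children[best]


root = ISNode(children={"a": ISNode(visits=10, value_sum=5.0, avail=10),
                        "b": ISNode(visits=1, value_sum=1.0, avail=1)})
print(select_action(root, ["a", "b"])[0])  # b
```

Using the availability count keeps an action that is rarely legal, for example a card that is only sometimes in hand, from being penalized by the many parent visits in which it could not be chosen.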

To begin with, I reduced the complexity of Hearthstone by letting both players play the same hero with the same simple beginner deck. My randomly initialized neural networks have no problem beating a purely random player (presumably thanks to MCTS). However, once I find a better network that beats the initial one, that network fails to beat the purely random player. So the networks seem to learn the wrong things, which only help against other networks. I would really like to find out whether this generalized approach can succeed for this kind of game!
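One way to catch this kind of regression during training is to pit each candidate network against the current best network and also against a fixed random baseline. Candidates that get worse against the baseline are then rejected. The following is a minimal sketch of such an acceptance check. The first condition is loosely modeled on the win-rate threshold that alpha-zero-general applies when pitting a new network against the previous one, where draws are ignored. The random-baseline condition and both threshold values are assumptions for illustration.

```python
def accept_candidate(wins_vs_best, losses_vs_best, wins_vs_random, games_vs_random,
                     update_threshold=0.6, min_random_win_rate=0.9):
    """Return True if the candidate network should replace the current best one.

    Draws against the best network are ignored in its win rate. The
    random-baseline check (hypothetical threshold) rejects networks that
    only learn to exploit the previous network.
    """
    decisive = wins_vs_best + losses_vs_best
    if decisive == 0 or games_vs_random == 0:
        return False
    beats_best = wins_vs_best / decisive >= update_threshold
    beats_random = wins_vs_random / games_vs_random >= min_random_win_rate
    return beats_best and beats_random


print(accept_candidate(12, 6, 95, 100))  # True
print(accept_candidate(12, 6, 40, 100))  # False: beats the best network, regressed vs. random
```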

If you are interested in participating or just in discussing these new challenges, I'd appreciate it if you left me a message, and any help on this topic is very welcome! My (heavy work in progress) code is at: https://github.com/djdookie/alphastone/tree/master/alphabot so feel free to have a look. I am pretty new to Python, so bear with me. ;)

Cheers!

suragnair / alpha-zero-general

Adaption for imperfect games #110