junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
MIT License

How should the action network be designed for games where pieces move? #21

Closed: fupip closed this issue 6 years ago

fupip commented 6 years ago

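In Go and Gomoku a stone cannot be moved once it is placed, so the action (policy) and evaluation (value) outputs share part of the network. For games like chess or checkers, where pieces move around, how should the action part of the network be designed? Could you offer some ideas?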

BIGBALLON commented 6 years ago

For chess, see the third paper: "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (2017) [arXiv:1712.01815].

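For shogi and chess, the action space is essentially enumerated by brute force (with a few tricks to reduce the dimensionality somewhat). The later part of the paper explains in detail which encodings are used.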
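As a rough illustration of that brute-force encoding, here is a minimal sketch in the spirit of the chess policy described in the paper: an 8×8×73 stack of planes (4,672 moves), where the 8×8 position is the square a piece is picked up from, 56 planes are "queen moves" (8 compass directions × 1..7 squares), 8 planes are knight moves, and 9 planes are underpromotions (3 pawn directions × knight/bishop/rook). Illegal moves are masked out before normalizing. Everything else here is an assumption: the helper names (`move_plane`, `encode_move`, `masked_softmax`), the ordering of directions and knight jumps, the flattening order (from-square × 73 + plane), and the premise that the board is already oriented from the side to move, so pawns always advance toward higher ranks. By contrast, a placement game like Gomoku needs only one policy output per board cell.

```python
import math

# Hypothetical layout: 8x8 from-squares x 73 move planes, flattened as
# (from_square * 73 + plane). The ordering below is our own assumption.
NUM_PLANES = 73
POLICY_SIZE = 8 * 8 * NUM_PLANES  # 4672 possible moves

# Compass directions as (file_delta, rank_delta): N, NE, E, SE, S, SW, W, NW
QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
UNDERPROMOTIONS = ["n", "b", "r"]  # knight, bishop, rook (queen promotions use queen planes)


def move_plane(df, dr, underpromotion=None):
    """Map a displacement (df, dr) to a plane index in 0..72."""
    if underpromotion is not None:
        # Planes 64..72: pawn step forward-left / forward / forward-right
        # (board assumed oriented so the side to move advances with dr = +1).
        if dr != 1 or df not in (-1, 0, 1):
            raise ValueError("underpromotion must be a one-step pawn move")
        return 64 + (df + 1) * 3 + UNDERPROMOTIONS.index(underpromotion)
    if (df, dr) in KNIGHT_JUMPS:
        # Planes 56..63: the 8 knight jumps.
        return 56 + KNIGHT_JUMPS.index((df, dr))
    dist = max(abs(df), abs(dr))
    if dist == 0 or dist > 7:
        raise ValueError("not a valid queen-like distance")
    if df not in (0, dist, -dist) or dr not in (0, dist, -dist):
        raise ValueError("move is not along a straight line or diagonal")
    unit = (df // dist, dr // dist)
    # Planes 0..55: direction index * 7 + (distance - 1).
    return QUEEN_DIRS.index(unit) * 7 + (dist - 1)


def encode_move(from_sq, to_sq, underpromotion=None):
    """Flatten (from-square, plane) into one index of the policy vector."""
    (f0, r0), (f1, r1) = from_sq, to_sq  # squares as (file, rank), 0..7
    plane = move_plane(f1 - f0, r1 - r0, underpromotion)
    return (r0 * 8 + f0) * NUM_PLANES + plane


def masked_softmax(logits, legal_indices):
    """Softmax over legal moves only; illegal entries get probability 0."""
    m = max(logits[i] for i in legal_indices)  # subtract max for stability
    exps = {i: math.exp(logits[i] - m) for i in legal_indices}
    total = sum(exps.values())
    probs = [0.0] * len(logits)
    for i, e in exps.items():
        probs[i] = e / total
    return probs


if __name__ == "__main__":
    e2e4 = encode_move((4, 1), (4, 3))          # pawn two squares north
    g1f3 = encode_move((6, 0), (5, 2))          # knight jump
    a7a8n = encode_move((0, 6), (0, 7), "n")    # underpromotion to knight
    probs = masked_softmax([0.0] * POLICY_SIZE, [e2e4, g1f3])
    print(e2e4, g1f3, a7a8n, probs[e2e4], probs[g1f3])
```

The same idea carries over to other moving-piece games: enumerate every (from-square, move-type) pair once, let the policy head output one logit per entry, and zero out the illegal ones at each position. The network body (and its sharing with the value head) can stay as in the Gomoku case; only the size and meaning of the policy output change.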

fupip commented 6 years ago

OK, thanks. I'll take a look.