suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License

Chess, possible moves, how would you experts go about it? #241

Open JernejHenigman opened 3 years ago

JernejHenigman commented 3 years ago

I decided to implement the game of AntiChess (https://lichess.org/variant/antichess) in the alpha-zero-general framework. Essentially, I am unsure about one thing:

def getActionSize(self):
    # return number of actions
    return ?

What should getActionSize(self) return? In Connect4 it is easy to see that there are at most n possible moves (when every column still has room for a stone), so getActionSize returns n, where n is the number of columns on the board.

But what is the upper limit on possible moves in chess (or AntiChess, for that matter)? Chess has the following pieces: pawn, knight, bishop, rook, queen, and king. Do we need to count every move each piece could make from every square on the board to arrive at the upper limit for getActionSize()?

My thinking is the following: number_of_actions = PawnMoves + KnightMoves + BishopMoves + RookMoves + QueenMoves + KingMoves.

PawnMoves:

Next, do the same count for the other pieces and sum everything. Is the resulting total the limit we want for possible moves in chess?

I saw that in the game of Tafl, for instance, this method returns self.n**4. What is the explanation behind that?

rlronan commented 3 years ago

The action size depends largely on how you define an action for your game programmatically. Typically, for games where you place pieces, people have defined an action by the location where the piece is placed. For games where you move pieces, people have defined an action as a pair: where the piece was and where it is moved to.

That's why for Connect4 there are n=boardwidth many actions,

for TicTacToe there are n**2 actions,

and for TAFL there are n**4 = (n**2 many board spaces for the piece to be in) x (n**2 many board spaces for the piece to be moved to).
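To make the (from, to) idea concrete, here is a minimal sketch of how such a pair could be flattened into a single action index on an n x n board. The helper names (action_size, encode_action, decode_action) and the row-major square numbering are assumptions for illustration, not code from the repository. For an 8x8 chess board this gives 64 * 64 = 4096 indices; note that a plain from/to pair does not say which piece a pawn promotes to, so a chess variant would likely need extra indices for that.

```python
# Hypothetical helpers for illustration; not taken from alpha-zero-general.

def action_size(n):
    """Number of (from_square, to_square) pairs on an n x n board."""
    return n ** 4


def encode_action(n, from_rc, to_rc):
    """Flatten ((from_row, from_col), (to_row, to_col)) into one index."""
    (fr, fc), (tr, tc) = from_rc, to_rc
    from_sq = fr * n + fc  # row-major index of the origin square
    to_sq = tr * n + tc    # row-major index of the destination square
    return from_sq * n * n + to_sq


def decode_action(n, action):
    """Invert encode_action: index -> ((from_row, from_col), (to_row, to_col))."""
    from_sq, to_sq = divmod(action, n * n)
    return divmod(from_sq, n), divmod(to_sq, n)


n = 8
a = encode_action(n, (6, 4), (4, 4))
print(action_size(n))       # 4096
print(a)                    # 3364
print(decode_action(n, a))  # ((6, 4), (4, 4))
```

Most of these indices will never correspond to a legal move; the fixed size only gives the network a policy vector whose length does not change.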

If you look at TaflLogic: https://github.com/suragnair/alpha-zero-general/blob/f0f1106b93cb74364233a65dbb8d2b85c9c88608/tafl/TaflLogic.py#L88-L113

Or at SantoriniLogic: https://github.com/suragnair/alpha-zero-general/blob/f0f1106b93cb74364233a65dbb8d2b85c9c88608/santorini/SantoriniLogic.py#L157-L168

You'll see there's typically a lot of code that specifies which moves are actually legal given the board-state, and that information is used to mask illegal moves in the actions vector.
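As a rough illustration of the masking step, the sketch below zeroes out illegal entries of a policy vector and renormalizes what is left. It uses plain Python lists, and the function name mask_policy plus the uniform fallback are assumptions for this example rather than the framework's actual code.

```python
# Hypothetical sketch of masking a policy vector with a legality mask.

def mask_policy(policy, valids):
    """Zero out illegal actions and renormalize.

    policy: list of non-negative floats, one per action index.
    valids: list of 0/1 flags of the same length (1 = legal move).
    """
    if len(policy) != len(valids):
        raise ValueError("policy and valids must have the same length")
    if sum(valids) == 0:
        raise ValueError("at least one action must be legal")

    masked = [p * v for p, v in zip(policy, valids)]
    total = sum(masked)
    if total > 0:
        return [p / total for p in masked]

    # All probability mass was on illegal moves: fall back to a
    # uniform distribution over the legal ones (an assumed choice).
    n_valid = sum(valids)
    return [v / n_valid for v in valids]


print(mask_policy([0.5, 0.3, 0.2], [1, 0, 1]))
print(mask_policy([0.0, 1.0, 0.0], [1, 0, 1]))  # [0.5, 0.0, 0.5]
```

With a fixed action size such as n**4, the valids list would be built by the game's move-generation logic for the current board.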

gms2009 commented 2 years ago

https://github.com/goshawk22/alpha-zero-chess seems to use the python-chess package from pip.

coder-free commented 2 years ago

@rlronan Hi, did your Santorini training yield any results? I want to train Santorini too, but I don't know whether my hardware is strong enough. I only have one RTX 2080 GPU and a 4-core i7 CPU.