suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.91k stars 1.04k forks source link

Game with player getting double turns / moves #306

Closed R-N closed 1 year ago

R-N commented 1 year ago

Hello, I plan to adapt this for MOBA draft picking as 1v1 combinatoric game. However the pick sequence is not strictly alternating. At times, a team, or a player, may pick two heroes, hence getting double turns. It is in the sequence of 1A-2B-2A-2B-2A-1B, with A and B being the teams/players and 1 and 2 being the number of heroes being picked.

So, which methods do I need to look out for? So far I've got:

Is that all?

Or maybe I can just have NeuralNet execute my model twice and have both actions get the same v? How does that sound?

But this alpha zero implementation requires that actions are single array with the length of possible action, so I can't just have it return two actions. Maybe I can just sum them? No, it will be invalid action, both for pi in training samples (softmax) and for MCTS, right? It also won't be batchable. Or maybe I can have the network always output two policies and mask the second one if only one is needed.

R-N commented 1 year ago

Okay so I tried making the moves always double moves (with the second part being empty if it's a single move turn) and... it's a bad idea (lol). I thought 2 consecutive moves shouldn't be handled separately because if I know I'd be making two moves, I'll play differently than if I think I can only make one move. However the action search space increases exponentially, and with that other things too.

Drafting also doesn't fit well into this framework. With banning phase and picking phase, I'm having somewhat two boards instead of one, or a board with two distinct battle area and stage. It was quite hard trying to adapt it by only implementing Game and NeuralNet.

I guess I'll just do this from scratch. Although it claims to be general and flexible I guess it's still limited to 2 player adversarial board game with alternating turns.

cestpasphoto commented 1 year ago

The literature only uses AlphaZero for perfect board game (no unknown), with 2 players alternating turns.

But there are workarounds. With little work, you can at least remove the 2 players limitation, or the alternating turns one. On my fork, for games like Minvilles or Santorini with gods power (worst case is play until player wants to stop), I've had good results with:

This way, tree exploration in MCTS knows that one may do more than 1 move and can anticipate it. In MCTS tree, it needs several layers of nodes to represent the full turn of a single player, but that can be compensated by increasing number of MCTS simulations.

An alternative would be to code combinations of 2 actions into a single value, while still having a single game status and single NN. Advantage is a shorter tree (one full turn is represented by a single node instead of several) but instead of having actions space would become of length N*P instead of N+P. My feeling (not checked through an experiment) is that having too large space for action is not good for NN performance.

R-N commented 1 year ago

The literature only uses AlphaZero for perfect board game (no unknown), with 2 players alternating turns.

Yup. I thought by "general" it meant it went beyond that.

On my fork

I'll check it out when I have the time.

  • having a single game status (including 2 boards and/or info on previous moves if needed) and single NN

Yeah I did that.

  • as you said, getNextState() returning next_player and in MCTS, plus coding v for each player in a table instead of a single scalar and v = np.roll(v, next_player) instead of v = -v

I see. So that's all I need to change. Thanks

This way, tree exploration in MCTS knows that one may do more than 1 move and can anticipate it.

Anticipate it?

An alternative would be to code combinations of 2 actions into a single value, while still having a single game status and single NN.

I did that. Bad idea.

Advantage is a shorter tree (one full turn is represented by a single node instead of several) but instead of having actions space would become of length N*P instead of N+P.

Not much of an advantage in my case. The "board" is small (6+10), so my basic tree is wide (120) but shallow (16).

My feeling (not checked through an experiment) is that having too large space for action is not good for NN performance.

Yup. Search space increased exponentially, and so does training time. 120 possible moves to 14400+

R-N commented 1 year ago

On my fork

Oh it's not really a fork in github sense lol. No wonder I didn't notice your "fork" despite being recently updated. (I had looked into recent forks of this repo to see if there were useful unmerged commits)

cestpasphoto commented 1 year ago

This way, tree exploration in MCTS knows that one may do more than 1 move and can anticipate it.

I mean it can predict many actions ahead, even though each of them is represented by several moves.