plkmo / AlphaZero_Connect4

PyTorch implementation of AlphaZero Connect from scratch (with results)
Apache License 2.0

The issue of despair in solvable games #2

Closed HaraldKorneliussen closed 4 years ago

HaraldKorneliussen commented 4 years ago

This looks like a wonderful project, but I see from your Medium article that you may be running into a problem I ran into myself when I wrote my first Connect 4 playing program many years ago. Since the game is a first-player win, and even traditional search methods can feasibly establish that from the first move, the second player doesn't know what to do. It rightly concludes that it loses against good play no matter what it does, so every move looks equally bad and it plays random moves!! It's almost as if the algorithm were overcome with despair!

It looks like you may be getting this problem even with AlphaZero's much more sophisticated search.

The solution I implemented back then was to add a slight reward for losing late over losing early, to encourage the second player to at least postpone the inevitable as long as possible.
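A minimal sketch of that idea, not code from this repository: the function name `shaped_terminal_value`, the `bonus` size of 0.1, and the linear scaling by game length are all assumptions made for illustration. It scores a finished game from one player's point of view and shrinks the magnitude of the result slightly the longer the game lasted, so a late loss scores a little better than an early one. Applying the same scaling to wins is also an assumption, made here so the two players' values stay exact negatives of each other.

```python
def shaped_terminal_value(outcome: int, moves_played: int,
                          max_moves: int = 42, bonus: float = 0.1) -> float:
    """Score a finished game from one player's perspective.

    outcome:      +1 for a win, 0 for a draw, -1 for a loss.
    moves_played: total number of discs dropped when the game ended.
    max_moves:    42 cells on a standard 6x7 Connect 4 board.
    bonus:        hypothetical shaping strength; keep it well below 1 so
                  that every loss still scores below a draw.
    """
    if outcome not in (-1, 0, 1):
        raise ValueError("outcome must be -1, 0 or +1")
    # Fraction of the board filled when the game ended, clamped to [0, 1].
    progress = min(max(moves_played / max_moves, 0.0), 1.0)
    # Loss: -1 + bonus * progress, so later losses are slightly less bad.
    # Win:  +1 - bonus * progress, so quicker wins are slightly better.
    return outcome * (1.0 - bonus * progress)


if __name__ == "__main__":
    early_loss = shaped_terminal_value(-1, moves_played=8)
    late_loss = shaped_terminal_value(-1, moves_played=40)
    # With flat -1 rewards both lines would tie; here the longer one ranks higher.
    print(early_loss, late_loss, late_loss > early_loss)
```

With a flat -1 for every loss, all of the second player's moves tie and the choice between them is arbitrary. With the shaped value, the move that postpones the loss longest ranks highest.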

plkmo commented 4 years ago

I see, thanks for your comment. That may be the reason why I find that it learns quite slowly per iteration, and why it also requires tons of game data to train reasonably well.