The issue of despair in solvable games

This looks like a wonderful project, but I see from your Medium article that you may be running into a problem I hit myself when I wrote my first Connect 4 program many years ago. The game is a first-player win, and even traditional search methods can feasibly establish that from the first move. The second player then doesn't know what to do: it rightly concludes that it loses against good play no matter what it does, so every move looks equally bad and it plays random moves! It's almost as if the algorithm were overcome with despair!

It looks like you may be getting this problem even with AlphaZero's much more sophisticated search.

The solution I implemented back then was to add a slight reward for losing late over losing early, to encourage the second player to at least postpone the inevitable as long as possible.
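
To illustrate the idea, here is a minimal sketch, not my original code. It assumes a plain negamax search over a tiny hand-written game tree instead of a real Connect 4 board. The names `WIN`, `negamax`, `best_move` and `LOST_POSITION` are made up for this example. A terminal node (`None`) means the player to move has just lost, and the loss is scored as `-(WIN - ply)` so that later losses are slightly less bad than earlier ones.

```python
# Hypothetical constant: must exceed the longest possible game, in plies.
WIN = 100


def negamax(node, ply, depth_aware):
    """Value of `node` for the player to move, `ply` moves below the root."""
    if node is None:
        # Terminal: the player to move has lost.
        # Depth-aware scoring makes a late loss less negative than an early one.
        return -(WIN - ply) if depth_aware else -1
    # A child's value is from the opponent's point of view, so negate it.
    return max(-negamax(child, ply + 1, depth_aware) for child in node.values())


def best_move(node, depth_aware=True):
    """Pick the move with the highest negamax value (first one wins ties)."""
    return max(node, key=lambda move: -negamax(node[move], 1, depth_aware))


# Toy position that is lost for the player to move:
# "a" loses after 2 plies, "b" loses after 4 plies.
LOST_POSITION = {
    "a": {"x": None},
    "b": {"x": {"p": {"y": None}}},
}

print(best_move(LOST_POSITION, depth_aware=False))  # a (all moves tie at -1)
print(best_move(LOST_POSITION, depth_aware=True))   # b (postpones the loss)
```

With flat scoring every move in a lost position has the same value, so the choice is arbitrary. With the depth term the same search prefers the line that survives longest, and as a side effect the winning side prefers the fastest win.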

