schrum2 / MM-NEATv2

MM-NEAT version 2.0 is no longer supported. Please get MM-NEAT 3+ from https://github.com/schrum2/MM-NEAT

Something very wrong with board games #455

Closed schrum2 closed 6 years ago

schrum2 commented 7 years ago

Despite various error fixes, there is still some major problem with the board game code. I evolved an agent against the Othello WPC for 200 generations, and then did a post eval benchmark, which showed a win percentage of about 43% against the WPC. Next, I started a new evolutionary run and stopped it after it produced the first champion. I then did a post eval on that champion vs the WPC, and it won 48% of the time.

Something is simply wrong here ... even if the final result was not that great, the agents should at least improve over time if possible, or get no worse otherwise.

I'm at a bit of a loss as to what to do, but we can't produce any interesting results unless we figure something out.

schrum2 commented 7 years ago

I've decided to see how other researchers have dealt with randomness in games, so I revisited the HyperNEAT Checkers paper (http://www.aaai.org/Papers/AAAI/2008/AAAI08-100.pdf). This paper says that they evolved against a completely deterministic opponent (no randomness) but then evaluated the results against a slightly randomized agent. However, the randomization was more intelligent than simply making completely random moves some of the time: during post evaluation only, the static agent had a 10% chance of picking the second-best move (according to its heuristic) instead of the best move.

This is something we can try. Instead of using the minimaxRandomRate parameter, make a new parameter: minimaxSecondBestRate.
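
For reference, here is a minimal sketch of what that parameter could control, assuming the legal moves can be scored by the opponent's heuristic. The class and method names are hypothetical, not MM-NEAT's actual API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: with probability minimaxSecondBestRate, the static
// opponent takes its second-best move instead of its best move.
public class SecondBestMoveSelector {
    private final double minimaxSecondBestRate; // e.g. 0.10, as in the Checkers paper
    private final Random rand = new Random();

    public SecondBestMoveSelector(double minimaxSecondBestRate) {
        this.minimaxSecondBestRate = minimaxSecondBestRate;
    }

    /** Sorts moves best-first by heuristic score and usually returns the best,
     *  but occasionally returns the second-best instead. */
    public <M> M select(List<M> legalMoves, Comparator<M> byHeuristicScore) {
        legalMoves.sort(byHeuristicScore.reversed()); // best move first
        if (legalMoves.size() > 1 && rand.nextDouble() < minimaxSecondBestRate) {
            return legalMoves.get(1); // second-best move
        }
        return legalMoves.get(0); // best move
    }
}
```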

schrum2 commented 7 years ago

The minimaxSecondBestRate experiment is still running, and I need to do a post benchmark on it afterward (so far, fitness went up a little, but not much). However, I came across a paper with another approach to this problem.

In this paper (ftp://ftp.cs.utexas.edu/pub/techreports/tr02-32.pdf) the authors consider every possible game state that can be reached within 4 moves of the start, randomly pick 5 of these 244 states, and then play the game deterministically from that point onward. This could really help, since the current approach allows randomness in the end game, which is probably very sensitive to random moves.
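
A rough sketch of how those fixed random openings could be generated, assuming a generic game-state interface. The names below are placeholders, not the actual MM-NEAT board game classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Placeholder sketch of the opening-state sampling idea from the paper above.
public class RandomOpeningStates {

    /** Minimal stand-in for a board game state. */
    interface BoardState<S extends BoardState<S>> {
        List<S> successors(); // all states reachable by one legal move
    }

    /** Collects every state reachable within `moves` moves of the start state. */
    static <S extends BoardState<S>> List<S> statesWithinMoves(S start, int moves) {
        List<S> frontier = new ArrayList<>();
        frontier.add(start);
        List<S> reachable = new ArrayList<>();
        for (int d = 0; d < moves; d++) {
            List<S> next = new ArrayList<>();
            for (S s : frontier) {
                next.addAll(s.successors());
            }
            reachable.addAll(next);
            frontier = next;
        }
        return reachable;
    }

    /** Randomly samples `count` of those states; games would then be played
     *  deterministically from each sampled state onward. */
    static <S extends BoardState<S>> List<S> sampleOpenings(S start, int moves, int count, Random rand) {
        List<S> states = statesWithinMoves(start, moves);
        Collections.shuffle(states, rand);
        return states.subList(0, Math.min(count, states.size()));
    }
}
```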

schrum2 commented 7 years ago

Also worth noting: the alpha-beta depth in the Checkers HyperNEAT paper was 4.

schrum2 commented 7 years ago

minimaxSecondBestRate result: during evolution, the supposed win rate was 80%. However, the post benchmark came out as 29%, which is a massive difference. I also ran postBestObjectiveEval.bat instead to see what result it gave, and the result was 48%. What is the reason for the discrepancy?

schrum2 commented 7 years ago

The difference between postBestObjectiveEval and postBoardGameBenchmarkBestOthelloWPCEval seems to be that postBoardGameBenchmarkBestOthelloWPCEval has a minimax search depth of 5, whereas postBestObjectiveEval will use the depth from the experiment, which was 2. I suppose this accounts for the discrepancy, though the fact that the result during evolution was so much higher (80%) is still a problem.

schrum2 commented 7 years ago

I evolved in a deterministic fashion with a search depth of 4 (both as in the HyperNEAT Checkers paper) and then did a post eval where players would randomly make the second-best move on occasion. During evolution, the score hit the ceiling of 100% win, but the post eval was only a 44% win rate.

I guess that technically, I was making both the evolved players and the static WPC use their second-best move 10% of the time, whereas in the HyperNEAT Checkers paper, only the static opponent chose the second-best move on occasion. Restricting the randomness to the static opponent is something to try, but this is still a discouraging result.

schrum2 commented 7 years ago

I still need to run the Othello-HNStaticWPCDeterministicD4Random4Moves batch file to test, but current results are not promising. However, who knows what the post eval will be like. Also, the depth of 4 really slows things down.

schrum2 commented 6 years ago

HNStaticWPCDeterministicD4Random4Moves: at 50 generations the fitness seems to have hit a ceiling, winning about 80 to 90 percent, though I still think hitting 100 percent should be possible. However, when I actually do a post eval against the WPC, the win percentage is about 40% across 100 games, which is really bad. When I play against it as a human, the champion seems to do well sometimes and poorly other times, though I generally manage to squeak through with a win. I may be imagining this, but it seemed as though play quality depended on whether it was playing as white or black, so I should look into this more.

Currently running new batch file, HNStaticWPCDeterministicD4Random2Moves, which should do better because of the decreased randomness. We'll see ...

Still, the main thing I need to do is track statistics distinguishing performance as black/white player.

schrum2 commented 6 years ago

Careful checking of eval logs revealed that post eval scores seem to be in line with performance. The reason that post-eval performance is bad is that agents that maximize average piece difference do not necessarily maximize win rate/score. So, I'm going to create a new batch file to evolve with both win score and piece differential.
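
To make the two objectives concrete, here is roughly what each one would measure at the end of a game. This is a simplified, hypothetical sketch rather than the actual fitness functions in the code.

```java
// Simplified sketch of the two objectives: an agent can rack up a large average
// piece differential while still losing close games, so both are worth tracking.
public class OthelloObjectives {

    /** Win/score objective: 1 for a win, 0.5 for a draw, 0 for a loss (assumed scoring). */
    static double winScore(int myPieces, int opponentPieces) {
        if (myPieces > opponentPieces) return 1.0;
        if (myPieces == opponentPieces) return 0.5;
        return 0.0;
    }

    /** Piece differential objective: signed difference in final piece counts. */
    static double pieceDifferential(int myPieces, int opponentPieces) {
        return myPieces - opponentPieces;
    }
}
```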

schrum2 commented 6 years ago

From running Othello-HNStaticWPCDeterministicD2Random2MovesMO I have a result where a network that won 100% of the time during evolution only wins 80% during the post eval. This makes some sense, but it indicates that more trials during evolution, or some other more robust eval strategy, are needed.

schrum2 commented 6 years ago

Increasing the evals does indeed make post eval results more in line with results during evolution, but when I play against the computer as a human, I feel that I still win a lot. The evolved player isn't terrible, but I still think it could be better. The number of evals seems to be a major issue, so I'm going to go back to the random move scheme (not just opening random moves) but with an increased number of evals.

schrum2 commented 6 years ago

I went back and ran HNStaticWPC10Rand with more evals, but it wasn't good enough: a 70% win rate during evolution, but only a 50% win rate in the post eval.

schrum2 commented 6 years ago

The HNStaticWPCDeterministicD2Random2MovesMOTerminalOverride batch file achieves 100% during evolution and an 84% win rate in post evals. Of course, this depends on very little randomness, but I think it hints at a good idea: evolve with no randomness until an agent wins 100%, then increase the randomness until an agent hits 100% again, and keep going, adding more and more randomness. Maybe this incremental scheme will succeed.
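
A sketch of that incremental schedule, with hypothetical hooks standing in for the real evolutionary loop; the rate step and cap below are assumptions, not settings from any of the experiments above.

```java
// Hypothetical sketch of the proposed incremental-randomness schedule.
public class IncrementalRandomnessSchedule {

    /** Stand-in for one generation of the real evolutionary run. */
    interface EvolutionStep {
        /** Evolves one generation with the given randomness level and returns
         *  the champion's win rate against the static opponent. */
        double run(double randomMoveRate);
    }

    public static void evolve(EvolutionStep step, int generations) {
        double randomMoveRate = 0.0;   // start fully deterministic
        final double RATE_STEP = 0.05; // assumed increment after each perfect score
        final double MAX_RATE = 0.10;  // assumed cap, matching the 10% schemes above
        for (int gen = 0; gen < generations; gen++) {
            double championWinRate = step.run(randomMoveRate);
            if (championWinRate >= 1.0 && randomMoveRate < MAX_RATE) {
                randomMoveRate = Math.min(MAX_RATE, randomMoveRate + RATE_STEP);
            }
        }
    }
}
```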

schrum2 commented 6 years ago

Note about the scheme above: although it gets good scores against the WPC in a fairly deterministic situation, it did very poorly against me, which indicates that some non-determinism is definitely needed.

schrum2 commented 6 years ago

When evolving with plain NEAT, the win rate during evolution gets as high as 90% but the post eval performance is less than 60%. This is using a random move rate.

For deterministic games with an increasing number of random opening moves and using HyperNEAT, the number of random openers stalls at 3. The plotted win rate is around 80%, but the post eval picks an agent with a win rate of only around 60% in the eval log, which makes me wonder if the win/draw/loss fitness is wonky. Maybe I should use win rate as well or instead ... at the very least, I would like to do a post eval on the agent with the best win rate to see if results are consistent.

schrum2 commented 6 years ago

I did a direct comparison of the contents of an eval log to the generation scores, and they were not consistent. I definitely need to look at this more.