schrum2 / MM-NEATv2

MM-NEAT version 2.0 is no longer supported. Please get MM-NEAT 3+ from https://github.com/schrum2/MM-NEAT

Something very wrong with board games #455

Closed schrum2 closed 6 years ago

schrum2 commented 7 years ago

Despite various error fixes, there is still some major problem with the board game code. I evolved an agent against the Othello WPC for 200 generations, and then did a post eval benchmark, which showed a win percentage of about 43% against the WPC. Next, I started a new evolutionary run and stopped it after it produced the first champion. I then did a post eval on that champion vs the WPC, and it won 48% of the time.

Something is simply wrong here ... even if the final result was not that great, the agents should at least improve over time if possible, or get no worse otherwise.

I'm at a bit of a loss as to what to do, but we can't produce any interesting results unless we figure something out.

schrum2 commented 7 years ago

I've decided to see how other researchers have dealt with randomness in games, so I revisited the HyperNEAT Checkers paper (http://www.aaai.org/Papers/AAAI/2008/AAAI08-100.pdf). This paper says that they evolved against a completely deterministic opponent (no randomness) but then evaluated the results against a slightly randomized agent. However, the randomization was more intelligent than simply making completely random moves some of the time: during post evaluation only, the static agent had a 10% chance of picking the second-best move (according to its heuristic) instead of the best move.

This is something we can try. Instead of using the minimaxRandomRate parameter, make a new parameter: minimaxSecondBestRate.
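
For reference, here is a minimal sketch of what that parameter could control, assuming the legal moves can be scored by the opponent's heuristic. The class and method names are hypothetical, not MM-NEAT's actual API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: with probability minimaxSecondBestRate, the static
// opponent takes its second-best move instead of its best move.
public class SecondBestMoveSelector {
    private final double minimaxSecondBestRate; // e.g. 0.10, as in the Checkers paper
    private final Random rand = new Random();

    public SecondBestMoveSelector(double minimaxSecondBestRate) {
        this.minimaxSecondBestRate = minimaxSecondBestRate;
    }

    /** Sorts moves best-first by heuristic score and usually returns the best,
     *  but occasionally returns the second-best instead. */
    public <M> M select(List<M> legalMoves, Comparator<M> byHeuristicScore) {
        legalMoves.sort(byHeuristicScore.reversed()); // best move first
        if (legalMoves.size() > 1 && rand.nextDouble() < minimaxSecondBestRate) {
            return legalMoves.get(1); // second-best move
        }
        return legalMoves.get(0); // best move
    }
}
```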

schrum2 commented 7 years ago

The minimaxSecondBestRate experiment is still running, and I need to do a post benchmark on it afterward (so far, fitness went up a little, but not much). However, I came across a paper with another approach to this problem.

In this paper (ftp://ftp.cs.utexas.edu/pub/techreports/tr02-32.pdf) the authors consider every possible game state that can be reached within 4 moves of the start, randomly pick 5 of these 244 states, and then play the game deterministically from that point onward. This could really help, since the current approach allows randomness in the end game, which is probably very sensitive to random moves.
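
A rough sketch of how those fixed random openings could be generated, assuming a generic game-state interface. The names below are placeholders, not the actual MM-NEAT board game classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Placeholder sketch of the opening-state sampling idea from the paper above.
public class RandomOpeningStates {

    /** Minimal stand-in for a board game state. */
    interface BoardState<S extends BoardState<S>> {
        List<S> successors(); // all states reachable by one legal move
    }

    /** Collects every state reachable within `moves` moves of the start state. */
    static <S extends BoardState<S>> List<S> statesWithinMoves(S start, int moves) {
        List<S> frontier = new ArrayList<>();
        frontier.add(start);
        List<S> reachable = new ArrayList<>();
        for (int d = 0; d < moves; d++) {
            List<S> next = new ArrayList<>();
            for (S s : frontier) {
                next.addAll(s.successors());
            }
            reachable.addAll(next);
            frontier = next;
        }
        return reachable;
    }

    /** Randomly samples `count` of those states; games would then be played
     *  deterministically from each sampled state onward. */
    static <S extends BoardState<S>> List<S> sampleOpenings(S start, int moves, int count, Random rand) {
        List<S> states = statesWithinMoves(start, moves);
        Collections.shuffle(states, rand);
        return states.subList(0, Math.min(count, states.size()));
    }
}
```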

schrum2 commented 7 years ago

Also worth noting: the alpha-beta depth in the Checkers HyperNEAT paper was 4.

schrum2 commented 7 years ago

minimaxSecondBestRate result: during evolution, the supposed win rate was 80%. However, the post benchmark came out as 29%, which is a massive difference. I also ran postBestObjectiveEval.bat instead to see what result it gave, and the result was 48%. What is the reason for the discrepancy?

schrum2 commented 7 years ago

The difference between postBestObjectiveEval and postBoardGameBenchmarkBestOthelloWPCEval seems to be that postBoardGameBenchmarkBestOthelloWPCEval has a minimax search depth of 5, whereas postBestObjectiveEval will use the depth from the experiment, which was 2. I suppose this accounts for the discrepancy, though the fact that the result during evolution was so much higher (80%) is still a problem.

schrum2 commented 7 years ago

I evolved in a deterministic fashion with a search depth of 4 (both as in the HyperNEAT Checkers paper) and then did a post eval where players would randomly make the second-best move on occasion. During evolution, the score hit the ceiling of 100% win, but the post eval was only a 44% win rate.

I guess that technically, I was making both the evolved players and the static WPC use their second-best move 10% of the time, whereas in the HyperNEAT Checkers paper, only the static opponent chose the second-best move on occasion. Restricting the randomness to the static opponent is something to try, but this is still a discouraging result.

schrum2 commented 7 years ago

I still need to run the Othello-HNStaticWPCDeterministicD4Random4Moves batch file to test, but current results are not promising. However, who knows what the post eval will be like. Also, the depth of 4 really slows things down.

schrum2 commented 6 years ago

HNStaticWPCDeterministicD4Random4Moves: at 50 generations the fitness seems to have hit a ceiling, winning about 80 to 90 percent, though I still think hitting 100 percent should be possible. However, when I actually do a post eval against the WPC, the win percentage is about 40% across 100 games, which is really bad. When I play against it as a human, the champion seems to do well sometimes and poorly other times, though I generally manage to squeak through with a win. I may be imagining this, but it seemed as though play quality depended on whether it was playing as white or black, so I should look into this more.

Currently running new batch file, HNStaticWPCDeterministicD4Random2Moves, which should do better because of the decreased randomness. We'll see ...

Still, the main thing I need to do is track statistics distinguishing performance as black/white player.

schrum2 commented 6 years ago

Careful checking of eval logs revealed that post eval scores seem to be in line with performance. The reason that post-eval performance is bad is that agents that maximize average piece difference do not necessarily maximize win rate/score. So, I'm going to create a new batch file to evolve with both win score and piece differential.
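
To make the two objectives concrete, here is roughly what each one would measure at the end of a game. This is a simplified, hypothetical sketch rather than the actual fitness functions in the code.

```java
// Simplified sketch of the two objectives: an agent can rack up a large average
// piece differential while still losing close games, so both are worth tracking.
public class OthelloObjectives {

    /** Win/score objective: 1 for a win, 0.5 for a draw, 0 for a loss (assumed scoring). */
    static double winScore(int myPieces, int opponentPieces) {
        if (myPieces > opponentPieces) return 1.0;
        if (myPieces == opponentPieces) return 0.5;
        return 0.0;
    }

    /** Piece differential objective: signed difference in final piece counts. */
    static double pieceDifferential(int myPieces, int opponentPieces) {
        return myPieces - opponentPieces;
    }
}
```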

schrum2 commented 6 years ago

From running Othello-HNStaticWPCDeterministicD2Random2MovesMO I have a result where a network that won 100% of the time during evolution only wins 80% during the post eval. This makes some sense, but it indicates that more trials during evolution, or some other more robust eval strategy, are needed.

schrum2 commented 6 years ago

Increasing the evals does indeed make post eval results more in line with results during evolution, but when I play against the computer as a human, I feel that I still win a lot. The evolved player isn't terrible, but I still think it could be better. The number of evals seems to be a major issue, so I'm going to go back to the random move scheme (not just opening random moves) but with an increased number of evals.

schrum2 commented 6 years ago

I went back and ran HNStaticWPC10Rand with more evals, but it wasn't good enough: a 70% win rate during evolution, but only a 50% win rate in the post eval.

schrum2 commented 6 years ago

The HNStaticWPCDeterministicD2Random2MovesMOTerminalOverride batch file achieves 100% during evolution and an 84% win rate in post evals. Of course, this depends on very little randomness, but I think it hints at a good idea: evolve with no randomness until an agent wins 100%, then increase the randomness until an agent hits 100% again, and keep going, adding more and more randomness. Maybe this incremental scheme will succeed.
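
A sketch of that incremental schedule, with hypothetical hooks standing in for the real evolutionary loop; the rate step and cap below are assumptions, not settings from any of the experiments above.

```java
// Hypothetical sketch of the proposed incremental-randomness schedule.
public class IncrementalRandomnessSchedule {

    /** Stand-in for one generation of the real evolutionary run. */
    interface EvolutionStep {
        /** Evolves one generation with the given randomness level and returns
         *  the champion's win rate against the static opponent. */
        double run(double randomMoveRate);
    }

    public static void evolve(EvolutionStep step, int generations) {
        double randomMoveRate = 0.0;   // start fully deterministic
        final double RATE_STEP = 0.05; // assumed increment after each perfect score
        final double MAX_RATE = 0.10;  // assumed cap, matching the 10% schemes above
        for (int gen = 0; gen < generations; gen++) {
            double championWinRate = step.run(randomMoveRate);
            if (championWinRate >= 1.0 && randomMoveRate < MAX_RATE) {
                randomMoveRate = Math.min(MAX_RATE, randomMoveRate + RATE_STEP);
            }
        }
    }
}
```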

schrum2 commented 6 years ago

Note about the scheme above: although it gets good scores against the WPC in a fairly deterministic situation, it did very poorly against me, which indicates that some non-determinism is definitely needed.

schrum2 commented 6 years ago

When evolving with plain NEAT, the win rate during evolution gets as high as 90% but the post eval performance is less than 60%. This is using a random move rate.

For deterministic games with an increasing number of random opening moves and using HyperNEAT, the number of random openers stalls at 3. The plotted win rate is around 80%, but the post eval picks an agent with a win rate of only around 60% in the eval log, which makes me wonder if the win/draw/loss fitness is wonky. Maybe I should use win rate as well or instead ... at the very least, I would like to do a post eval on the agent with the best win rate to see if results are consistent.

schrum2 commented 6 years ago

I did a direct comparison of the contents of an eval log to the generation scores, and they were not consistent. I definitely need to look at this more.