The code looks well structured and well written. The readme is clear and helped me understand both the code and your strategy.
I have just a few comments:
In the readme you claim that an optimal strategy should win against a random player 100% of the time when starting first. I do not agree: statistically, the random agent will sometimes happen to play optimally, so a draw rate of around 5% looks reasonable.
A slightly different reward scheme might have helped your agent when starting second, especially for faster convergence. I noticed that neither of your strategies rewards draws, but since winning is harder when starting second, a draw is not a bad result there. More generally, I think rewarding draws would have helped your agent avoid losing and increased its likelihood of winning.
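To make the suggestion concrete, here is a minimal sketch of what I mean by rewarding draws. It is not based on your actual code: the names `Outcome` and `terminal_reward`, the `plays_second` flag, and all the numeric values are my own assumptions, and I am assuming the reward is only assigned at the end of a game.

```python
from enum import Enum


class Outcome(Enum):
    """Result of a finished game from the agent's point of view (hypothetical type)."""
    WIN = "win"
    DRAW = "draw"
    LOSS = "loss"


def terminal_reward(outcome: Outcome, plays_second: bool) -> float:
    """Return an end-of-game reward; all values are illustrative assumptions."""
    if outcome is Outcome.WIN:
        return 1.0
    if outcome is Outcome.LOSS:
        # Explicit penalty so the agent learns to avoid losing lines.
        return -1.0
    # Draw: worth more when starting second, where winning is harder.
    return 0.5 if plays_second else 0.1


if __name__ == "__main__":
    for second in (False, True):
        for outcome in Outcome:
            print(second, outcome.name, terminal_reward(outcome, second))
```

The exact numbers would need tuning. The only point is that a draw sits between a loss and a win, and closer to a win when the agent moves second.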
Everything else looks great, and thank you for exhaustively documenting the code and describing the results and possible shortcomings.
Hi, this is my review.
Good luck for the exam :)