Peer Review by Giuseppe Atanasio (s300733)
Task 3.0 - Hardcoded
The hardcoded strategy rules you have implemented are outstanding; nothing to adjust or suggest. Nice choices!
Task 3.1 - Fixed rules based on nim-sum
The strategy is simple and straightforward. Also in this case, nothing to suggest, good job.
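For reference, the nim-sum idea can be sketched in a few lines. This is a generic illustration, assuming a position is a tuple of heap sizes; the names (`nim_sum`, `optimal_move`) are mine, not from the reviewed code.

```python
# Hedged sketch of a nim-sum based fixed rule; a Nim position is assumed
# to be a tuple of heap sizes. Names are illustrative.

def nim_sum(heaps: tuple) -> int:
    """XOR of all heap sizes; 0 means the position is losing for the mover."""
    result = 0
    for h in heaps:
        result ^= h
    return result

def optimal_move(heaps: tuple) -> tuple:
    """Return (heap index, objects to take) reaching nim-sum 0 when possible."""
    s = nim_sum(heaps)
    for i, h in enumerate(heaps):
        target = h ^ s          # size this heap should become
        if target < h:          # legal only if objects are removed
            return i, h - target
    # Losing position: take one object from the first non-empty heap.
    i = next(j for j, h in enumerate(heaps) if h > 0)
    return i, 1
```

From any position with nonzero nim-sum there is always a move back to nim-sum 0, which is what makes the fixed rule optimal.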
Task 3.2 - Evolved rules
Your results are pretty similar to ours regarding pure_random, but that's different for optimal: if you always lose, you can try to modify some values. For instance, you can play with the number of generations, the population size and the offspring size (check our code; our parameters are reported there).

Task 3.3 - MinMax
More or less the same thoughts as before, and I appreciated your knowledge of Python, using the pickle library. Furthermore, you have reached impressive runtimes: I think we're going to take inspiration from your cache implementation in the future ;)

Task 3.4 - Reinforcement Learning
Now let's move on to the most interesting part of this lab. Your strategy is fairly well structured, but I strongly suggest you consider implementing model-free Q-learning: it keeps memory of the previous games by continuously updating its values, instead of only memorizing the max value in a row as in your code.
If you check our code, you can notice that there is a Q-table with state-action tuples as keys, a sort of dict of dicts. We exploited Bblais' Game library to handle the table, but you can also use shelve as the professor suggested.
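To make the suggestion concrete, here is a minimal sketch of a dict-of-dicts Q-table with the standard model-free Q-learning update. The state/action encodings and the hyperparameters are illustrative assumptions of mine, not taken from either implementation or from Bblais' Game library.

```python
# Dict-of-dicts Q-table with the standard Q-learning update.
# States/actions must be hashable, e.g. a Nim state as a tuple of heap
# sizes and an action as (heap index, objects taken). Values are illustrative.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor (assumed)

# Q[state][action] -> estimated value, defaulting to 0.0 for unseen pairs.
Q = defaultdict(lambda: defaultdict(float))

def update(state, action, reward, next_state, next_actions):
    """One model-free Q-learning step after observing a transition."""
    best_next = max((Q[next_state][a] for a in next_actions), default=0.0)
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

The nested dicts can later be swapped for a shelve file (keyed, for example, by `str((state, action))`) to persist the table between runs, as suggested above.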
Try to dedicate some time to testing it, since it could be useful for your final project!