Hi Michelangelo,
nice implementation of the Q-learning algorithm.
I liked that you implemented several opponent strategies for testing your trained model, but I don't understand why you don't also use them during training!
You could reach better performance by training your model not only against the random opponent strategy but also against the stronger ones; a rough sketch of the idea is below.
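For example, you could sample a different opponent from a pool at the start of each training episode. This is just a minimal sketch under my own assumptions (the opponent functions here are placeholders for the strategies you already wrote, and I'm assuming they all share a simple "pick a move from the legal moves" interface):

```python
import random

# Placeholder opponent policies; in your project these would be the
# strategies you already implemented for evaluation (random, greedy, ...).
def random_opponent(board, legal_moves):
    return random.choice(legal_moves)

def greedy_opponent(board, legal_moves):
    # Stand-in for one of your stronger strategies.
    return legal_moves[0]

OPPONENT_POOL = [random_opponent, greedy_opponent]

def sample_opponent():
    # Draw a different opponent each training episode so the agent
    # sees both weak and strong play, not only random moves.
    return random.choice(OPPONENT_POOL)
```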
Another point is the choice of a constant epsilon value.
With a constant epsilon the agent keeps exploiting its current best action-values from the very start and only rarely takes random exploratory actions, so it can get stuck in a suboptimal policy. In my opinion a decaying epsilon is preferable: explore a lot in the early episodes and gradually shift towards exploitation as the Q-values become more reliable.
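Something like the schedule below is all it takes; this is only an illustrative sketch (the helper name, the constants, and the Q-table shape are my assumptions, not taken from your code):

```python
import random

def epsilon_greedy(q_values, legal_actions, epsilon):
    # With probability epsilon take a random legal action (exploration),
    # otherwise take the action with the highest Q-value (exploitation).
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_values.get(a, 0.0))

# Illustrative schedule constants.
eps_start, eps_min, eps_decay = 1.0, 0.05, 0.9995

epsilon = eps_start
for episode in range(10_000):
    # ... run one training episode, calling epsilon_greedy(...) for each move ...
    # Decay epsilon after every episode, but never below eps_min, so early
    # training explores heavily and later training mostly exploits.
    epsilon = max(eps_min, epsilon * eps_decay)
```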
Anyway good job!