
LAB_2 peer review Edoardo Vay #1

Open · Edoxy opened this issue 11 months ago

Edoxy commented 11 months ago

Hi, you did a great job spotting that the provided optimal strategy is in fact not optimal, and you wrote a better version of it that, to my understanding, is able to win 100% of the time. I would have loved some comments or more detailed documentation, because some passages are very difficult to understand without any help from you: for example, the mechanism by which an agent chooses a strategy is quite hard to follow, and a few comments would clarify it. You implemented an evolution strategy, but I would have loved to see your results with it; I am also not sure about the way you used the term "epochs". Anyway, I think you did a great job in this lab, and I wish you good luck with the others.

trietngo712 commented 11 months ago

Hi, thank you for a very thorough review. I will update the code to clarify all the steps involved in the agent's functionality. The basic idea is an adaptive agent built around a perceptron whose input is the state of the game and whose output is a pre-defined action. The weights are updated according to an evolution strategy (ES). The motivation is to avoid hand-crafted strategies and instead rely on computing power to find the best strategy, represented by the perceptron.
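
To make this concrete, here is a minimal sketch of the idea, assuming normal-play Nim (taking the last object wins), a random opponent, and a plain (1+λ)-ES; every name here is illustrative, not the actual lab code:

```python
# Illustrative sketch only: a linear perceptron scores Nim moves, and a
# (1 + lambda)-ES with a fixed sigma evolves its weights.
import numpy as np

ROWS = 5  # rows of 1, 3, 5, 7, 9 objects

def legal_moves(state):
    return [(r, t) for r, n in enumerate(state) for t in range(1, n + 1)]

def perceptron_move(weights, state):
    """Score each legal (row, take) action with a linear layer; pick the best."""
    moves = legal_moves(state)
    scores = [weights @ np.array(state + [r, t], dtype=float) for r, t in moves]
    return moves[int(np.argmax(scores))]

def fitness(weights, games=100):
    """Win rate against a uniformly random opponent, agent always moving first."""
    wins = 0
    for _ in range(games):
        state, player = [2 * i + 1 for i in range(ROWS)], 0
        while sum(state):
            if player == 0:
                r, t = perceptron_move(weights, state)
            else:
                moves = legal_moves(state)
                r, t = moves[np.random.randint(len(moves))]
            state[r] -= t
            player = 1 - player
        wins += player == 1  # player flips after the last move, so 1 means the agent moved last
    return wins / games

def evolve(generations=50, lam=10, sigma=0.5):
    """A plain (1 + lambda)-ES over the perceptron weights, with a fixed sigma."""
    parent = np.random.randn(ROWS + 2)  # state features plus (row, take)
    best = fitness(parent)
    for _ in range(generations):
        for _ in range(lam):
            child = parent + sigma * np.random.randn(len(parent))
            f = fitness(child)
            if f >= best:
                parent, best = child, f
    return parent, best
```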

arturoadelfio commented 11 months ago

Hi, congratulations on your lab. I think you wrote good code, and moreover you implemented an optimal strategy that, unlike the one given with the lab, won all its games. If I had to recommend something, I would suggest trying a different approach for the Custom Agent, which seems to behave randomly for the first moves and optimally for the last ones. It might be better to have a fixed probability of generating moves with each of the strategies you want to adopt (you could also include silly strategies to diversify the possibilities). This way you would avoid the adaptive agent learning its first moves from random strategies and comparing its moves with the optimal ones only near the end of the match. Furthermore, since the optimal strategy and the winning probability also depend on who moves first, I would suggest evaluating the fitness of the adaptive agent by letting it play half of the matches as the first player and the other half as the second.
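
To make the suggestion concrete, here is a hedged sketch of both ideas; `optimal_move`, `random_move`, `silly_move`, and `play_game` are placeholders for your own functions, not anything from the lab:

```python
# Sketch of the two suggestions above: a mixture opponent that draws each move
# from a fixed strategy distribution, and a fitness evaluation in which the
# agent starts exactly half of the matches. All strategy names are placeholders.
import random

def mixture_agent(strategies, probabilities):
    """Return an agent that picks one strategy per move with fixed probabilities."""
    def move(state):
        return random.choices(strategies, weights=probabilities, k=1)[0](state)
    return move

# e.g. 60% optimal, 30% random, 10% "silly" (always take a single object):
# opponent = mixture_agent([optimal_move, random_move, silly_move], [0.6, 0.3, 0.1])

def balanced_fitness(agent_move, opponent_move, play_game, games=100):
    """Win rate where the agent moves first in half the games and second in the rest.

    play_game is a placeholder callable returning 1 if agent_move wins, else 0."""
    wins = sum(play_game(agent_move, opponent_move, agent_starts=(g % 2 == 0))
               for g in range(games))
    return wins / games
```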

Finally, it seems you use a fixed value for sigma. I think your algorithm could benefit from tuning this value as well, i.e. a self-adaptive approach.
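
For reference, the standard one-sigma self-adaptation mutates sigma log-normally before it perturbs the weights; a minimal sketch, assuming the usual learning rate tau = 1/sqrt(n):

```python
# One-sigma self-adaptation: each individual carries its own sigma, which is
# mutated log-normally before it is used to mutate the weights.
import numpy as np

def self_adaptive_mutate(weights, sigma):
    tau = 1 / np.sqrt(len(weights))                       # common default learning rate
    new_sigma = sigma * np.exp(tau * np.random.randn())   # log-normal step-size update
    return weights + new_sigma * np.random.randn(len(weights)), new_sigma
```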

I hope these suggestions are useful. Good luck with the next labs!