Lab10 review
The code is well-organized with clear comments, and the README enhances understanding of the adopted strategy, although additional statistics and results could be included for better clarity.
The Monte Carlo approach appears well-implemented, and I particularly appreciate the implementation of "clever_player." The game logic is properly structured, although it might be beneficial to implement a function to visualize the board during the games.
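As an illustration of the board-visualization suggestion, here is a minimal sketch, assuming (hypothetically) that the board is stored as a flat list of 9 cells containing `'X'`, `'O'`, or `None`; the actual representation in the lab may differ.

```python
def render_board(board):
    """Return a printable 3x3 view of the board.

    board: flat list of 9 cells, each 'X', 'O', or None (empty).
    Empty cells are shown as '.'.
    """
    rows = []
    for r in range(3):
        row_cells = board[3 * r : 3 * r + 3]
        rows.append(' '.join(c if c is not None else '.' for c in row_cells))
    return '\n'.join(rows)


# Example: print the board after a few moves.
print(render_board(['X', None, 'O',
                    None, 'X', None,
                    None, None, 'O']))
```

Calling `print(render_board(...))` after each move during a game would make debugging and reviewing matches much easier.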
My only concern with "clever_player" is that it may consistently focus on blocking the opponent's winning moves without ever trying to win itself. To address this, you might incorporate an "ϵ-greedy" exploration strategy with a decaying ϵ: at each step, the player selects the move suggested by rl_move with probability 1-ϵ and a random move with probability ϵ. Starting from a high ϵ (an essentially random initial policy) and gradually reducing it over time would balance exploration and exploitation during training.
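The ϵ-greedy scheme described above could be sketched as follows. This is only an illustration: the signature of `rl_move` and the decay constants are assumptions, not the lab's actual code.

```python
import random


def epsilon_greedy_move(board, available_moves, rl_move, epsilon):
    """With probability epsilon pick a random move (explore);
    otherwise pick the move suggested by rl_move (exploit)."""
    if random.random() < epsilon:
        return random.choice(available_moves)
    return rl_move(board, available_moves)


# Decay schedule sketch: start fully random, shrink epsilon each episode,
# but never below a small floor so some exploration always remains.
epsilon = 1.0
decay = 0.999
min_epsilon = 0.05
for episode in range(10_000):
    # ... play one training game, choosing moves with epsilon_greedy_move ...
    epsilon = max(min_epsilon, epsilon * decay)
```

With `decay = 0.999`, ϵ drops below the floor after roughly 3,000 episodes, so early training is exploratory while later games rely almost entirely on the learned policy.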