Create MCTS trained agent

hmosousa / temporal_game


Closed by hmosousa 1 month ago

hmosousa commented 1 month ago

First implementation:

- Keep the reward ratio (reward / max reward) for each playout instead of the number of wins, so every node accumulates a value in [0, 1] per visit.
- Use a random agent in the sampling (rollout) step, i.e. light playouts.
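
A rough sketch of how these two points could fit together, not the actual temporal_game code. `ToyGame` and its methods (`legal_actions`, `step`, `is_terminal`, `reward`, `max_reward`) are hypothetical stand-ins for the real environment, and the UCT constant `c=1.4` is an assumed default. Backpropagation adds the reward ratio of each playout to `node.value` rather than counting wins, and the rollout picks uniformly random actions.

```python
import math
import random


class ToyGame:
    """Hypothetical stand-in environment: choose 0 or 1 for `horizon` steps."""

    horizon = 4
    max_reward = horizon  # best achievable return, used to normalise rewards

    def __init__(self, moves=()):
        self.moves = tuple(moves)

    def is_terminal(self):
        return len(self.moves) >= self.horizon

    def legal_actions(self):
        return [] if self.is_terminal() else [0, 1]

    def step(self, action):
        return ToyGame(self.moves + (action,))

    def reward(self):
        return sum(self.moves)


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action
        self.children = []
        self.untried = state.legal_actions()
        self.visits = 0
        self.value = 0.0  # sum of reward ratios over playouts, not a win count

    def best_child(self, c=1.4):
        # UCT: mean reward ratio plus an exploration bonus.
        return max(
            self.children,
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(state, rng):
    # Light playout: a random agent plays until the episode ends.
    while not state.is_terminal():
        state = state.step(rng.choice(state.legal_actions()))
    return state.reward() / state.max_reward  # reward ratio in [0, 1]


def mcts(root_state, n_iter=1000, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.best_child()
        # Expansion: add one untried child, if any remain.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.state.step(action), parent=node, action=action)
            node.children.append(child)
            node = child
        # Simulation with the random agent.
        ratio = rollout(node.state, rng)
        # Backpropagation: accumulate the reward ratio along the path.
        while node is not None:
            node.visits += 1
            node.value += ratio
            node = node.parent
    # Recommend the most visited root action.
    return max(root.children, key=lambda ch: ch.visits).action


if __name__ == "__main__":
    print(mcts(ToyGame()))
```

In this toy setup, picking 1 always adds to the return, so the search should end up preferring action 1 at the root. Swapping in the real environment would mainly mean replacing `ToyGame` and supplying the true maximum reward for the normalisation.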