hmosousa/temporal_game
Create MCTS trained agent #12
Closed
hmosousa closed 1 month ago
hmosousa commented 1 month ago
First implementation:
- Keep the reward ratio (reward / max reward) of each playout instead of the number of wins.
- Use a random agent in the sampling (rollout) process, i.e. light sampling.
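
A minimal sketch of how these two points could fit together, assuming a generic MCTS loop. Everything named here (`ToyEnv`, `Node`, `random_rollout`, `select_child`, `mcts`, and the exploration constant `c`) is hypothetical and not taken from the temporal_game code; the toy environment only stands in for the real game so that the example runs on its own.

```python
import math
import random


class ToyEnv:
    """Hypothetical stand-in for the game: pick 0 or 1 for DEPTH steps."""
    DEPTH = 4
    MAX_REWARD = 4.0  # best achievable episode reward (assumed to be known)

    @staticmethod
    def actions(state):
        return [] if len(state) == ToyEnv.DEPTH else [0, 1]

    @staticmethod
    def step(state, action):
        return state + (action,)

    @staticmethod
    def reward(state):
        return float(sum(state))


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}    # action -> Node
        self.visits = 0
        self.ratio_sum = 0.0  # sum of reward ratios, not a win count

    def value(self):
        return self.ratio_sum / self.visits if self.visits else 0.0


def random_rollout(env, state, rng):
    """Light playout: a uniformly random agent plays until the episode ends."""
    while env.actions(state):
        state = env.step(state, rng.choice(env.actions(state)))
    return env.reward(state) / env.MAX_REWARD  # reward ratio in [0, 1]


def select_child(node, c=1.4):
    """UCB1 computed over the mean reward ratio of each child."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value() + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(env, root_state, n_iter=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is non-terminal and fully expanded.
        while env.actions(node.state) and len(node.children) == len(env.actions(node.state)):
            node = select_child(node)
        # Expansion: add one untried action, if there is any.
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried:
            action = rng.choice(untried)
            child = Node(env.step(node.state, action), parent=node)
            node.children[action] = child
            node = child
        # Simulation: light (random) playout scored as a reward ratio.
        ratio = random_rollout(env, node.state, rng)
        # Backpropagation: accumulate the ratio along the path to the root.
        while node is not None:
            node.visits += 1
            node.ratio_sum += ratio
            node = node.parent
    # Recommend the most visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)


best_action = mcts(ToyEnv, ())
```

Storing the ratio keeps node values in [0, 1], so the usual UCB1 exploration term can be used unchanged even though the game has a graded reward rather than a win/loss outcome. In this toy setup the search should favour action 1 at the root, since every extra 1 increases the final reward.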