Closed mws262 closed 5 years ago
The UCB rollout policy takes random actions. It would be cool to use the sort-of-bad-but-better-than-random neural network controller as the rollout controller. At the very least, anything better than random.
This is now possible with the RolloutPolicy abstract class. I'm keeping this open since this is still a focus.
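For anyone reading later, here is a rough sketch of what swapping the rollout controller could look like. Only the name `RolloutPolicy` comes from this thread. Everything else is an assumption made up for illustration and may not match the project's actual code: the method signatures, the toy integer state, `RandomRolloutPolicy`, `ControllerRolloutPolicy`, and the `IntUnaryOperator` standing in for the neural network controller.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: these signatures are NOT taken from the project.
abstract class RolloutPolicy {
    /** Picks the next action index in [0, numActions) for the given state. */
    abstract int chooseAction(int state, int numActions);

    /** Plays `horizon` steps from `start` and returns the final state as a crude score. */
    int rollout(int start, int horizon, int numActions) {
        int state = start;
        for (int i = 0; i < horizon; i++) {
            // Toy dynamics (assumption): the chosen action index is added to the state.
            state += chooseAction(state, numActions);
        }
        return state;
    }
}

/** Baseline: uniformly random actions, as in the original issue description. */
class RandomRolloutPolicy extends RolloutPolicy {
    private final Random rng;

    RandomRolloutPolicy(long seed) {
        this.rng = new Random(seed);
    }

    @Override
    int chooseAction(int state, int numActions) {
        return rng.nextInt(numActions);
    }
}

/** Delegates to any state-to-action function, e.g. a trained controller. */
class ControllerRolloutPolicy extends RolloutPolicy {
    private final IntUnaryOperator controller; // stand-in for the neural network

    ControllerRolloutPolicy(IntUnaryOperator controller) {
        this.controller = controller;
    }

    @Override
    int chooseAction(int state, int numActions) {
        // Clamp the controller's output so a bad prediction is still a legal action.
        int action = controller.applyAsInt(state);
        return Math.max(0, Math.min(numActions - 1, action));
    }
}

class RolloutSketch {
    public static void main(String[] args) {
        RolloutPolicy random = new RandomRolloutPolicy(42L);
        RolloutPolicy guided = new ControllerRolloutPolicy(state -> 2); // always picks action 2
        System.out.println("random rollout score: " + random.rollout(0, 10, 3));
        System.out.println("guided rollout score: " + guided.rollout(0, 10, 3));
    }
}
```

The point of the abstract class in this sketch is that the search code only calls `rollout`, so a random policy and a controller-backed one are interchangeable.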
Good enough.