Closed · xuhuazhe closed this issue 6 years ago
The training algorithm is not meant to run optimally. It selects a fully random action with probability `random_eps`, and it also adds noise to the policy's action, scaled by `noise_eps`.
These perturbations help training by encouraging exploration, but they generally prevent the training success rate from ever reaching 100 percent. That is why the code includes a separate evaluation rollout worker in which both values are set to 0, so the agent can achieve a much higher success rate without being penalized by random actions.
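To make the two parameters concrete, here is a minimal sketch of such an action-perturbation step. This is a hypothetical illustration of the scheme described above, not the exact code from the repository; the function name `perturb_action` and the `max_u` action bound are assumptions for the example:

```python
import numpy as np

def perturb_action(policy_action, max_u, random_eps, noise_eps, rng=np.random):
    """Add exploration noise to a deterministic policy action (illustrative sketch).

    1. Add Gaussian noise with standard deviation noise_eps * max_u.
    2. With probability random_eps, replace the result with a uniformly
       random action in [-max_u, max_u] (epsilon-greedy exploration).
    """
    u = np.asarray(policy_action, dtype=float)
    # Gaussian action noise, scaled by noise_eps
    u = u + noise_eps * max_u * rng.standard_normal(u.shape)
    u = np.clip(u, -max_u, max_u)
    # Epsilon-greedy: occasionally take a fully random action
    if rng.random() < random_eps:
        u = rng.uniform(-max_u, max_u, size=u.shape)
    return u

# In the evaluation worker both values are 0, so the action passes through unchanged:
a = perturb_action([0.2, -0.1], max_u=1.0, random_eps=0.0, noise_eps=0.0)
```

With `random_eps = noise_eps = 0.0` the function returns the policy action untouched, which is exactly why the evaluation worker reports a higher success rate than the noisy training rollouts.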
Thanks a lot!
May I ask why the training success rate for FetchPickAndPlace-v1 is only about 0.5, while the test success rate is very high?