Closed PhanindraParashar closed 2 years ago
Gym automatically clips the input. For continuous-control envs, it makes sense to use a tanh activation as long as the action space of the env is bounded between -1 and 1 and we are decreasing the variance over training.
For custom environments, most people clip the action tensor.
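A minimal NumPy sketch of the clipping approach described above. The bounds and the sampled action here are made-up illustration values; in a real Gym env the bounds come from `env.action_space.low` / `env.action_space.high`:

```python
import numpy as np

# Hypothetical action bounds; Gym's Box spaces expose these as
# env.action_space.low and env.action_space.high.
low, high = -1.0, 1.0

# A sampled action whose first component overshoots the permitted range.
sampled_action = np.array([1.3, -0.2])

# Clip element-wise before passing the action to env.step().
clipped = np.clip(sampled_action, low, high)
print(clipped)  # -> [ 1.  -0.2]
```

Note that the clipped value is only what gets sent to the environment; the log-probability used in the PPO update is still computed from the unclipped Gaussian sample.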
I was going through the ppo.py code.
Assuming I have 2 continuous actions to predict, so action_dims = 2. Let the standard deviation be initialized as 0.5, so the variance is 0.25.
I found the following: the mean of the action is predicted by the actor net.
Say it is torch.tensor([0.7, 0.9]).
Then you use the variance to sample actions.
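To make the setup concrete, here is a small NumPy sketch of the sampling step being described (the PPO code in question uses `torch.distributions.Normal`, but the mechanics are the same; the seed is just for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean predicted by the actor network (torch.tensor([0.7, 0.9]) above).
mean = np.array([0.7, 0.9])
std = 0.5  # fixed standard deviation, so variance = 0.25

# Draw one action per dimension from N(mean, std^2); in torch this would be
# torch.distributions.Normal(mean, std).sample().
action = rng.normal(mean, std)
print(action)  # each component can land outside [-1, 1]
```

Since the Gaussian has unbounded support, nothing in this step prevents a sample from falling outside the permitted range, which is exactly the problem raised below.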
If the sample draws actions that are above 1 or below -1, which is the permitted action range, what do we do?
Do we sample again?
Do we clip?
Is it a good idea to apply a tanh activation to the sampled action? (But won't that interfere with the actor network's gradients?)
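For reference, the tanh option can be made consistent with the policy gradient if the log-probability is corrected for the change of variables, as done in SAC-style implementations. A hedged NumPy sketch (the `squash` helper and the placeholder log-prob value are illustrative, not from the repo's code):

```python
import numpy as np

def squash(raw_action, raw_log_prob):
    """Squash an unbounded Gaussian sample into (-1, 1) with tanh and apply
    the change-of-variables correction to its log-probability:
    log pi(a) = log mu(u) - sum_i log(1 - tanh(u_i)^2)."""
    a = np.tanh(raw_action)
    # Small epsilon guards against log(0) when |a| is close to 1.
    corrected = raw_log_prob - np.sum(np.log(1.0 - a**2 + 1e-6))
    return a, corrected

raw = np.array([0.7, 0.9])              # pre-squash sample from the Gaussian
a, lp = squash(raw, raw_log_prob=-1.2)  # -1.2 is a placeholder log-prob
print(a)  # every component is strictly inside (-1, 1)
```

This keeps the sampled action in range without rejection sampling or clipping, at the cost of a slightly more involved log-prob computation.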