Closed kakuriyama closed 4 years ago
I think you need to apply softmax normalization after the last layer of the policy network. Take a look at the PPO example for Atari: https://github.com/pfnet/pfrl/blob/master/examples/atari/train_ppo_ale.py#L260
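For intuition, here is a dependency-free sketch of what softmax normalization does to raw policy scores, using the example vector from the original question (sampling from, or taking the argmax over, the resulting probabilities then picks the action):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.1933773, -0.24673517, 0.6604848, -1.5786057, 0.8695493]
probs = softmax(logits)  # non-negative, sums to 1
greedy_action = max(range(len(probs)), key=probs.__getitem__)  # index 0 here
```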
Thank you for the prompt response. I understand now that I should use softmax normalization; I'll try it.
Is the usage in train_ppo_ale.py#L260 better or more standard than the usage below? https://github.com/pfnet/pfrl/blob/master/examples/mujoco/reproduction/ppo/train_ppo.py#L160
It is not that one is better than the other. For discrete-action problems, a categorical distribution parameterized via softmax is a popular choice of policy representation. For continuous-action problems such as MuJoCo tasks, a Gaussian distribution is a popular choice.
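To make the contrast concrete, here is a minimal sketch using plain torch.distributions (the logits and dimensions are hypothetical, not tied to any pfrl model):

```python
import torch
from torch.distributions import Categorical, Normal

# Discrete: softmax over logits yields a categorical distribution over 5 actions
logits = torch.tensor([1.19, -0.25, 0.66, -1.58, 0.87])
discrete = Categorical(logits=logits)
a_disc = discrete.sample()  # integer action index in [0, 5)

# Continuous: a Gaussian yields real-valued action vectors (3-dimensional here)
continuous = Normal(torch.zeros(3), torch.ones(3))
a_cont = continuous.sample()  # 3-dim real-valued vector
```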
Thank you, I understand your explanation. The discrete-action case is clear to me now, but may I ask how to treat the output of GaussianHeadWithStateIndependentCovariance to decide a continuous action?
You can find what the output of GaussianHeadWithStateIndependentCovariance is at https://github.com/pfnet/pfrl/blob/master/pfrl/policies/gaussian_policy.py#L53. It is an instance of torch.distributions.Distribution. You can use its .sample method to sample vector-valued actions from the distribution.
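For illustration, the diagonal Gaussian that such a head produces can be mimicked directly with torch.distributions (a sketch only; the exact construction inside pfrl may differ):

```python
import torch
from torch.distributions import Independent, Normal

action_size = 3                     # hypothetical action dimensionality
mean = torch.zeros(1, action_size)  # stand-in for the policy network's output
std = torch.ones(1, action_size)    # std = 1, as with var_param_init=0

# A diagonal Gaussian over action vectors, analogous to what the head returns
dist = Independent(Normal(mean, std), 1)

action = dist.sample()              # real-valued action, shape (1, action_size)
log_prob = dist.log_prob(action)    # log-density, as used by PPO's objective
```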
All my questions are answered now. Thank you very much.
Thank you for releasing such a powerful library.
I would like to use PPO in pfrl for a discrete-action problem. I got a vector like [ 1.1933773 -0.24673517 0.6604848 -1.5786057 0.8695493 ] from the agent.act method. Since it doesn't look like probabilities, how can I decide an action from this vector?
I use the model below from the PPO sample, but I don't understand GaussianHeadWithStateIndependentCovariance.
```python
policy = torch.nn.Sequential(
    nn.Linear(obs_size, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, action_size),
    pfrl.policies.GaussianHeadWithStateIndependentCovariance(
        action_size=action_size,
        var_type="diagonal",
        var_func=lambda x: torch.exp(2 * x),  # Parameterize log std
        var_param_init=0,  # log std = 0 => std = 1
    ),
)
```
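As a quick sanity check of the comments in that snippet (purely illustrative arithmetic, not pfrl code):

```python
import math

# var_func=lambda x: torch.exp(2 * x) treats the learned parameter x as log(std),
# so the returned variance is exp(2 * x) = (exp(x))**2 = std**2.
x = 0.0                # var_param_init=0
std = math.exp(x)      # std = 1.0
var = math.exp(2 * x)  # var = 1.0 = std**2
```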
Thank you.