pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License

How to deal with the output of GaussianHeadWithStateIndependentCovariance #51

Closed kakuriyama closed 4 years ago

kakuriyama commented 4 years ago

Thank you for releasing such a powerful product.

I would like to use PPO from pfrl for a Discrete Action problem. I got a vector like [ 1.1933773 -0.24673517 0.6604848 -1.5786057 0.8695493 ] from the agent.act method. As it doesn't seem to be a probability distribution, how can I decide an action with this vector?

I use the model below from the PPO sample, but I don't understand GaussianHeadWithStateIndependentCovariance.

```python
policy = torch.nn.Sequential(
    nn.Linear(obs_size, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, action_size),
    pfrl.policies.GaussianHeadWithStateIndependentCovariance(
        action_size=action_size,
        var_type="diagonal",
        var_func=lambda x: torch.exp(2 * x),  # Parameterize log std
        var_param_init=0,  # log std = 0 => std = 1
    ),
)
```

Thank you.

tkelestemur commented 4 years ago

I think you need to use Softmax normalization after the last layer of the policy network. Take a look at the PPO example for Atari: https://github.com/pfnet/pfrl/blob/master/examples/atari/train_ppo_ale.py#L260
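For reference, here is a minimal sketch of such a discrete-action policy head, assuming pfrl.policies.SoftmaxCategoricalHead as used in that example (the layer sizes are hypothetical):

```python
import torch
from torch import nn
import pfrl

obs_size, n_actions = 8, 4  # hypothetical problem sizes

policy = torch.nn.Sequential(
    nn.Linear(obs_size, 64),
    nn.Tanh(),
    nn.Linear(64, n_actions),
    # Turns the final logits into a torch.distributions.Categorical,
    # so sampled actions are integer indices in {0, ..., n_actions - 1}.
    pfrl.policies.SoftmaxCategoricalHead(),
)
```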

kakuriyama commented 4 years ago

Thank you for the prompt response. I understand that I should use Softmax normalization. I'll try it.

Is the usage in train_ppo_ale.py#L260 better or more standard than the usage below? https://github.com/pfnet/pfrl/blob/master/examples/mujoco/reproduction/ppo/train_ppo.py#L160

muupan commented 4 years ago

It is not that one is better than the other. For discrete-action problems, a categorical distribution using softmax is a popular choice for policy representation. For continuous-action problems like MuJoCo tasks, a Gaussian distribution is a popular choice.
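To make the distinction concrete, here is a small sketch using plain torch.distributions (the values are made up):

```python
import torch
from torch.distributions import Categorical, Independent, Normal

# Discrete actions: a categorical distribution over action indices.
logits = torch.tensor([0.5, -1.0, 2.0])  # raw network outputs (hypothetical)
discrete_action = Categorical(logits=logits).sample()  # an integer in {0, 1, 2}

# Continuous actions: a diagonal Gaussian over action vectors.
mean = torch.zeros(3)  # network output (hypothetical)
gaussian = Independent(Normal(mean, torch.ones(3)), 1)
continuous_action = gaussian.sample()  # a real-valued vector of size 3
```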

kakuriyama commented 4 years ago

Thank you, I understood your explanation. The Discrete Action case is clear to me now, but may I know how to treat the output of GaussianHeadWithStateIndependentCovariance to decide a Continuous Action?

muupan commented 4 years ago

You can find what the output of GaussianHeadWithStateIndependentCovariance is at https://github.com/pfnet/pfrl/blob/master/pfrl/policies/gaussian_policy.py#L53. It is an instance of torch.distributions.Distribution. You can use its .sample() method to sample vector-valued actions from the distribution.
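A minimal sketch of that, reusing the policy from the question (obs_size and action_size are hypothetical, and the observation is a dummy batch):

```python
import torch
from torch import nn
import pfrl

obs_size, action_size = 11, 3  # hypothetical problem sizes

policy = torch.nn.Sequential(
    nn.Linear(obs_size, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, action_size),
    pfrl.policies.GaussianHeadWithStateIndependentCovariance(
        action_size=action_size,
        var_type="diagonal",
        var_func=lambda x: torch.exp(2 * x),  # Parameterize log std
        var_param_init=0,  # log std = 0 => std = 1
    ),
)

obs = torch.zeros(1, obs_size)    # dummy batch of one observation
dist = policy(obs)                # a torch.distributions.Distribution instance
action = dist.sample()            # shape (1, action_size): a continuous action vector
log_prob = dist.log_prob(action)  # log-probability of that action, used by PPO's loss
```

Note that when such a policy is wrapped in a PPO agent, agent.act returns an action sampled in this way, which would explain why the vector in the original question is not a probability distribution.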

kakuriyama commented 4 years ago

I'm clear for all my questions. Thank you very much.