nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

how can I use this code for a problem with 3 different actions? #49

Closed · m031n closed this 2 years ago

m031n commented 2 years ago

I'm trying to use your code as the algorithm for a navigation problem. What I can't figure out on my own is how to compute the PPO ratio for my action space.

My action space has 3 parts:

1) linear velocity change in the range [-3, 3], from a tanh output in the actor
2) angular velocity change in the range [-pi/12, pi/12], from a tanh output in the actor
3) step-time length, chosen from a fixed set (0.2, 0.5, 0.8) via a softmax

I tried taking the log_prob from each distribution, exponentiating them, and averaging the three resulting probabilities into a single probability, but the results are bad.
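
Roughly what I'm doing now (a simplified sketch; `dist_lin`, `dist_ang`, and `dist_step` stand in for my three actor heads, and `prob_old` is the same quantity from the old policy):

```python
# log-probs of each sampled action component under its own distribution
logp_lin  = dist_lin.log_prob(a_lin)    # Normal + tanh head, range [-3, 3]
logp_ang  = dist_ang.log_prob(a_ang)    # Normal + tanh head, range [-pi/12, pi/12]
logp_step = dist_step.log_prob(a_step)  # Categorical over (0.2, 0.5, 0.8)

# average the exponentiated probabilities into one number -- this is the
# part that seems wrong, but I don't know the correct way to combine them
prob  = (logp_lin.exp() + logp_ang.exp() + logp_step.exp()) / 3.0
ratio = prob / prob_old                 # used in place of the usual PPO ratio
```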

Any suggestions for me?

nikhilbarhate99 commented 2 years ago

Just taking the mean will not work; the final output needs to be a valid probability distribution. Since you have a mix of continuous and discrete actions, this is difficult to do cleanly. A workaround would be to output a 3-dimensional Normal distribution and make one of the dimensions discrete using thresholds.
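
Something along these lines (a rough, untested sketch, not code from this repo; `actor`, `state`, and `action_std` are placeholders, and the ±0.5 thresholds are arbitrary):

```python
import math
import torch
from torch.distributions import MultivariateNormal

# Treat all 3 dims as a single Normal for log-prob purposes, and
# discretize the third dim only when acting in the environment.
action_mean = actor(state)                            # (3,) mean from the actor
dist = MultivariateNormal(action_mean, torch.diag(action_std ** 2))

raw_action = dist.sample()
log_prob = dist.log_prob(raw_action)                  # one joint log-prob

# squash the two continuous dims into their ranges
# (note: tanh squashing changes the true density; this sketch
# ignores that correction for simplicity)
lin_vel = 3.0 * torch.tanh(raw_action[0])             # [-3, 3]
ang_vel = (math.pi / 12) * torch.tanh(raw_action[1])  # [-pi/12, pi/12]

# threshold the third dim onto the discrete set (0.2, 0.5, 0.8)
if raw_action[2] < -0.5:
    step_time = 0.2
elif raw_action[2] < 0.5:
    step_time = 0.5
else:
    step_time = 0.8

# during the update, the PPO ratio is then the usual
#   ratio = torch.exp(log_prob_new - log_prob_old)
# computed from this single joint distribution.
```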