nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

how can I use this code for a problem with 3 different actions? #49

Closed · m031n closed this 2 years ago

m031n commented 2 years ago

I'm trying to use your code as the algorithm for a navigation problem. What I can't figure out on my own is how to compute the PPO ratio for my action space.

My action space has 3 parts:

1) linear velocity change in the range [-3, 3], from a tanh output in the actor
2) angular velocity change in the range [-pi/12, pi/12], from a tanh output in the actor
3) step-time length, chosen from a fixed set (0.2, 0.5, 0.8) via a softmax

I tried taking the log_prob from each distribution, exponentiating them, and averaging the three resulting probabilities into a single probability, but the results are bad.
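
Roughly what I'm doing now (a simplified sketch; `dist_lin`, `dist_ang`, and `dist_step` stand in for my three actor heads, and `prob_old` is the same quantity from the old policy):

```python
# log-probs of each sampled action component under its own distribution
logp_lin  = dist_lin.log_prob(a_lin)    # Normal + tanh head, range [-3, 3]
logp_ang  = dist_ang.log_prob(a_ang)    # Normal + tanh head, range [-pi/12, pi/12]
logp_step = dist_step.log_prob(a_step)  # Categorical over (0.2, 0.5, 0.8)

# average the exponentiated probabilities into one number -- this is the
# part that seems wrong, but I don't know the correct way to combine them
prob  = (logp_lin.exp() + logp_ang.exp() + logp_step.exp()) / 3.0
ratio = prob / prob_old                 # used in place of the usual PPO ratio
```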

Any suggestions for me?

nikhilbarhate99 commented 2 years ago

Just taking the mean will not work; the final output needs to be a valid probability distribution. Since you have a mix of continuous and discrete actions, this is difficult to do cleanly. A workaround would be to output a 3-dimensional Normal distribution and make one of the dimensions discrete using thresholds.
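
Something along these lines (a rough, untested sketch, not code from this repo; `actor`, `state`, and `action_std` are placeholders, and the ±0.5 thresholds are arbitrary):

```python
import math
import torch
from torch.distributions import MultivariateNormal

# Treat all 3 dims as a single Normal for log-prob purposes, and
# discretize the third dim only when acting in the environment.
action_mean = actor(state)                            # (3,) mean from the actor
dist = MultivariateNormal(action_mean, torch.diag(action_std ** 2))

raw_action = dist.sample()
log_prob = dist.log_prob(raw_action)                  # one joint log-prob

# squash the two continuous dims into their ranges
# (note: tanh squashing changes the true density; this sketch
# ignores that correction for simplicity)
lin_vel = 3.0 * torch.tanh(raw_action[0])             # [-3, 3]
ang_vel = (math.pi / 12) * torch.tanh(raw_action[1])  # [-pi/12, pi/12]

# threshold the third dim onto the discrete set (0.2, 0.5, 0.8)
if raw_action[2] < -0.5:
    step_time = 0.2
elif raw_action[2] < 0.5:
    step_time = 0.5
else:
    step_time = 0.8

# during the update, the PPO ratio is then the usual
#   ratio = torch.exp(log_prob_new - log_prob_old)
# computed from this single joint distribution.
```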