m031n closed 2 years ago
Just taking the mean will not work. The final output needs to be a valid probability distribution, and the mean of three separate probabilities is not one. Since you have both continuous and discrete actions, it will be difficult ... a workaround would be to output a 3-dimensional normal distribution and make one of the dimensions discrete using thresholds.
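A rough sketch of that workaround, in plain Python so it doesn't depend on any framework. Everything here is an assumption for illustration, not code from the repo: the helper names, the diagonal (independent per-dimension) normal, and the cut points in `THRESHOLDS` are all made up. The point is that the log-prob of the whole 3-dimensional sample is the sum of the per-dimension log densities, and the ratio is the exp of the difference between the new and old joint log-probs.

```python
import math

# Step times from the question; the cut points are hypothetical.
STEP_TIMES = (0.2, 0.5, 0.8)
THRESHOLDS = (-0.5, 0.5)


def normal_log_prob(x, mean, std):
    """Log density of a 1-D normal distribution at x."""
    var = std * std
    return (-((x - mean) ** 2) / (2.0 * var)
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def joint_log_prob(action, means, stds):
    """Log density of a diagonal normal: sum over independent dimensions."""
    return sum(normal_log_prob(a, m, s)
               for a, m, s in zip(action, means, stds))


def to_step_time(z):
    """Map the third (continuous) dimension to a discrete step time."""
    for cut, step in zip(THRESHOLDS, STEP_TIMES):
        if z < cut:
            return step
    return STEP_TIMES[-1]


def ppo_ratio(action, new_means, new_stds, old_means, old_stds):
    """pi_new(a|s) / pi_old(a|s) for the raw 3-D sample."""
    new_lp = joint_log_prob(action, new_means, new_stds)
    old_lp = joint_log_prob(action, old_means, old_stds)
    return math.exp(new_lp - old_lp)
```

The ratio is computed on the raw 3-D sample that was stored in the buffer; `to_step_time` is only applied when the action is sent to the environment.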
I am trying to use your code as the algorithm for a navigation problem. What I can't work out is how to compute the ratio for my problem.
My action space has 3 parts: 1) linear velocity change in the range [-3, 3], from a tanh in the actor; 2) angular velocity change in the range [-pi/12, pi/12], from a tanh in the actor; 3) step_time length, selected from a fixed set (0.2, 0.5, 0.8), from a softmax.
I tried taking the log_prob from each distribution, exponentiating it, and computing the mean of these three non-log probabilities, but the results are bad.
Any suggestions?