Closed HeegerGao closed 4 years ago
Yes, you could learn the distribution std along with the action means, but it is not a necessary condition. When the std is learned, it can collapse prematurely, and the agent will then not explore the environment properly. There are multiple ways of implementing deep RL algorithms; you are free to choose any implementation that works for your problem (at least as of now, there is no standard procedure to follow).
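To illustrate the two options being discussed, here is a minimal NumPy sketch (not the repo's actual PyTorch code; the names `sample_fixed`, `sample_learned`, and `LOG_STD_MIN` are made up for illustration). It contrasts a fixed `action_std` hyper-parameter with a learnable state-independent log-std, where a lower clamp is one common guard against the premature collapse mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Option 1: fixed std hyper-parameter (the approach the question asks about) ---
action_std = 0.5  # constant, never updated by the optimizer

def sample_fixed(mean):
    # action ~ N(mean, action_std^2), same std for every state
    return mean + action_std * rng.standard_normal(mean.shape)

# --- Option 2: learnable state-independent log-std (hypothetical sketch) ---
log_std = np.zeros(2)   # trainable parameter; starts at std = exp(0) = 1
LOG_STD_MIN = -2.0      # clamping floor: keeps std >= exp(-2), so exploration never dies out

def sample_learned(mean):
    std = np.exp(np.clip(log_std, LOG_STD_MIN, None))
    return mean + std * rng.standard_normal(mean.shape)

mean = np.array([0.1, -0.3])
a_fixed = sample_fixed(mean)
a_learned = sample_learned(mean)
```

In a real implementation the log-std would be an `nn.Parameter` (or a second network head) updated by the PPO objective; the clamp (or an entropy bonus) is what prevents the optimizer from driving the std to zero early in training.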
OK, I see. Thank you very much.
Hello,
I am new to reinforcement learning. I noticed that you set 'action_std' as a constant hyper-parameter in PPO_continuous.py, so only 'action_mean' is learned in the code. I don't know whether this is common practice for continuous action space problems or specific to your method, as I think 'action_std' should also be learned during training. Can you give me some references explaining why you wrote it this way? Thank you very much!