I use PPO to make the car automatically find its way and avoid obstacles, but it doesn't perform well. Similar examples use a DQN network. Why does DQN work here but PPO does not?
I have the same question. The basic PPO (tutorial_PPO) can only reach the goal when there are no obstacles. Moreover, why is the variable "logstd" in line 91 of tutorial_PPO always zero when running?
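For reference, this is roughly the pattern I understand "logstd" to follow in continuous-action PPO implementations: a state-independent, zero-initialized trainable variable, so it reads as zero until gradient updates actually change it. This is a minimal sketch under that assumption, not the tutorial's exact code (the action dimension below is hypothetical):

```python
import tensorflow as tf

action_dim = 2  # hypothetical action dimension

# logstd as a trainable variable initialized to zeros: it prints as 0.0
# until the optimizer has applied gradient updates to it.
logstd = tf.Variable(tf.zeros(action_dim), trainable=True, name="logstd")

def sample_action(mean):
    """Sample from a diagonal Gaussian policy N(mean, exp(logstd)^2)."""
    std = tf.exp(logstd)                      # exp(0) = 1.0 at initialization
    noise = tf.random.normal(tf.shape(mean))
    return mean + std * noise

# With an untrained logstd, the policy's standard deviation is exactly 1.0.
mean = tf.zeros(action_dim)
print(logstd.numpy())                         # [0. 0.] before any training
print(sample_action(mean).numpy())
```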