utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License

Performance of the cartpole using PPO #154

Closed: Hu-Hanyang closed this issue 3 months ago.

Hu-Hanyang commented 5 months ago

While testing the PPO controller on the cartpole task, I found that training does not seem to converge, even with the provided parameters. I made only the following changes, which I don't think should affect training performance:

- In cartpole.yaml: changed "info_in_reset: False" to "info_in_reset: True", and "done_on_out_of_bound: True" to "done_on_out_of_bound: False".
- In ppo.yaml: changed "log_interval: 0" to "log_interval: 10", and "tensorboard: False" to "tensorboard: True".
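
The four changes above can be expressed as nested overrides applied on top of the defaults. The following is a minimal sketch under several assumptions:

- The cartpole options are assumed to live under a `task_config` section and the PPO options under `algo_config`. The real YAML files may nest them differently.
- `deep_merge` and `changed_leaves` are hypothetical helpers, not safe-control-gym functions.
- Only the keys discussed in this thread are shown.

```python
import copy


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def changed_leaves(before, after, prefix=""):
    """List dotted paths of leaf values that differ between two nested dicts."""
    diffs = []
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}{key}"
        a, b = before.get(key), after.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(changed_leaves(a, b, path + "."))
        elif a != b:
            diffs.append(path)
    return diffs


# Hypothetical excerpts of the default configs (assumed section names).
default_cartpole = {"task_config": {"info_in_reset": False, "done_on_out_of_bound": True}}
default_ppo = {"algo_config": {"log_interval": 0, "tensorboard": False}}

# The changes described in this issue.
cartpole_changes = {"task_config": {"info_in_reset": True, "done_on_out_of_bound": False}}
ppo_changes = {"algo_config": {"log_interval": 10, "tensorboard": True}}

cartpole_cfg = deep_merge(default_cartpole, cartpole_changes)
ppo_cfg = deep_merge(default_ppo, ppo_changes)

print(changed_leaves(default_cartpole, cartpole_cfg))
print(changed_leaves(default_ppo, ppo_cfg))
```

Listing the changed leaves separates the one key that alters when episodes end (done_on_out_of_bound) from the three that appear to concern only logging or what reset() returns.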

Could you please advise on the best performance that can be achieved with the PPO controller on the cartpole task based on your repository?

[Image: cartpole_ppo]
Federico-PizarroBejarano commented 5 months ago

Hello @Hu-Hanyang, thanks for using our gym! I believe that changing done_on_out_of_bound might be affecting training. With it disabled, a training episode is not terminated even when the cartpole drifts very far from its typical operating point. I am not 100% sure, though. If you switch it back on, training may converge again.
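
To make the suggested mechanism concrete, here is a minimal sketch of episode termination with and without an out-of-bounds check. It is not safe-control-gym's implementation. The following are all hypothetical and chosen only for illustration:

- the state layout (x, x_dot, theta, theta_dot);
- the limits X_LIMIT and THETA_LIMIT;
- the helpers out_of_bounds, episode_done and rollout_length.

```python
import math

# Illustrative limits only (assumption); the gym's actual bounds may differ.
X_LIMIT = 2.4                   # cart position limit, metres
THETA_LIMIT = math.radians(90)  # pole angle limit, radians


def out_of_bounds(state):
    """state = (x, x_dot, theta, theta_dot); True if the cart or pole leaves the allowed box."""
    x, _, theta, _ = state
    return abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT


def episode_done(state, step, max_steps, done_on_out_of_bound):
    """End at the time limit, or early on leaving the bounds when the flag is enabled."""
    if step + 1 >= max_steps:
        return True
    return done_on_out_of_bound and out_of_bounds(state)


def rollout_length(states, max_steps, done_on_out_of_bound):
    """Number of transitions collected before the episode terminates."""
    for step, state in enumerate(states[:max_steps]):
        if episode_done(state, step, max_steps, done_on_out_of_bound):
            return step + 1
    return min(len(states), max_steps)


# Hypothetical trajectory: pole upright for 3 steps, then fallen far past the limit.
traj = [(0.0, 0.0, 0.1, 0.0)] * 3 + [(0.0, 0.0, 2.0, 0.0)] * 7

print(rollout_length(traj, max_steps=10, done_on_out_of_bound=True))   # stops when the pole falls
print(rollout_length(traj, max_steps=10, done_on_out_of_bound=False))  # runs to the time limit
```

With the flag off, the rollout keeps collecting transitions from states where the pole has already fallen until the time limit is reached. That is one plausible way disabling it could degrade PPO training.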

Additionally, we have very recently added hyperparameter optimization to the gym. I am not the expert on it (you can ask @middleyuan if you have any questions), but you can see an example of optimally trained PPO here: https://github.com/utiasDSL/safe-control-gym/blob/main/examples/hpo/rl/ppo/config_overrides/cartpole/optimized_hyperparameters.yaml. Of course, these hyperparameters are optimal only for the cartpole setup used in that example.

Let me know if you have any further questions!

Hu-Hanyang commented 5 months ago

Thanks for your reply, @Federico-PizarroBejarano! You are right, the done_on_out_of_bound setting does affect training. I will try the optimized hyperparameters you recommended!

Federico-PizarroBejarano commented 5 months ago

Awesome, keep me updated!