Hu-Hanyang closed this issue 3 months ago.
Hello @Hu-Hanyang, thanks for using our gym! I believe that changing `done_on_out_of_bound` might be affecting training: with it disabled, a training episode is not cancelled even if the cartpole is very far from the typical operating point, but I am not 100% sure. If you switch it back on, the performance may converge again.
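For example, here is a minimal sketch of flipping that flag back on in a local copy of cartpole.yaml, using plain PyYAML rather than our repo's own config tooling, and assuming the key sits at the top level of the file (which may differ from the actual layout):

```python
# Minimal sketch, not the repo's own tooling: re-enable done_on_out_of_bound in
# a local copy of cartpole.yaml. Assumes PyYAML is installed and the key is a
# top-level entry, which may not match the actual file layout.
import yaml

config_path = "cartpole.yaml"  # hypothetical local path

with open(config_path) as f:
    cfg = yaml.safe_load(f)

# Terminate an episode as soon as the cartpole leaves the allowed state bounds.
cfg["done_on_out_of_bound"] = True

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)
```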
Additionally, we have very recently added hyperparameter optimization to the gym. I am not an expert on it (you can speak to @middleyuan if you have any questions), but you can see an example of optimized PPO hyperparameters here: https://github.com/utiasDSL/safe-control-gym/blob/main/examples/hpo/rl/ppo/config_overrides/cartpole/optimized_hyperparameters.yaml. These are, of course, optimal only for the cartpole setup in that same example.
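If it helps, here is a rough sketch of overlaying those tuned values onto a local ppo.yaml copy with plain PyYAML (not our repo's own config machinery; it also assumes both files are flat mappings of hyperparameter names to values, which may not match the real layout):

```python
# Rough sketch: merge the downloaded optimized_hyperparameters.yaml over a local
# ppo.yaml copy. Assumes flat YAML mappings; the actual files may be nested.
import yaml

with open("ppo.yaml") as f:
    ppo_cfg = yaml.safe_load(f)

with open("optimized_hyperparameters.yaml") as f:  # downloaded from the link above
    tuned = yaml.safe_load(f)

ppo_cfg.update(tuned)  # tuned values take precedence over the defaults

with open("ppo_optimized.yaml", "w") as f:
    yaml.safe_dump(ppo_cfg, f, default_flow_style=False)
```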
Let me know if you have any further questions!
Thanks for your reply @Federico-PizarroBejarano! Yeah, you are right, the `done_on_out_of_bound` setting does have an influence. I will try the optimized hyperparameters you recommended!
Awesome, keep me updated!
While testing the performance of the PPO controller on the cartpole task, I encountered an issue where training does not seem to converge, despite using the provided parameters. I did make a few changes, which I don't think should affect training performance (a rough equivalent is sketched below):

- In cartpole.yaml: changed `info_in_reset: False` to `info_in_reset: True` and `done_on_out_of_bound: True` to `done_on_out_of_bound: False`.
- In ppo.yaml: changed `log_interval: 0` to `log_interval: 10` and `tensorboard: False` to `tensorboard: True`.
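For reference, a rough equivalent of my edits in plain Python/PyYAML (the paths are from my local checkout and the keys are assumed to be top-level, so the actual file layout may differ):

```python
# Rough equivalent of the config edits listed above, applied with plain PyYAML.
# Paths and top-level key placement are assumptions about my local checkout.
import yaml

changes = {
    "cartpole.yaml": {"info_in_reset": True, "done_on_out_of_bound": False},
    "ppo.yaml": {"log_interval": 10, "tensorboard": True},
}

for path, overrides in changes.items():
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.update(overrides)  # apply the changes listed above
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, default_flow_style=False)
```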
Could you please advise on the best performance that can be achieved with the PPO controller on the cartpole task based on your repository?