Hu-Hanyang closed this issue 3 months ago.
Hello @Hu-Hanyang, thanks for using our gym! I believe that changing `done_on_out_of_bound` might be affecting training: with it disabled, a training episode is not cancelled even if the cartpole is very far from the typical operating point, but I am not 100% sure. If you switch it back on, the performance may converge again.
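For example, here is a minimal sketch of flipping that flag back on in a local copy of cartpole.yaml, using plain PyYAML rather than our repo's own config tooling, and assuming the key sits at the top level of the file (which may differ from the actual layout):

```python
# Minimal sketch, not the repo's own tooling: re-enable done_on_out_of_bound in
# a local copy of cartpole.yaml. Assumes PyYAML is installed and the key is a
# top-level entry, which may not match the actual file layout.
import yaml

config_path = "cartpole.yaml"  # hypothetical local path

with open(config_path) as f:
    cfg = yaml.safe_load(f)

# Terminate an episode as soon as the cartpole leaves the allowed state bounds.
cfg["done_on_out_of_bound"] = True

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)
```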
Additionally, we have very recently added hyperparameter optimization to the gym. I am not an expert on it (you can speak to @middleyuan if you have any questions), but you can see an example of optimized PPO hyperparameters here: https://github.com/utiasDSL/safe-control-gym/blob/main/examples/hpo/rl/ppo/config_overrides/cartpole/optimized_hyperparameters.yaml. These are, of course, optimal only for the cartpole setup in that same example.
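If it helps, here is a rough sketch of overlaying those tuned values onto a local ppo.yaml copy with plain PyYAML (not our repo's own config machinery; it also assumes both files are flat mappings of hyperparameter names to values, which may not match the real layout):

```python
# Rough sketch: merge the downloaded optimized_hyperparameters.yaml over a local
# ppo.yaml copy. Assumes flat YAML mappings; the actual files may be nested.
import yaml

with open("ppo.yaml") as f:
    ppo_cfg = yaml.safe_load(f)

with open("optimized_hyperparameters.yaml") as f:  # downloaded from the link above
    tuned = yaml.safe_load(f)

ppo_cfg.update(tuned)  # tuned values take precedence over the defaults

with open("ppo_optimized.yaml", "w") as f:
    yaml.safe_dump(ppo_cfg, f, default_flow_style=False)
```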
Let me know if you have any further questions!
Thanks for your reply @Federico-PizarroBejarano! Yeah, you are right, the `done_on_out_of_bound` setting does have an influence. I will try the optimized hyperparameters you recommended!
Awesome, keep me updated!
While testing the performance of the PPO controller on the cartpole task, I encountered an issue where training does not seem to converge, despite using the provided parameters. I did make a few changes, which I don't think should affect training performance (a rough equivalent is sketched below):

- In cartpole.yaml: changed `info_in_reset: False` to `info_in_reset: True` and `done_on_out_of_bound: True` to `done_on_out_of_bound: False`.
- In ppo.yaml: changed `log_interval: 0` to `log_interval: 10` and `tensorboard: False` to `tensorboard: True`.
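For reference, a rough equivalent of my edits in plain Python/PyYAML (the paths are from my local checkout and the keys are assumed to be top-level, so the actual file layout may differ):

```python
# Rough equivalent of the config edits listed above, applied with plain PyYAML.
# Paths and top-level key placement are assumptions about my local checkout.
import yaml

changes = {
    "cartpole.yaml": {"info_in_reset": True, "done_on_out_of_bound": False},
    "ppo.yaml": {"log_interval": 10, "tensorboard": True},
}

for path, overrides in changes.items():
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.update(overrides)  # apply the changes listed above
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, default_flow_style=False)
```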
Could you please advise on the best performance that can be achieved with the PPO controller on the cartpole task based on your repository?