google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.
Apache License 2.0

Results on Hopper #129

Open katiekang1998 opened 2 years ago

katiekang1998 commented 2 years ago

Hi, I am hoping to train a policy on the Hopper environment in Brax. I noticed that no results on Hopper were reported in the paper, and that the Brax training iPython notebook has no default parameters for Hopper. I tried training Hopper using both the SAC and PPO hyperparameters reported in the appendix of the paper, but was unable to get returns above 500. Visually rendering the policy showed the agent hopping a few times and then falling over. Have the authors been able to get good results on Hopper? If so, it would be very helpful if you could share the hyperparameters. Thanks!

milutter commented 2 years ago

Hi Katie, I got okayish-to-good results on Hopper using SAC. This learning curve is averaged over 10 seeds, and the shaded area corresponds to the min/max reward. However, I don't have magic hyperparameters that make it work. I mainly changed three things:

  1. Increasing the joint stiffness of the Hopper model. The default joint stiffness makes the Hopper behave more like a pogo stick than a rigid body. With the default observations (as in OpenAI Gym), the pogo-stick behavior should not be observable, since the constraint violation is not contained in the observations. When increasing the joint stiffness, the number of substeps must also be increased.

  2. Using an action repeat of 2. This brings the control frequency down to 25 Hz, which is more comparable to the dm_control suite parametrization.

  3. Reducing the number of parallel environments. Using fewer environments increases the policy entropy during the initial exploration and stabilized the entropy coefficient alpha for me.

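For illustration, point 2 can be implemented as a small wrapper that applies each policy action for several consecutive simulation steps and sums the rewards. This is a generic, gym-style sketch; the class name and interface are hypothetical and not Brax's actual API (Brax's trainers accept an action-repeat setting directly):

```python
class ActionRepeat:
    """Applies each policy action for `repeat` consecutive env steps,
    accumulating the reward. Hypothetical wrapper for illustration only."""

    def __init__(self, env, repeat=2):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating if the episode ends mid-repeat
                break
        return obs, total_reward, done, info
```

With `repeat=2`, the policy picks a new action every second simulation step, which halves the control frequency as described above (e.g. 50 Hz simulation → 25 Hz control).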
(learning curve image)
erwincoumans commented 2 years ago

That's nice @milutter, do you have a colab to share?