LabChameleon closed this issue 4 months ago
Hi @LabChameleon, looking at Figure 4, the environments are trained for 1e6 steps to a reward of 6-8k, comparing Brax v1 and MuJoCo. Brax has changed considerably since then, and I expect that a similar analysis and comparison would be needed to find an average return that "solves" the environment in the current Brax version (and for each physics backend). We don't currently have a particular number for this. @cdfreeman-google may have better insights here.
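For anyone who wants to run that kind of comparison themselves, here is a minimal sketch of how evaluation returns could be summarized against a candidate threshold. It uses plain Python rather than any Brax API. The helper name `summarize_returns` and the return values in the example are hypothetical placeholders, not measured results, and the thresholds are only the candidates mentioned in this thread.

```python
from statistics import mean, stdev


def summarize_returns(episode_returns, threshold):
    """Summarize per-episode returns against a candidate "solved" threshold.

    Returns a tuple (average return, sample standard deviation, solved flag).
    The flag is True when the average return is at or above the threshold.
    """
    avg = mean(episode_returns)
    # stdev needs at least two samples; report 0.0 for a single episode.
    spread = stdev(episode_returns) if len(episode_returns) > 1 else 0.0
    return avg, spread, avg >= threshold


# Placeholder returns for illustration only (NOT real humanoid results).
example_returns = [12500.0, 13200.0, 11800.0, 12900.0]

# Candidate thresholds taken from this thread: about 12000 and 13000.
avg, spread, solved = summarize_returns(example_returns, threshold=13000.0)
print(f"mean={avg:.1f} std={spread:.1f} solved={solved}")
```

Running the same summary per physics backend, with returns collected from several seeds, would show whether a single threshold is meaningful across backends.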
Hi!
Is there an agreed threshold at which the humanoid environment is considered solved? The Brax paper refers to the environment as solved at an average return of about 12000, as can be seen in Figure 9. The Brax training tutorial states a threshold of 13000: https://github.com/google/brax/blob/a89322496dcb07ac5a7e002c2e1d287c8c64b7dd/notebooks/training.ipynb#L261
Thanks!