Inconsistency in solved return for Humanoid environment

google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.

Apache License 2.0


Inconsistency in solved return for Humanoid environment #439

Closed · LabChameleon closed this issue 4 months ago

LabChameleon commented 6 months ago

Hi!

Is there an agreed-upon return threshold at which the Humanoid environment is considered solved? The Brax paper refers to the environment as solved at an average return of about 12000, as can be seen in Figure 9, whereas the Brax training tutorial states a threshold of 13000: https://github.com/google/brax/blob/a89322496dcb07ac5a7e002c2e1d287c8c64b7dd/notebooks/training.ipynb#L261

Thanks!

btaba commented 4 months ago

Hi @LabChameleon, looking at Figure 4, the environments are trained for 1e6 steps to a reward of 6-8k, comparing Brax v1 and MuJoCo. Brax has changed considerably since then, and I expect that a similar analysis and comparison would be needed to find an average return that "solves" the environment in the current Brax version (and for each physics backend). We don't currently have a particular number for this. @cdfreeman-google may have better insights here.
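To make the inconsistency concrete, here is a minimal sketch of a "solved" check, assuming "solved" means that the mean return over a set of evaluation episodes meets or exceeds a fixed threshold. The helper `is_solved`, the constant names, and the sample returns are all hypothetical and for illustration only; they are not part of the Brax API, and the two thresholds are simply the values quoted in this thread.

```python
import statistics

# Thresholds quoted in this thread (hypothetical constants, not Brax API):
# ~12000 from the Brax paper (Figure 9) and 13000 from the training notebook.
PAPER_THRESHOLD = 12_000.0
NOTEBOOK_THRESHOLD = 13_000.0


def is_solved(episode_returns, threshold):
    """Return True if the mean of the per-episode returns is >= threshold.

    Hypothetical helper: `episode_returns` is any non-empty sequence of
    returns, e.g. collected from evaluation rollouts.
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return statistics.fmean(episode_returns) >= threshold


if __name__ == "__main__":
    # Made-up evaluation returns (mean is about 12533), for illustration only.
    returns = [12_400.0, 12_650.0, 12_550.0]
    for name, thr in [("paper", PAPER_THRESHOLD), ("notebook", NOTEBOOK_THRESHOLD)]:
        print(f"{name} threshold {thr:.0f}: solved={is_solved(returns, thr)}")
```

With these sample returns, the same run counts as solved under the ~12000 threshold but not under 13000, which is why a "solved" claim is only meaningful alongside the threshold (and, per the reply above, the Brax version and physics backend) it was measured against.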