Adds a simple cart-pole balancing example based on Brax's `InvertedPendulum`.
While this is a standard RL baseline, it is a bit strange in several ways:
- The reward counts the number of steps without falling over rather than the distance to upright, so the rollouts need a small change to respect the `done` flag.
- Hard joint limits (together with the reward above) make a full swing-up impossible: this is a balancing task only.
- The initial conditions are sampled very close to upright.
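
To illustrate what respecting the `done` flag means for a fixed-length rollout, here is a minimal sketch in plain Python. It is not the code in this PR: `masked_return` is a hypothetical helper, and it assumes that the reward on the step that sets `done` still counts while every later step is ignored. The running `alive` mask is used instead of an early `break` because that pattern carries over to a fixed-length scan.

```python
from typing import Sequence


def masked_return(rewards: Sequence[float], dones: Sequence[bool]) -> float:
    """Sum rewards up to and including the first step where done is set.

    Assumption: the terminal step's reward counts; later steps do not.
    """
    total = 0.0
    alive = 1.0  # becomes 0.0 after the first done and stays there
    for reward, done in zip(rewards, dones):
        total += alive * reward  # steps after termination contribute nothing
        alive *= 0.0 if done else 1.0
    return total


# Hypothetical 6-step rollout: +1 reward per step, pole falls on the 4th step.
rewards = [1.0] * 6
dones = [False, False, False, True, False, False]
episode_return = masked_return(rewards, dones)
```

Without the mask, summing all six rewards would credit the policy for steps taken after the pole has already fallen.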
Also, data generation appears to run much more slowly than with our hand-crafted envs, which may be worth looking into in more detail.