Add cart pole example

Adds a simple cart-pole balancing example based on Brax's InvertedPendulum.

While this is a standard RL baseline, it is a bit strange in several ways:

- The reward is based on the number of steps survived without falling over, rather than on the distance to upright, so the rollouts need a small change to respect the done flag.
- Hard joint limits (together with the reward above) make a full swing-up impossible: this is a balancing task only.
- The distribution of initial conditions is very close to upright.
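
To make the first point concrete, here is a minimal sketch of a fixed-horizon rollout that respects a done flag by masking rewards after the first termination. It is not the code from this PR: `toy_step`, `StepResult`, and `masked_return` are hypothetical names, the toy dynamics are a stand-in for the real environment, and counting the reward of the terminating step itself is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StepResult:
    state: float   # hypothetical scalar state (pole angle)
    reward: float
    done: bool


def toy_step(state: float, action: float) -> StepResult:
    """Hypothetical stand-in environment, not Brax's dynamics.

    The action is added to the angle. The episode is flagged done once
    the angle magnitude exceeds 0.25. Every step pays a reward of 1.
    """
    next_state = state + action
    return StepResult(next_state, 1.0, abs(next_state) > 0.25)


def masked_return(
    step: Callable[[float, float], StepResult],
    initial_state: float,
    actions: Sequence[float],
) -> float:
    """Sum rewards over a fixed horizon, ignoring steps after the first done."""
    state = initial_state
    alive = 1.0  # becomes 0.0 after the first done flag and stays there
    total = 0.0
    for action in actions:
        result = step(state, action)
        total += alive * result.reward  # assumption: the terminating step still counts
        if result.done:
            alive = 0.0
        state = result.state
    return total


fallen = masked_return(toy_step, 0.0, [0.125] * 5)    # pole falls on the third step
balanced = masked_return(toy_step, 0.0, [0.0] * 5)    # pole never falls
```

The multiplicative mask keeps the loop length fixed, which is the usual reason to prefer it over an early `break` when the rollout is later expressed as a scan over a fixed horizon.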

Also, it looks like data generation runs very slowly compared to our hand-crafted envs. That could be worth looking into in more detail.

vincekurtz / rddp