vincekurtz / rddp

Reward-Driven Diffusion Policy

Add cart pole example #21

Closed vincekurtz closed 2 months ago

vincekurtz commented 2 months ago

Adds a simple cart-pole balancing example based on Brax's InvertedPendulum.

While this is a standard RL baseline, it is a bit strange in several ways:

Also, it looks like data generation runs much more slowly here than with our hand-crafted envs. That could be worth looking into in more detail.
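
One way to start looking into the slowdown is to time a fixed number of environment steps for each env under the same conditions. The helper below is only a minimal sketch, not code from this repo: `time_rollout`, `step_fn`, and `sync` are hypothetical names, and it assumes each environment can be wrapped as a single-argument function from state to next state.

```python
import time
from typing import Any, Callable, Tuple


def time_rollout(
    step_fn: Callable[[Any], Any],
    state: Any,
    num_steps: int,
    sync: Callable[[Any], Any] = lambda x: x,
) -> Tuple[Any, float]:
    """Run `step_fn` for `num_steps` steps and return (final_state, seconds_per_step).

    `sync` is called on a result to force any pending asynchronous work to
    finish before the clock is read. The default does nothing.
    """
    # Warm-up call: its result is discarded so that one-time costs
    # (for example, JIT compilation) are not counted in the timing.
    sync(step_fn(state))

    start = time.perf_counter()
    for _ in range(num_steps):
        state = step_fn(state)
    # Wait for the last step to actually complete before stopping the clock.
    sync(state)
    elapsed = time.perf_counter() - start

    return state, elapsed / num_steps


if __name__ == "__main__":
    # Toy stand-in for an environment step, used only to show the call shape.
    final_state, per_step = time_rollout(lambda x: x + 1, 0, 1000)
    print(f"final state: {final_state}, seconds per step: {per_step:.3e}")
```

With JAX-based envs, passing `sync=jax.block_until_ready` should keep asynchronous dispatch from making the steps look faster than they are. Running the same helper on the Brax-based cart-pole and on one of the hand-crafted envs, with the same batch size and step count, would give a rough per-step comparison.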