vincekurtz / rddp

Reward-Driven Diffusion Policy

Use Brax backend #16

Closed: vincekurtz closed this 2 months ago

vincekurtz commented 2 months ago

Define systems and control objectives as Brax environments rather than with the custom system/task split we were using before.
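To illustrate the change, here is a minimal sketch of how a separate system (dynamics) and task (cost) might be folded into one environment object with Brax-style `reset`/`step` methods. The `PointMassEnv` class, its dynamics, and its cost are hypothetical. The code also uses a plain Python dataclass instead of importing Brax, so it only mirrors the general shape of the state (observation, reward, done flag) that a Brax env returns. It is not the code from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for a Brax-style environment state."""
    obs: tuple  # (position, velocity)
    reward: float
    done: bool


class PointMassEnv:
    """Hypothetical 1D point mass with dynamics and objective in one env.

    Under a system/task split, the integration below would live in a
    `System` and the cost in a `Task`; here both are defined in `step`.
    """

    def __init__(self, dt: float = 0.1, target: float = 1.0):
        self.dt = dt
        self.target = target

    def reset(self) -> State:
        # Start at rest at the origin.
        return State(obs=(0.0, 0.0), reward=0.0, done=False)

    def step(self, state: State, action: float) -> State:
        pos, vel = state.obs
        # Semi-implicit Euler step of a unit-mass double integrator.
        vel = vel + self.dt * action
        pos = pos + self.dt * vel
        # Reward is the negative of a quadratic running cost.
        cost = (pos - self.target) ** 2 + 0.01 * action ** 2
        return State(obs=(pos, vel), reward=-cost, done=False)


env = PointMassEnv()
state = env.step(env.reset(), 1.0)
```

The control objective now lives in the reward signal that `step` returns. This is also the quantity that RL algorithms consume, which is part of what makes baselining against them easier.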

This adds some bloat and makes terminal costs more awkward to implement, since a Brax env reports only a per-step reward. The bloat should be worth it, though, because (1) it will smooth the way to using MJX and (2) it will make it easier to baseline against the RL algorithms in Brax, as well as against MBD.
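The terminal-cost awkwardness can be made concrete. One possible workaround is to track the step count in the state and fold the terminal term into the reward on the final step of a fixed horizon. The sketch below assumes that approach. The `ReachEnv` class, its `horizon` and `terminal_weight` parameters, and the `t` counter field are all hypothetical. In Brax itself, bookkeeping like this would more likely go in the state's `info` dict. Here a plain field keeps the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical env state with a step counter for terminal-cost bookkeeping."""
    obs: float  # scalar position
    reward: float
    done: bool
    t: int  # number of steps taken so far


class ReachEnv:
    """Hypothetical 1D env x_{t+1} = x_t + u_t with running and terminal costs."""

    def __init__(self, horizon: int = 3, target: float = 1.0,
                 terminal_weight: float = 10.0):
        self.horizon = horizon
        self.target = target
        self.terminal_weight = terminal_weight

    def reset(self) -> State:
        return State(obs=0.0, reward=0.0, done=False, t=0)

    def step(self, state: State, action: float) -> State:
        x = state.obs + action
        t = state.t + 1
        # Running cost penalizes control effort at every step.
        cost = action ** 2
        done = t >= self.horizon
        if done:
            # No separate terminal-cost hook: add it to the last step's cost.
            cost += self.terminal_weight * (x - self.target) ** 2
        return State(obs=x, reward=-cost, done=done, t=t)


def rollout_return(env: ReachEnv, actions: list) -> float:
    """Sum rewards over a fixed action sequence (the negative total cost)."""
    state = env.reset()
    total = 0.0
    for u in actions:
        state = env.step(state, u)
        total += state.reward
    return total


env = ReachEnv()
reach_return = rollout_return(env, [0.5, 0.5, 0.0])  # reaches the target
idle_return = rollout_return(env, [0.0, 0.0, 0.0])   # pays the full terminal cost
```

In a JAX version, the Python `if` would need to become something like `jnp.where(done, ...)` so that `step` stays traceable under `jax.jit`. That is one more reason terminal costs are clumsier here than with a dedicated task-level terminal cost.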