vincekurtz / rddp

Reward-Driven Diffusion Policy

Enforce input limits #20

Closed: vincekurtz closed this 2 months ago

vincekurtz commented 2 months ago

Adds input limits (default [-1, 1]) to all systems via the OptimalControlProblem rollout mechanism.
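Here is a minimal JAX sketch of enforcing limits once in a shared rollout rather than in each environment. It assumes the limits are applied by clipping each input before the dynamics step. The `rollout` helper and the `step_fn(x, u) -> x_next` dynamics callable are hypothetical. They are not the repository's `OptimalControlProblem` API, and the actual change may enforce limits differently.

```python
import jax
import jax.numpy as jnp


def rollout(step_fn, x0, controls, u_min=-1.0, u_max=1.0):
    """Roll out dynamics with every input clipped to [u_min, u_max].

    Hypothetical helper: `step_fn(x, u) -> x_next` stands in for any
    environment's dynamics. Because clipping happens here, individual
    environments never see out-of-range inputs and need no changes.
    """

    def body(x, u):
        # Enforce the input limits before applying the dynamics.
        u_clipped = jnp.clip(u, u_min, u_max)
        x_next = step_fn(x, u_clipped)
        # Carry the new state; also record (state, applied input).
        return x_next, (x_next, u_clipped)

    _, (xs, us) = jax.lax.scan(body, x0, controls)
    return xs, us
```

For example, with a single integrator `lambda x, u: x + u` starting at zero, the requested inputs `2.0, -3.0, 0.5` are applied as `1.0, -1.0, 0.5`.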

As a result, every environment gets input limits without any per-environment changes. This also brings us into closer alignment with Brax's PPO implementation, which samples actions from a normal-tanh distribution.
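A normal-tanh distribution bounds actions by construction. It draws a Gaussian sample in unconstrained space, squashes it through `tanh` into (-1, 1), and optionally rescales the result to `[u_min, u_max]`. The sketch below is plain JAX written for illustration only. `sample_normal_tanh` and its parameters are hypothetical and are not Brax's own code.

```python
import jax
import jax.numpy as jnp


def sample_normal_tanh(key, mean, log_std, u_min=-1.0, u_max=1.0):
    """Sample bounded actions from a tanh-squashed Gaussian (illustrative)."""
    # Gaussian sample in unconstrained space: z ~ N(mean, exp(log_std)^2).
    eps = jax.random.normal(key, mean.shape)
    z = mean + jnp.exp(log_std) * eps
    # tanh squashes z into [-1, 1] (saturating at the ends in float32).
    a = jnp.tanh(z)
    # Affine rescale from [-1, 1] to [u_min, u_max].
    return u_min + 0.5 * (a + 1.0) * (u_max - u_min)
```

The two approaches differ in one respect. Clipping has zero gradient with respect to inputs outside the limits, whereas `tanh` is smooth everywhere. The squashed distribution also keeps a well-defined log-density through a change of variables.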