Adds input limits (default [-1, 1]) to all systems via the OptimalControlProblem rollout mechanism.
As a result, every environment has input limits without any per-env changes. This also brings us closer to Brax's PPO implementation, which samples actions from a normal-tanh distribution and therefore only produces actions in (-1, 1).
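
For reviewers unfamiliar with the idea, here is a minimal sketch of enforcing input limits inside a rollout rather than in each environment. It is not the code in this PR: the `rollout` function, its `u_min`/`u_max` parameters, and the `single_integrator` system are hypothetical names chosen for illustration, and it assumes the limits are applied by clipping, which may differ from what `OptimalControlProblem` actually does.

```python
import jax
import jax.numpy as jnp


def rollout(dynamics, x0, controls, u_min=-1.0, u_max=1.0):
    """Roll out `dynamics` from x0, limiting each input to [u_min, u_max].

    Hypothetical helper: the limits live here, so individual systems
    never need to know about them.
    """

    def step(x, u):
        # Assumption: limits are enforced by clipping the raw input.
        u_applied = jnp.clip(u, u_min, u_max)
        x_next = dynamics(x, u_applied)
        return x_next, (x_next, u_applied)

    _, (states, applied) = jax.lax.scan(step, x0, controls)
    return states, applied


# Hypothetical system with no limits of its own: x_{t+1} = x_t + 0.1 * u_t
def single_integrator(x, u):
    return x + 0.1 * u


x0 = jnp.zeros(1)
controls = jnp.array([[0.5], [2.0], [-3.0]])  # last two exceed [-1, 1]
states, applied = rollout(single_integrator, x0, controls)
```

In this sketch the out-of-range inputs `2.0` and `-3.0` are applied as `1.0` and `-1.0`, while `single_integrator` itself stays untouched.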