See here. Interestingly, the log std is not clipped or otherwise bounded in PPO, but it's also independent of the state there. Maybe investigate the differences.
On a different note, SAC.action_dist should also be made private, since returning a distrax distribution from a jitted function did not work the last time I checked.
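
To make the comparison concrete, here is a minimal sketch of the two log std parameterizations, not the code from this repo. The linear heads, the weight names, the function names, and the bounds `LOG_STD_MIN`/`LOG_STD_MAX` are all assumptions for illustration. Both jitted functions return plain arrays rather than a distribution object, which is one way to keep the distribution out of the jit boundary.

```python
import jax
import jax.numpy as jnp

# Assumed bounds, common in SAC implementations but not taken from this codebase.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


@jax.jit
def sac_dist_params(w_mean, w_log_std, obs):
    """State-dependent log std (hypothetical linear head), clipped to a fixed range."""
    mean = obs @ w_mean
    log_std = jnp.clip(obs @ w_log_std, LOG_STD_MIN, LOG_STD_MAX)
    return mean, log_std  # plain arrays cross the jit boundary


@jax.jit
def ppo_dist_params(w_mean, log_std, obs):
    """State-independent log std: a free parameter vector, left unbounded."""
    mean = obs @ w_mean
    return mean, jnp.broadcast_to(log_std, mean.shape)


def sample_action(key, mean, log_std):
    """Draw the Gaussian sample outside the jitted functions."""
    return mean + jnp.exp(log_std) * jax.random.normal(key, mean.shape)
```

In the state-dependent case an extreme observation can push the head's output arbitrarily far, which is the usual motivation for clipping; the state-independent parameter only moves through gradient steps, which may be why it is left unbounded.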