keraJLi / rejax

Apache License 2.0

tanh transform instead of clipping of log std in SAC #4

Open · keraJLi opened this issue 9 months ago

keraJLi commented 9 months ago

See here. Interestingly, the log std is not clipped or otherwise bounded in PPO, but there it is also independent of the state. It may be worth investigating the differences. On a different note, SAC.action_dist should also be made private, since returning a distrax distribution from a jitted function did not work the last time I checked.