tanh transform instead of clipping of log std in SAC

See here. Interestingly, the log std is not clipped or otherwise bounded in PPO, but it is also state-independent there. Maybe investigate the differences. On a different note, SAC.action_dist should also be made private, since returning a distrax distribution from a jitted function did not work the last time I checked.
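For reference, a minimal sketch of the two options, not the code currently in the repository. The bounds `LOG_STD_MIN` and `LOG_STD_MAX` and both function names are assumptions chosen for illustration. Clipping gives zero gradient once the raw network output leaves the allowed range, while the tanh rescaling keeps the output in the same range with a nonzero gradient.

```python
import jax
import jax.numpy as jnp

# Assumed bounds for illustration; the values used in the repo may differ.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


def clip_log_std(raw):
    """Hard clipping: gradient is zero outside [LOG_STD_MIN, LOG_STD_MAX]."""
    return jnp.clip(raw, LOG_STD_MIN, LOG_STD_MAX)


def tanh_log_std(raw):
    """Squash with tanh, then rescale from (-1, 1) to (LOG_STD_MIN, LOG_STD_MAX)."""
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (jnp.tanh(raw) + 1.0)


# Gradient w.r.t. a raw output that lies above the upper bound.
grad_clip = jax.grad(clip_log_std)(5.0)  # clipped region, no learning signal
grad_tanh = jax.grad(tanh_log_std)(3.0)  # small but nonzero
```

Note that with the tanh variant a raw output of 0 maps to the midpoint of the range rather than to a log std of 0, so the initial policy entropy changes unless the bounds or the output-layer initialization are adjusted accordingly.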

keraJLi / rejax

tanh transform instead of clipping of log std in SAC #4