kashif / firedup

Clone of OpenAI's Spinning Up in PyTorch
MIT License
146 stars 25 forks

Confusion: Why are you adding 1 here? #4

Closed AdityaGudimella closed 5 years ago

AdityaGudimella commented 5 years ago

When calculating the scaled log_std in the SAC policy, you scale log_std + 1 into the range [LOG_STD_MIN, LOG_STD_MAX]. Is this because the range of the tanh function is (-1, 1)? Is it really necessary? Wouldn't the scaling alone limit the output to [LOG_STD_MIN, LOG_STD_MAX] even without the +1?

https://github.com/chutaklee/firedup/blob/ed3634525703f3169b190f6e7951d69c38a5372d/fireup/algos/sac/core.py#L92-L93
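A small self-contained sketch illustrates what the question is asking. The bounds here (LOG_STD_MIN = -20, LOG_STD_MAX = 2) are the ones used in OpenAI's Spinning Up SAC and are assumed rather than copied from the linked lines; `scaled_log_std` is a paraphrase of the formula under discussion, and `without_plus_one` is a hypothetical variant that drops the +1 shift:

```python
import math

# Assumed bounds, as in Spinning Up's SAC implementation.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def scaled_log_std(x):
    """The formula under discussion: shift tanh output by +1 before scaling."""
    t = math.tanh(x)  # t lies in the open interval (-1, 1)
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (t + 1)

def without_plus_one(x):
    """Hypothetical variant without the +1 shift: lands in the wrong interval."""
    t = math.tanh(x)
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * t

print(scaled_log_std(-100))    # approaches LOG_STD_MIN = -20
print(scaled_log_std(100))     # approaches LOG_STD_MAX = 2
print(without_plus_one(-100))  # approaches -31, below LOG_STD_MIN
print(without_plus_one(100))   # approaches -9, nowhere near LOG_STD_MAX
```

So the scaling alone is not enough: without the +1, the output is centered on LOG_STD_MIN instead of the midpoint of the target interval.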

kashif commented 5 years ago

Yes. In the end I want a linear transformation that maps (-1, 1) to (LOG_STD_MIN, LOG_STD_MAX), and since this transformation is unique, any other way of writing it turns out to be the same formula as I have...

The way I (and, I suppose, the original author) thought of it: start from (-1, 1), add 1 to get (0, 2), multiply by 0.5 to get (0, 1), then stretch and shift that to (LOG_STD_MIN, LOG_STD_MAX). Hope that helps!
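The step-by-step chain above can be checked numerically against the single formula. This is a sketch with assumed bounds (LOG_STD_MIN = -20, LOG_STD_MAX = 2, as in Spinning Up), not code from the repo:

```python
# Assumed bounds, matching Spinning Up's SAC defaults.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def stepwise(t):
    """The mental model: shift, halve, then stretch/shift to the target range."""
    u = t + 1.0                                    # (-1, 1) -> (0, 2)
    u = u * 0.5                                    # (0, 2)  -> (0, 1)
    return LOG_STD_MIN + u * (LOG_STD_MAX - LOG_STD_MIN)  # -> (MIN, MAX)

def one_liner(t):
    """The compact form as it appears in the code (paraphrased)."""
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (t + 1.0)

# The two forms agree everywhere on (-1, 1), and the endpoints map as intended.
for t in (-0.999, -0.5, 0.0, 0.5, 0.999):
    assert abs(stepwise(t) - one_liner(t)) < 1e-12
print(stepwise(-1.0))  # LOG_STD_MIN
print(stepwise(1.0))   # LOG_STD_MAX
```

Because an affine map between two fixed intervals is unique, any decomposition into shift/scale steps collapses to this same expression.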