Open bluecontra opened 3 years ago
Hello, @bluecontra have you managed to get an answer to this mystery? I am also confused as to why most PyTorch re-implementations online use log_alpha, while the original softlearning repo (TensorFlow) seems to use alpha.
Hello, I had a similar question. I believe the reason is as follows:
If we optimize alpha directly, the optimizer can push it to either a positive or a negative value. However, alpha should be non-negative, which follows from the definition of the entropy bonus term.
Entropy Bonus = alpha * entropy(H)
A negative alpha would mean the agent is penalized for having high entropy, which is the opposite of the intended effect.
The optimized log_alpha can be negative or positive, but alpha = exp(log_alpha) is guaranteed to be positive, since the range of exp(x) is (0, ∞).
This is why several SAC repositories prefer to optimize log_alpha.
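To illustrate the point numerically, here is a minimal pure-Python sketch (not taken from any of the repos discussed; the loss form and learning rate are assumptions). It runs gradient descent on log_alpha for the SAC temperature objective J(alpha) = alpha * (entropy - target_entropy): alpha shrinks toward zero when entropy exceeds the target, but exp() keeps it strictly positive, whereas descending on alpha directly could overshoot into negative values.

```python
import math

def update_log_alpha(log_alpha, entropy, target_entropy, lr=0.1):
    """One gradient-descent step on log_alpha for the temperature loss.

    Loss: J(alpha) = alpha * (entropy - target_entropy), with
    alpha = exp(log_alpha). By the chain rule,
    dJ/d(log_alpha) = exp(log_alpha) * (entropy - target_entropy).
    """
    grad = math.exp(log_alpha) * (entropy - target_entropy)
    return log_alpha - lr * grad

log_alpha = 0.0  # start at alpha = exp(0) = 1.0
for _ in range(500):
    # Policy entropy held above the target, so the loss keeps
    # pushing alpha down -- the dangerous direction.
    log_alpha = update_log_alpha(log_alpha, entropy=2.0, target_entropy=1.0)

alpha = math.exp(log_alpha)
assert alpha > 0.0  # exp(log_alpha) can approach 0 but never cross it
```

The same pressure applied to a raw alpha parameter with a fixed step size would eventually drive it below zero; the log-space parameterization removes that failure mode without needing a projection or clamp.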
Hi,
Thanks a lot for your helpful implementation!
After checking several SAC repos, I ran into a question: why is _logalpha used (instead of alpha) for the entropy calculation, as in line 306 of /code/agent.py?
_logalpha seems to be popular in several other SAC repos. However, alpha is used in line 256 of https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py , as in the original paper.
I also did a personal re-implementation of SAC, and I found that both _logalpha and alpha work.
Do you have any idea about this?