toshikwa / soft-actor-critic.pytorch

PyTorch implementation of Soft Actor-Critic(SAC).
MIT License

log_alpha vs. alpha for entropy loss calculation #2


bluecontra commented 3 years ago

Hi,

Thanks a lot for your helpful implementation!

After checking several SAC repos, I have a question: why is log_alpha used (instead of alpha) for the entropy loss calculation, as in line 306 of /code/agent.py?

log_alpha seems to be popular in several other SAC repos. However, alpha appears to be used in line 256 of https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py, as in the original paper.

I also did my own re-implementation of SAC, and I found that both log_alpha and alpha work.

Do you have any idea about this?
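For context, here is a minimal sketch of the two temperature-loss variants being compared (not the exact code from either repo; `entropies`, `target_entropy`, and the concrete values are placeholders):

```python
import torch

# Placeholder values, just to make the snippet self-contained.
log_alpha = torch.zeros(1, requires_grad=True)
target_entropy = -4.0                      # e.g. -action_dim
entropies = torch.tensor([3.2, 3.8, 4.1])  # batch of policy entropies E[-log pi(a|s)]

# Variant A: weight the detached entropy gap by log_alpha directly
# (common in PyTorch ports, and what the question refers to).
loss_log_alpha = -(log_alpha * (target_entropy - entropies).detach()).mean()

# Variant B: weight it by alpha = exp(log_alpha), closer to the formulation
# in softlearning and the paper.
loss_alpha = -(log_alpha.exp() * (target_entropy - entropies).detach()).mean()
```

Both variants move log_alpha in the same direction, since variant B only rescales the gradient by the positive factor exp(log_alpha); the fixed point where the average entropy matches the target is the same, which is consistent with the observation that both work.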

anaselfathi commented 1 year ago

Hello @bluecontra, have you managed to get an answer to this mystery? I am also confused as to why most PyTorch re-implementations online use log_alpha, while the original softlearning repo (TensorFlow) seems to use alpha.

pmh5050 commented 8 months ago

> Hello @bluecontra, have you managed to get an answer to this mystery? I am also confused as to why most PyTorch re-implementations online use log_alpha, while the original softlearning repo (TensorFlow) seems to use alpha.

Hello, I had a similar question. I believe the reason is as follows:

If we optimized alpha directly, the optimizer could drive it to a positive or negative value. However, alpha should be non-negative, which follows from the definition of the entropy bonus term:

Entropy bonus = alpha * H, where H is the policy entropy.

A negative alpha would mean that the agent is penalized for having high entropy.

The optimized log_alpha can be negative or positive, but alpha = exp(log_alpha) is guaranteed to be positive, since the range of exp(x) is (0, ∞).

This is why several SAC repositories prefer to optimize log_alpha rather than alpha directly.
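To make this concrete, here is a tiny illustrative sketch (made-up values, not code from the repo) showing that even when the optimizer drives log_alpha below zero, the effective temperature alpha = exp(log_alpha) stays strictly positive:

```python
import torch

log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=1e-1)

target_entropy = -2.0
entropy = torch.tensor(1.5)  # pretend the policy is currently more random than the target

for _ in range(50):
    # Same loss form as the log_alpha variant above: while the entropy exceeds
    # the target, the loss pushes log_alpha (and hence alpha) down.
    loss = -(log_alpha * (target_entropy - entropy)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(log_alpha.item())        # clearly negative after these updates
print(log_alpha.exp().item())  # alpha is still > 0, so the entropy bonus never flips sign
```

If alpha itself were the raw trainable parameter, nothing would stop it from crossing zero in the same situation, turning the bonus into the penalty described above.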