jpreiss opened 6 years ago
The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:
"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13"
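For reference, here is a rough sketch of what that sentence implies. It assumes the paper's notation for V_ψ, Q_θ, π_φ, the replay buffer 𝒟 and the reparameterization a_t = f_φ(ε_t; s_t). The shorthand Q_min is introduced here and is not from the paper. The objectives whose gradients are Equations 6 and 13 use Q_min wherever a single Q_θ would otherwise appear:

```latex
Q_{\min}(s_t, a_t) = \min\big(Q_{\theta_1}(s_t, a_t),\, Q_{\theta_2}(s_t, a_t)\big)

J_V(\psi) = \mathbb{E}_{s_t \sim \mathcal{D}}\Big[\tfrac{1}{2}\big(V_\psi(s_t) - \mathbb{E}_{a_t \sim \pi_\phi}\big[Q_{\min}(s_t, a_t) - \log \pi_\phi(a_t \mid s_t)\big]\big)^2\Big]

J_\pi(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\, \epsilon_t \sim \mathcal{N}}\Big[\log \pi_\phi\big(f_\phi(\epsilon_t; s_t) \mid s_t\big) - Q_{\min}\big(s_t, f_\phi(\epsilon_t; s_t)\big)\Big]
```

The code described in this issue keeps Q_min in J_V but uses only one of the two Q-functions in J_π.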
However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy loss. It does use the minimum in the value loss.
Is there a reason for the discrepancy? Thanks!
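To make the discrepancy concrete, here is a minimal, framework-free sketch. It is plain Python over batches of already-evaluated values. The function names `policy_loss` and `value_loss` and the `use_min` flag are hypothetical and are not taken from sac/algos/sac.py. The sketch computes batch-averaged loss values from log π(a|s), Q1(s, a) and Q2(s, a). A switch selects between the paper's min(Q1, Q2) and the single-Q variant this issue describes. Gradients are out of scope.

```python
from typing import Sequence


def policy_loss(log_pi: Sequence[float], q1: Sequence[float],
                q2: Sequence[float], use_min: bool = True) -> float:
    """Batch mean of log pi(a|s) - Q(s, a) for actions sampled from the policy.

    use_min=True uses min(Q1, Q2), as the paper states; use_min=False uses
    Q1 alone, i.e. the single-Q variant described in the issue.
    """
    if not (len(log_pi) == len(q1) == len(q2)) or not log_pi:
        raise ValueError("batches must be non-empty and the same size")
    total = 0.0
    for lp, a, b in zip(log_pi, q1, q2):
        q = min(a, b) if use_min else a  # choose which Q estimate to trust
        total += lp - q
    return total / len(log_pi)


def value_loss(v: Sequence[float], log_pi: Sequence[float],
               q1: Sequence[float], q2: Sequence[float]) -> float:
    """Batch mean of 0.5 * (V(s) - (min(Q1, Q2) - log pi(a|s)))**2."""
    if not (len(v) == len(log_pi) == len(q1) == len(q2)) or not v:
        raise ValueError("batches must be non-empty and the same size")
    total = 0.0
    for vs, lp, a, b in zip(v, log_pi, q1, q2):
        target = min(a, b) - lp  # soft value target built from the smaller Q
        total += 0.5 * (vs - target) ** 2
    return total / len(v)


if __name__ == "__main__":
    log_pi = [-1.0, -0.5]
    q1, q2 = [2.0, 3.0], [1.5, 3.5]
    print(policy_loss(log_pi, q1, q2))                 # -3.0
    print(policy_loss(log_pi, q1, q2, use_min=False))  # -3.25
    print(value_loss([0.0, 2.0], log_pi, q1, q2))      # 2.125
```

Because min(Q1, Q2) ≤ Q1 on every sample, the min variant always gives a policy loss greater than or equal to the single-Q variant on the same batch. In other words, it gives a more pessimistic Q estimate.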
Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.