paper/code conflict: using minimum Q in policy gradient

haarnoja / sac

Soft Actor-Critic


Open jpreiss opened 6 years ago

jpreiss commented 6 years ago

The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:

"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13"

However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy gradient loss, although it does use the minimum in the value gradient loss.
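To make the difference concrete, here is a minimal pure-Python sketch of the two variants. It is not the code from sac/algos/sac.py: the function name `policy_loss`, the `use_min` flag, the toy numbers, and the simplified loss form (mean of log-probability minus Q, with temperature and reward scale assumed to be 1) are all assumptions made for illustration.

```python
def policy_loss(log_pis, q1s, q2s, use_min=True):
    """Sample mean of (log_pi - Q), a simplified SAC-style policy loss.

    Hypothetical helper for illustration only.
    use_min=True  -> Q is min(Q1, Q2), as the paper describes.
    use_min=False -> Q is Q1 alone, as the issue says the code does.
    """
    total = 0.0
    for log_pi, q1, q2 in zip(log_pis, q1s, q2s):
        q = min(q1, q2) if use_min else q1
        total += log_pi - q
    return total / len(log_pis)


# Toy batch of three sampled actions (made-up values).
log_pis = [-1.0, -0.5, -2.0]
q1s = [3.0, 1.0, 2.0]
q2s = [2.5, 1.5, 2.0]

loss_single = policy_loss(log_pis, q1s, q2s, use_min=False)
loss_min = policy_loss(log_pis, q1s, q2s, use_min=True)
```

Because min(Q1, Q2) never exceeds Q1, the minimum variant gives a loss at least as large as the single-Q variant, and the two differ only on samples where Q2 is below Q1.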

Is there a reason for the discrepancy? Thanks!

haarnoja commented 6 years ago

Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.