pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic
MIT License

Why is the value function not used in this implementation? #22

Closed Steven-Ho closed 4 years ago

Steven-Ho commented 4 years ago

I noticed that the SAC algorithm in the original paper uses two Q-functions and one V-function, and so does the authors' implementation (see https://github.com/haarnoja/sac/blob/master/sac/algos/diayn.py). Is the V-function dropped here out of consideration for computational cost or training stability?

twni2016 commented 4 years ago

Same question here. Will omitting the value function affect performance?

pranz24 commented 4 years ago

Yup, the paper (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor) does use a value function. (I made another branch for this paper - https://github.com/pranz24/pytorch-soft-actor-critic/tree/SAC_V).

I don't think there is much of a difference in terms of training stability, although I haven't tested this thoroughly. (Not the exact same question but the author does mention here that they didn't see any noticeable difference by removing the value function - https://github.com/rail-berkeley/softlearning/issues/30)

In terms of computational cost, training an extra value network is pure overhead.
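For reference, here is a minimal sketch of how the bootstrap targets differ between the two variants. This is not the exact code from either branch; the network and variable names (`policy`, `target_q1`, `target_q2`, `target_value_fn`, etc.) are placeholders.

```python
import torch

# Shapes assumed: reward (B, 1), mask (B, 1) where mask = 1 - done.

def q_target_without_value_fn(reward, mask, next_state, policy,
                              target_q1, target_q2, alpha, gamma=0.99):
    """Master-branch style: no separate V-network. The soft value is computed
    implicitly from the target Q-networks and the policy's entropy term."""
    with torch.no_grad():
        next_action, next_log_pi = policy.sample(next_state)
        min_q_next = torch.min(target_q1(next_state, next_action),
                               target_q2(next_state, next_action))
        soft_value = min_q_next - alpha * next_log_pi
        return reward + mask * gamma * soft_value

def q_target_with_value_fn(reward, mask, next_state, target_value_fn, gamma=0.99):
    """SAC_V-branch style (first SAC paper): a separate target V-network
    provides the bootstrap value. V itself is trained toward
    min(Q1, Q2) - alpha * log_pi on the current state."""
    with torch.no_grad():
        return reward + mask * gamma * target_value_fn(next_state)
```

Either way the Q-losses are then just MSE against the returned target; the only structural difference is whether the soft state value is maintained as an explicit network or recomputed from the target Q-networks and the policy.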

twni2016 commented 4 years ago

Thanks for the explanation!

Steven-Ho commented 4 years ago

@pranz24 thanks! I tried one run on HalfCheetah-v2 to see the difference (I added a value function with a target network and removed the target networks of the Q-functions). It seems to take more episodes to learn; maybe the extra network makes learning harder. (Screenshot from 2019-11-27 18-42-11: learning curves.)
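A rough sketch of that modification, assuming hypothetical module names (`q1`, `q2`, `value`, `target_value`, `policy`), not the exact code from either branch:

```python
import torch
import torch.nn.functional as F

def soft_q_and_value_losses(state, action, reward, mask, next_state,
                            q1, q2, value, target_value, policy,
                            alpha, gamma=0.99):
    """Variant described above: one V-network with a (Polyak-averaged) target
    copy, and Q-networks trained directly without target Q-networks."""
    # Q-targets bootstrap through the target V-network.
    with torch.no_grad():
        q_backup = reward + mask * gamma * target_value(next_state)
    q1_loss = F.mse_loss(q1(state, action), q_backup)
    q2_loss = F.mse_loss(q2(state, action), q_backup)

    # V regresses toward the soft value of the current policy.
    with torch.no_grad():
        new_action, log_pi = policy.sample(state)
        v_backup = torch.min(q1(state, new_action),
                             q2(state, new_action)) - alpha * log_pi
    value_loss = F.mse_loss(value(state), v_backup)

    return q1_loss, q2_loss, value_loss
```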

Your idea is great!