haarnoja / softqlearning

Reinforcement Learning with Deep Energy-Based Policies
https://arxiv.org/abs/1702.08165

Not using target network for policy #4

Closed nosyndicate closed 6 years ago

nosyndicate commented 7 years ago

Hi, it seems to me that the policy is not using a target network, contrary to what the paper states: https://github.com/haarnoja/softqlearning/blob/aca29d2aee66c44ee052a298f049a22fa14792a5/softqlearning/algos/softqlearning.py#L380

Am I missing something here?

haarnoja commented 7 years ago

Good catch! Indeed, we don't have a target policy; that is a mistake in our paper. A target network is only needed to stabilize TD learning, and there is no such need when learning the policy. We'll fix this in the next version.
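
For readers following the thread, the distinction can be illustrated with a minimal tabular sketch in Python. This is not the repository's code: the function names (`soft_value`, `policy_probs`, `td_update`, `sync_target`), the discrete action space, and the hyperparameters are all assumptions made for illustration. The point it shows is only that the bootstrapped TD target reads a lagged copy of the Q-values, while the policy reads the current ones.

```python
import numpy as np

def soft_value(q_row, alpha=1.0):
    # Soft value V(s) = alpha * log sum_a exp(Q(s, a) / alpha),
    # shifted by the max for numerical stability.
    z = q_row / alpha
    m = z.max()
    return alpha * (m + np.log(np.exp(z - m).sum()))

def policy_probs(q, state, alpha=1.0):
    # The policy is derived from the CURRENT Q-table; no target copy is used.
    z = q[state] / alpha
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def td_update(q, q_target, s, a, r, s_next, lr=0.1, gamma=0.99, alpha=1.0):
    # The bootstrapped target reads the lagged TARGET table, which keeps
    # the regression target from moving with every update to q.
    y = r + gamma * soft_value(q_target[s_next], alpha)
    q = q.copy()
    q[s, a] += lr * (y - q[s, a])
    return q

def sync_target(q_target, q, tau=0.01):
    # Slowly track the current table (an assumed Polyak-style update).
    return (1.0 - tau) * q_target + tau * q
```

A typical loop under these assumptions would call `td_update` on each transition, call `sync_target` afterwards, and sample actions from `policy_probs(q, s)` using the current table throughout.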