rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

Why are two Q-functions used in SAC? #87

Closed KK666-AI closed 4 years ago

KK666-AI commented 4 years ago

Dear author,

I noticed that in your implementation of SAC, the target value is estimated using two Q-functions. Why is it done this way?

    target_q_values = torch.min(
        self.target_qf1(next_obs, new_next_actions),
        self.target_qf2(next_obs, new_next_actions),
    ) - alpha * new_log_pi

Do you want to avoid overestimating the Q-function, as in double Q-learning?

vitchyr commented 4 years ago

Yes, exactly! See README for more details.
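
For anyone landing on this thread, here is a rough, framework-free sketch of the idea (not rlkit code). The function names `soft_target_value` and `mean_max_estimate` are hypothetical and exist only for illustration. The first mirrors the expression in the question using plain Python lists. The second is a toy experiment, under the assumption that every action's true Q-value is 0 and each estimate is corrupted by independent Gaussian noise, showing that selecting the best-looking action from a single noisy estimate is biased upward and that taking the elementwise minimum of two estimates lowers that value.

```python
import random


def soft_target_value(q1, q2, log_pi, alpha):
    """Elementwise min(q1, q2) - alpha * log_pi, as in the snippet above."""
    return [min(a, b) - alpha * lp for a, b, lp in zip(q1, q2, log_pi)]


def mean_max_estimate(n_actions, noise_std, use_min_of_two, trials, rng):
    """Average of max-over-actions of a noisy Q estimate.

    Assumption for this toy setup: the true Q-value of every action is 0,
    so any positive average is pure overestimation caused by noise.
    """
    total = 0.0
    for _ in range(trials):
        q = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        if use_min_of_two:
            # Second, independently noisy estimate of the same values.
            q_other = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
            q = [min(a, b) for a, b in zip(q, q_other)]
        total += max(q)
    return total / trials


rng = random.Random(0)
single = mean_max_estimate(10, 1.0, False, 2000, rng)
clipped = mean_max_estimate(10, 1.0, True, 2000, rng)
print("single estimate:", single)
print("min of two estimates:", clipped)

# Small worked example of the target expression itself.
example = soft_target_value([1.0, 2.0], [0.5, 3.0], [-1.0, -2.0], alpha=0.5)
print("soft target values:", example)
```

In this toy setup the single-estimate average comes out clearly above the true value of 0, and the min-of-two average is lower. Note that the minimum is itself a pessimistic estimate rather than an unbiased one; the trade-off is generally considered worthwhile because overestimates tend to be amplified by bootstrapping.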