rail-berkeley / softlearning

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
https://sites.google.com/view/sac-and-applications

Question on the soft q learning implementation #143

Open YuxuanSong opened 4 years ago

YuxuanSong commented 4 years ago

Hi Haarnoja,

Thanks a lot for maintaining this amazing repo! I'm a little confused about the SVGD implementation in soft Q-learning. At https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L281, the log probs are computed as log_probs = svgd_target_values + squash_correction, which is the log density over the $u$ (raw action) space, where $a = \tanh(u)$. However, the subsequent SVGD update uses these $u$-space log probs to compute update directions for $a$ (the squashed actions), which does not seem aligned.
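For concreteness, here is a rough sketch of the change of variables I have in mind (the function name and NumPy formulation are just illustrative, not the repo's API):

```python
import numpy as np

def u_space_log_density(q_values, u, eps=1e-6):
    """Convert an (unnormalized) log-density over a = tanh(u) into one over u.

    If log p_a(a) = Q(s, a) + const, then by change of variables
        log p_u(u) = Q(s, tanh(u)) + sum_i log(1 - tanh(u_i)^2) + const,
    which is what `svgd_target_values + squash_correction` appears to compute.
    """
    squash_correction = np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)
    return q_values + squash_correction
```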

I think it should be actions = self._policy.raw_actions(expanded_observations) at https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L235 (the policy class could expose this property), as sketched below.
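To illustrate what I mean by "aligned": in a plain SVGD step, the particles and the gradient of the target log density have to be parameterized in the same space. A rough NumPy sketch (for illustration only, not how the repo structures it):

```python
import numpy as np

def svgd_step(x, grad_log_p, h=1.0, step_size=1e-2):
    """One SVGD update on particles x of shape (n, d).

    grad_log_p: (n, d), gradient of the target log-density evaluated at x.
    Both arguments must live in the same space; mixing u-space log probs
    with a-space particles (a = tanh(u)) gives inconsistent directions.
    """
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]            # x_i - x_j, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)           # (n, n)
    k = np.exp(-sq_dists / (2.0 * h ** 2))           # RBF kernel matrix
    driving = k @ grad_log_p                         # attraction toward high density
    repulsion = np.sum(diffs * k[..., None], axis=1) / h ** 2  # kernel-gradient term
    return x + step_size * (driving + repulsion) / n
```

With raw_actions, both the log probs and the particles would be in $u$-space, so the update directions would be consistent.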

Best, Yuxuan

hartikainen commented 4 years ago

Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I have actually not tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.