Hi, sorry for such a late reply - this issue slipped by me when it first came up!
Questions 1, 2, and 4 relate to automatic entropy tuning. I tried auto-entropy on the benchmark continuous control tasks with PEARL and did not observe an improvement, so I did not merge it to master. However, it might help in other tasks and would better align PEARL with the latest SAC, so I plan to clean this up and merge it soon.
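For reference, here is a minimal sketch of that automatic temperature adjustment (the alpha update from arXiv:1812.05905) in PyTorch. The names `log_alpha`, `target_entropy`, and `update_alpha` are illustrative only, and none of this is on PEARL's master branch:

```python
import torch
import torch.optim as optim

# Hypothetical setup: action_dim would come from the environment.
action_dim = 6
target_entropy = -float(action_dim)  # common heuristic: -|A|

# Learn log(alpha) so that alpha = exp(log_alpha) stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_pi):
    """One gradient step on the temperature.

    log_pi: log-probabilities of actions sampled from the current policy
            during the actor update, shape (batch_size, 1).
    """
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    # Use the updated alpha in the actor and critic losses.
    return log_alpha.exp().detach()
```

The returned alpha would then replace the fixed temperature wherever the entropy term appears, e.g. `policy_loss = (alpha * log_pi - q_new_actions).mean()`.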
Question 3 - that method is defined in sac.py. This is an incorrect use of abstraction, but at this point I think it's just going to stay that way.
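Roughly, `sample_context` draws a batch of transitions collected for the given task and stacks them into the context tensor the encoder conditions on. A simplified stand-in (the buffer API and field names here are assumptions for illustration, not the exact PEARL code):

```python
import numpy as np
import torch

def sample_context(encoder_buffer, task_idx, batch_size, use_next_obs=False):
    """Simplified stand-in for a sample_context-style helper.

    Samples `batch_size` transitions for one task and concatenates the
    fields the context encoder conditions on: (s, a, r) or (s, a, r, s').
    """
    batch = encoder_buffer.random_batch(task_idx, batch_size)  # assumed buffer API
    pieces = [batch['observations'], batch['actions'], batch['rewards']]
    if use_next_obs:
        pieces.append(batch['next_observations'])
    context = np.concatenate(pieces, axis=-1)              # (batch_size, context_dim)
    return torch.from_numpy(context).float().unsqueeze(0)  # (1, batch_size, context_dim)
```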
Thanks for your great work and code! I have a few questions.
First, I notice that the temperature coefficient α is not used during SAC training, so this is not identical to the SAC algorithm. Why is that?
Second, why is `policy_loss = policy_loss + policy_reg_loss`? What do these terms mean?
```python
mean_reg_loss = self.policy_mean_reg_weight * (policy_mean**2).mean()
```
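To make the question concrete, here is a self-contained sketch of the pattern I think the surrounding code follows (the std/pre-activation terms and the default weights are my reconstruction from rlkit-style SAC, not a quote of sac.py):

```python
import torch

def add_policy_regularization(policy_loss, policy_mean, policy_log_std, pre_tanh_value,
                              mean_weight=1e-3, std_weight=1e-3, pre_activation_weight=0.0):
    """Add small L2 penalties on the pre-tanh policy outputs (rlkit-style sketch)."""
    # Keep the Gaussian mean near zero so the tanh squashing stays well-behaved.
    mean_reg_loss = mean_weight * (policy_mean ** 2).mean()
    # Keep the log-std from drifting to extreme values.
    std_reg_loss = std_weight * (policy_log_std ** 2).mean()
    # Penalize large pre-tanh activations of the sampled actions.
    pre_activation_reg_loss = pre_activation_weight * (pre_tanh_value ** 2).sum(dim=1).mean()
    policy_reg_loss = mean_reg_loss + std_reg_loss + pre_activation_reg_loss
    # The regularizers are simply added on top of the usual SAC actor loss.
    return policy_loss + policy_reg_loss
```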
Third, in `rlkit.core.rl_algorithm` (lines 228 and 422) there is the call `context = self.sample_context(self.task_idx)`. Where is the function `sample_context` defined?
Finally, if we apply the temperature auto-adjustment trick of SAC (arXiv:1812.05905) to PEARL, would PEARL perform better?