katerakelly / oyster

Implementation of Efficient Off-policy Meta-learning via Probabilistic Context Variables (PEARL)

Temperature coefficient not found in SAC #9

Closed lujiayou123 closed 4 years ago

lujiayou123 commented 5 years ago

Thanks for your great work and code! I have a few questions.

First, I notice that the temperature coefficient α is not used during SAC training, which differs from the original SAC algorithm. Why is that?

Second, why is `policy_loss = policy_loss + policy_reg_loss`? What do these regularization terms mean?
```python
# L2 penalties on the policy's mean and log-std outputs, plus a penalty on the
# pre-tanh action values, to keep the tanh-Gaussian policy from saturating.
mean_reg_loss = self.policy_mean_reg_weight * (policy_mean**2).mean()
std_reg_loss = self.policy_std_reg_weight * (policy_log_std**2).mean()
pre_tanh_value = policy_outputs[-1]
pre_activation_reg_loss = self.policy_pre_activation_weight * (
    (pre_tanh_value**2).sum(dim=1).mean()
)
policy_reg_loss = mean_reg_loss + std_reg_loss + pre_activation_reg_loss
policy_loss = policy_loss + policy_reg_loss
```

Third, in `rlkit/core/rl_algorithm.py` (lines 228 and 422), `context = self.sample_context(self.task_idx)` is called. Where is the function `sample_context` defined?

Finally, if we applied the automatic temperature-adjustment trick from SAC (arXiv:1812.05905) to PEARL, would PEARL perform better?

katerakelly commented 4 years ago

Hi, sorry for such a late reply - this issue slipped by me when it first came up!

Questions 1, 2, and 4 relate to automatic entropy tuning. I tried auto-entropy on the benchmark continuous control tasks with PEARL and did not observe an improvement, so I did not merge it into master. However, it might help in other tasks and would better align PEARL with the latest SAC, so I plan to clean this up and merge it soon.
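
For reference, a minimal sketch of the automatic temperature adjustment described in arXiv:1812.05905, written in PyTorch with assumed names (`log_alpha`, `target_entropy`, `log_pi`); this is not the code in this repository:

```python
import torch
import torch.optim as optim

# Learnable log-temperature; optimizing in log-space keeps alpha positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = optim.Adam([log_alpha], lr=3e-4)

# Heuristic target entropy from the SAC paper: -|A| (negative action dimension).
action_dim = 6  # assumed for illustration
target_entropy = -float(action_dim)

def update_alpha(log_pi):
    """One temperature update, given log-probs of freshly sampled actions."""
    # The gradient pushes alpha up when the policy's entropy is below the
    # target, and down when it is above it.
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().detach()
```

The resulting `alpha` would then weight the entropy term in the actor and critic losses in place of a fixed coefficient.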

Question 3 - that method is defined in sac.py. This is an incorrect use of abstraction, but at this point I think it's just going to stay that way.
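
Roughly speaking, its job is to sample a batch of transitions collected for the given task and stack them into the context tensor that feeds the inference network q(z|c). A rough sketch of that idea, using a hypothetical `enc_replay_buffer.random_batch` interface rather than the actual implementation in sac.py:

```python
import torch

def sample_context(enc_replay_buffer, task_idx, batch_size=100):
    """Sketch: sample transitions for one task and pack them as context.

    `enc_replay_buffer.random_batch` is a hypothetical interface used for
    illustration; the real method lives in sac.py and differs in detail.
    """
    batch = enc_replay_buffer.random_batch(task_idx, batch_size)
    obs = torch.as_tensor(batch['observations'], dtype=torch.float32)
    act = torch.as_tensor(batch['actions'], dtype=torch.float32)
    rew = torch.as_tensor(batch['rewards'], dtype=torch.float32)
    # Context fed to the inference network: one row per transition.
    return torch.cat([obs, act, rew], dim=1).unsqueeze(0)  # (1 task, batch, dim)
```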