haarnoja / sac

Soft Actor-Critic

Understanding the policy limitations in SAC #31

Closed OrionZou closed 3 years ago

OrionZou commented 3 years ago

I greatly admire the SAC algorithm you created. I have a guess about SAC's policy, and I would like you to confirm it:

Is my understanding correct that it is difficult to implement a policy that follows a Boltzmann distribution? Which distributions have performed better than the Gaussian distribution? I would like to hear your opinion.
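To make the question concrete, here is a minimal sketch of a Boltzmann policy, assuming a finite action set where the normalizer is just a sum. The function name `boltzmann_policy`, the temperature `alpha`, and the Q-values are hypothetical and are not taken from this repository. With continuous actions the sum becomes an integral over the action space, which generally has no closed form, and that is the difficulty I have in mind.

```python
import math

def boltzmann_policy(q_values, alpha=1.0):
    """Return pi(a|s) proportional to exp(Q(s, a) / alpha) for a finite action set.

    q_values: list of Q(s, a) for each action (hypothetical inputs).
    alpha: temperature; larger values give a more uniform policy.
    """
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)  # the normalizer; an intractable integral in the continuous case
    return [w / total for w in weights]

# Hypothetical Q-values for three discrete actions.
q = [1.0, 2.0, 3.0]
probs = boltzmann_policy(q, alpha=1.0)
```

A low `alpha` concentrates probability on the highest-valued action, while a high `alpha` approaches a uniform distribution.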

Looking forward to your reply!

OrionZou commented 3 years ago

Sorry, I see that you have already written GMM, hierarchical policy, and latent_space_policy. I think I still need to spend some time mastering them. They may answer my question.