Closed zhan0903 closed 4 years ago
Hi, why SAC use pi = policy.rsample() while VPG use pi = policy.sample()? Thanks.
pi = policy.rsample()
pi = policy.sample()
@zhan0903 SAC uses the reprametrization trick to optimize the policy therefore we need to use rsample to be able to back-prop through it.
rsample
DDPG is also doing this, albeit explicitly. Hope that helps!
Hi, why SAC use
pi = policy.rsample()
while VPG usepi = policy.sample()
? Thanks.