openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

Broken gradient propagation when sampling #171

Closed: huiwenzhang closed this issue 5 years ago

huiwenzhang commented 5 years ago

Hi,

I have a small point of confusion about the code here: https://github.com/openai/spinningup/blob/2e0eff9bd019c317af908b72c056a33f14626602/spinup/algos/vpg/core.py#L71-L73. Will the multinomial sampling operation break gradient backpropagation? For continuous distributions we have the reparameterization trick, but nothing comparable is done here. Also, why don't we get logp and logp_pi by indexing into logp_all with a and pi? Is it because an indexing operation would also break gradient propagation?

Thanks

jachiam commented 5 years ago

We don't need to backpropagate through the action-sampling procedure for VPG, so no backprop is broken by this. We just need to know the log probability of the action that got sampled, which this provides. :)
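To make the point above concrete, here is a minimal sketch in plain Python. It is not the Spinning Up code (which uses TensorFlow), and every name in it (`log_softmax`, `sample_action`, `logp_one_hot`, `score_function_gradient`, `exact_gradient`) is hypothetical, as is the toy one-step setup with a fixed reward per action. It treats the sampled action as a constant, differentiates only the log probability of that action with respect to the logits, and compares the resulting estimate with the exact gradient of the expected reward. It also shows that selecting a log probability with a one-hot mask gives the same value as indexing.

```python
import math
import random


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sample_action(logp_all, rng):
    # Sampling is a plain random draw; no gradient flows through it.
    probs = [math.exp(lp) for lp in logp_all]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def logp_one_hot(logp_all, a):
    # Mask-and-sum selection: same value as logp_all[a].
    return sum((1.0 if i == a else 0.0) * lp for i, lp in enumerate(logp_all))


def grad_logp(logp_all, a):
    # Analytic gradient of log pi(a) w.r.t. the logits:
    # one_hot(a) - softmax(logits).
    return [(1.0 if i == a else 0.0) - math.exp(lp)
            for i, lp in enumerate(logp_all)]


def score_function_gradient(logits, rewards, n_samples, rng):
    # Monte Carlo estimate of d E[r(a)] / d logits using
    # E[ r(a) * grad log pi(a) ], with each sampled a held constant.
    logp_all = log_softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        a = sample_action(logp_all, rng)
        g = grad_logp(logp_all, a)
        for i in range(len(grad)):
            grad[i] += rewards[a] * g[i] / n_samples
    return grad


def exact_gradient(logits, rewards):
    # Closed form for comparison: p_i * (r_i - E[r]).
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.2, -0.5, 1.0]   # hypothetical policy logits
    rewards = [1.0, 0.0, 2.0]   # hypothetical reward for each action
    print("estimate:", score_function_gradient(logits, rewards, 100000, rng))
    print("exact:   ", exact_gradient(logits, rewards))
```

With enough samples the two printed vectors should agree closely, even though nothing was differentiated through the sampling step. Under these assumptions, that is why a reparameterization trick is not required for this kind of policy gradient.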