Just a little confusion about the code below:
https://github.com/openai/spinningup/blob/2e0eff9bd019c317af908b72c056a33f14626602/spinup/algos/vpg/core.py#L71-L73
Will the multinomial sampling operation break gradient backpropagation? There is a reparametrization trick for continuous distributions, but nothing like that is done here.
Also, why don't we get logp and logp_pi by indexing into logp_all with a and pi? Is it because the indexing operation would also break gradient propagation?
We don't need to backpropagate through the action-sampling procedure for VPG, so nothing is broken here. We only need the log probability of the action that was sampled, which this code provides. :)
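To see why sampling need not be differentiable, here is a small NumPy sketch of the idea (the variable names are illustrative, not from the repo). The policy gradient only requires the gradient of log π(a|s) with respect to the logits *after* the action a is fixed; for a softmax policy that gradient has the closed form one_hot(a) − probs, which is well-defined no matter how a was drawn:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

rng = np.random.default_rng(0)
logits = np.array([1.0, 0.5, -0.2])
logp_all = log_softmax(logits)          # log probs of every action
probs = np.exp(logp_all)

# The sampling step: non-differentiable, but gradients never flow through it.
a = rng.choice(len(probs), p=probs)

# Log-prob of the sampled action -- gathering from logp_all, which is
# equivalent to the one_hot * logp_all reduction used in core.py.
logp = logp_all[a]

# Gradient of logp w.r.t. the logits, in closed form: one_hot(a) - probs.
# This is all the policy-gradient update needs; `a` is just a fixed index here.
one_hot = np.eye(len(probs))[a]
grad_logp = one_hot - probs
```

On the second question: indexing and the one-hot reduction compute the same quantity, and neither blocks gradients to the logits (the gradient flows through `logp_all`, not through the integer index). The one-hot form in the TF1-era code is plausibly just a convenient way to express the gather as differentiable ops, not a correctness requirement.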