Hello,
I am looking at this variable in npo.py:

```python
old_dist_info_vars = dict(
    (k, ext.new_tensor(u'old_%s' % k, ndim=2 + is_recurrent, dtype=theano.config.floatX))
    for k in dist.dist_info_keys
)
```
Which is later used to calculate the KL divergence when doing an update:

```python
dist_info_vars = self.policy.dist_info_sym(obs_var, state_info_vars)
kl = dist.kl_sym(old_dist_info_vars, dist_info_vars)
```
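(For context: with a categorical policy, both dicts carry a `'prob'` entry, and `kl_sym` builds the standard discrete KL divergence from them symbolically. A minimal sketch of that computation follows; the function name and `TINY` constant here are illustrative, not the exact rllab source.)

```python
import theano.tensor as TT

TINY = 1e-8  # small constant to keep the logs finite

def categorical_kl_sym(old_prob_var, new_prob_var):
    # KL(old || new) = sum_a old_prob(a) * (log old_prob(a) - log new_prob(a)),
    # built as a symbolic expression; the last axis ranges over discrete actions.
    return TT.sum(
        old_prob_var * (TT.log(old_prob_var + TINY) - TT.log(new_prob_var + TINY)),
        axis=-1,
    )
```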
However, I have looked all over and can't seem to find where that former tensor (e.g. 'old_prob' in the case of a categorical distribution) gets its value set.
If anyone more familiar with the codebase could point me to it, that would be greatly appreciated.
Never mind, I found it: the tensor placeholders get passed via the `inputs` argument when the optimizer is set up (`update_opt`), and the concrete values are then supplied from `optimize_policy` in npo.py as part of the `all_input_values` argument.
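In other words, the `old_*` tensors are ordinary Theano symbolic variables: they are compiled into the update function through `inputs`, and only receive concrete arrays (the distribution info saved at sampling time) when the optimizer is actually called. A minimal sketch of that two-stage mechanism, with all names here illustrative stand-ins rather than rllab's actual variables:

```python
import numpy as np
import theano
import theano.tensor as TT

# Stage 1 (graph construction, as in init_opt / update_opt): declare the
# placeholders and compile them into a function via `inputs` -- no value
# is attached to them at this point.
old_prob = TT.matrix('old_prob', dtype=theano.config.floatX)
prob = TT.matrix('prob', dtype=theano.config.floatX)
mean_kl = TT.sum(
    old_prob * (TT.log(old_prob + 1e-8) - TT.log(prob + 1e-8)), axis=-1
).mean()
f_kl = theano.function(inputs=[old_prob, prob], outputs=mean_kl)

# Stage 2 (update time, as in optimize_policy): concrete arrays are bound
# positionally at call time, which is what passing `all_input_values` to
# the optimizer does.
old_p = np.array([[0.5, 0.5]], dtype=theano.config.floatX)
new_p = np.array([[0.9, 0.1]], dtype=theano.config.floatX)
print(f_kl(old_p, new_p))
```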