Hi Simon,

I am looking at your implementation of the PPO model.
After going through the code a couple of times, I think that although you created two policy instances, the `reuse` parameter passed to the second instance means the two policies share the same variables, so you effectively have two identical policies in your model.
Furthermore, I have not seen any code that transfers the weights from the new policy to the old one, unlike OpenAI's implementation, which does this:
```python
assign_old_eq_new = U.function([], [],
    updates=[tf.assign(oldv, newv)
             for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())])
```
Could you please confirm whether this is indeed the case? Thanks!
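For reference, here is a minimal, framework-free sketch of the weight transfer I mean. The dicts and names are purely illustrative (they are not from your repo): before each optimization round, PPO copies the current policy's variables into the frozen old policy, so the probability ratio is computed against a fixed snapshot.

```python
# Hypothetical stand-ins for the two policies' variables (names illustrative).
pi_vars = {"w": [0.5, -0.2], "b": [0.1]}      # current (trained) policy
oldpi_vars = {"w": [0.0, 0.0], "b": [0.0]}    # old (snapshot) policy

def assign_old_eq_new(old, new):
    """Copy every variable from the new policy into the old policy,
    mirroring the tf.assign loop in OpenAI's baselines snippet above."""
    for name, value in new.items():
        old[name] = list(value)  # copy, so further training of `new` won't alias

assign_old_eq_new(oldpi_vars, pi_vars)
assert oldpi_vars == pi_vars  # the old policy now matches the current one
```

Without a step like this, the "old" policy never receives the updated weights, and the clipped ratio in the PPO objective loses its meaning.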