Hi Simon,

I am looking at your implementation of the PPO model.
After going through the code a couple of times, I think that although you created two policy instances, the `reuse` parameter passed to the second instance means the two policies share the same variables, so you effectively have two identical policies in your model.
Furthermore, I have not seen any code that transfers the weights from the new policy to the old one, unlike OpenAI's implementation, which does this:
```python
assign_old_eq_new = U.function([], [],
    updates=[tf.assign(oldv, newv)
             for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())])
```
Could you please confirm whether this is indeed the case? Thanks!
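For reference, here is a minimal, framework-free sketch of the weight transfer I mean. The dicts and names are purely illustrative (they are not from your repo): before each optimization round, PPO copies the current policy's variables into the frozen old policy, so the probability ratio is computed against a fixed snapshot.

```python
# Hypothetical stand-ins for the two policies' variables (names illustrative).
pi_vars = {"w": [0.5, -0.2], "b": [0.1]}      # current (trained) policy
oldpi_vars = {"w": [0.0, 0.0], "b": [0.0]}    # old (snapshot) policy

def assign_old_eq_new(old, new):
    """Copy every variable from the new policy into the old policy,
    mirroring the tf.assign loop in OpenAI's baselines snippet above."""
    for name, value in new.items():
        old[name] = list(value)  # copy, so further training of `new` won't alias

assign_old_eq_new(oldpi_vars, pi_vars)
assert oldpi_vars == pi_vars  # the old policy now matches the current one
```

Without a step like this, the "old" policy never receives the updated weights, and the clipped ratio in the PPO objective loses its meaning.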