Closed zhuyifengzju closed 6 years ago
Hi @zhuyifengzju,
I am facing the same issue. As far as a I know, simple examples of the Open AI should run in real time with no problem. However, they run slow and it becomes slower with time. Have you figured out any solution?
@alejodosr Yes, I found out the cause. In the update part, the sess.assign is not pre-defined. The consequence is that as the program runs, it will add more and more new operators. The solution is to define the operators of assignment in initialization part, and use sess.run() in the update part, then the problem of slowing down can be solved.
Thank you @zhuyifengzju. Is that related to actor, critic and target networks? I don't know if I correctly understood the changes you propose.. Could you post some code?
@alejodosr For example,
This is the part that makes the program slower.
def update_target_actor(self):
self.sess.run([
self.t_W1_a.assign(TAUself.W1_a+(1-TAU)self.t_W1_a),
self.t_B1_a.assign(TAUself.B1_a+(1-TAU)self.t_B1_a),
self.t_W2_a.assign(TAUself.W2_a+(1-TAU)self.t_W2_a),
self.t_B2_a.assign(TAUself.B2_a+(1-TAU)self.t_B2_a),
self.t_W3_a.assign(TAUself.W3_a+(1-TAU)self.t_W3_a),
self.t_B3_a.assign(TAUself.B3_a+(1-TAU)self.t_B3_a)])
What I have done is to define a operator in init:
self.update_op = [
self.t_W1_a.assign(tauself.W1_a + (1-tau)self.t_W1_a),
self.t_B1_a.assign(tauself.B1_a + (1-tau)self.t_B1_a),
self.t_W2_a.assign(tauself.W2_a + (1-tau)self.t_W2_a),
self.t_B2_a.assign(tauself.B2_a + (1-tau)self.t_B2_a),
self.t_W3_a.assign(tauself.W3_a + (1-tau)self.t_W3_a),
self.t_B3_a.assign(tauself.B3_a + (1-tau)self.t_B3_a)]
and then in the function update_target_actor, you can simply call: self.sess.run(self.update_op)
Hope it helps.
Thanks! @zhuyifengzju It worked like a charm, you've saved me a lot of time :)
By the way @zhuyifengzju , have you notice that despite of having action bounds for each environment, the action aren't bounded in the end?
action max: [1.0, 1.0] action min: [-1.0, -1.0] Number of States: 11 Number of Actions: 2 Number of Steps per episode: 50
And the actions of the agent:
Action at step 15 : [-0.02675752 0.72265756] Action at step 16 : [-0.68277169 0.72300326] Action at step 17 : [-0.35434261 1.10894008] Action at step 18 : [-0.35992786 1.4588139 ] Action at step 19 : [-0.06045577 1.63727846] Action at step 20 : [-0.50214302 1.42746616]
Any suggestion?
@alejodosr Sorry for my late reply. I think the reason is that when the code deal with boundary, it does not work like a strict constraint as we understand in mathematics. This is just my guess. If I am wrong about this, please point it out. Thank you!
Hello! I have run your code and there is a problem about it. It seems that the update part where tf.assign is used becomes slower as the code keeps running, and it becomes the bottleneck of running speed. I am wondering if you have come across with the same problem? If so, I am looking forward to the solution. Thanks a lot!