stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

A question about the running speed of your code #13

Closed zhuyifengzju closed 6 years ago

zhuyifengzju commented 7 years ago

Hello! I have run your code and found a problem. The update part that uses tf.assign seems to get slower the longer the code runs, and it becomes the bottleneck of the running speed. I am wondering if you have come across the same problem? If so, I am looking forward to the solution. Thanks a lot!
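
To make the symptom concrete, here is a minimal sketch (assuming TensorFlow 1.x, which this repository targets; it is not code from the repo itself): calling `.assign()` inside the training loop creates a new graph operation on every call, so the graph keeps growing and each `sess.run` gets slower.

```python
import tensorflow as tf

v = tf.Variable(0.0)
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(3):
    # Each iteration builds a brand-new assign op in the default graph.
    sess.run(v.assign(v + 1.0))
    # The op count keeps increasing, which is why training slows down over time.
    print(len(tf.get_default_graph().get_operations()))
```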

alejodosr commented 7 years ago

Hi @zhuyifengzju,

I am facing the same issue. As far as I know, the simple OpenAI Gym examples should run in real time with no problem. However, they run slowly and become slower over time. Have you figured out any solution?

zhuyifengzju commented 7 years ago

@alejodosr Yes, I found the cause. In the update part, the assign operations are not pre-defined, so as the program runs it keeps adding new operators to the graph. The solution is to define the assignment operators in the initialization part and only call sess.run() on them in the update part; then the slowdown is gone.

alejodosr commented 7 years ago

Thank you @zhuyifengzju. Is that related to the actor, critic and target networks? I am not sure I correctly understood the changes you propose. Could you post some code?

zhuyifengzju commented 7 years ago

@alejodosr For example, this is the part that makes the program slower:

```python
def update_target_actor(self):
    self.sess.run([
        self.t_W1_a.assign(TAU*self.W1_a + (1-TAU)*self.t_W1_a),
        self.t_B1_a.assign(TAU*self.B1_a + (1-TAU)*self.t_B1_a),
        self.t_W2_a.assign(TAU*self.W2_a + (1-TAU)*self.t_W2_a),
        self.t_B2_a.assign(TAU*self.B2_a + (1-TAU)*self.t_B2_a),
        self.t_W3_a.assign(TAU*self.W3_a + (1-TAU)*self.t_W3_a),
        self.t_B3_a.assign(TAU*self.B3_a + (1-TAU)*self.t_B3_a)])
```

What I have done is to define an operator in `__init__`:

```python
self.update_op = [
    self.t_W1_a.assign(TAU*self.W1_a + (1-TAU)*self.t_W1_a),
    self.t_B1_a.assign(TAU*self.B1_a + (1-TAU)*self.t_B1_a),
    self.t_W2_a.assign(TAU*self.W2_a + (1-TAU)*self.t_W2_a),
    self.t_B2_a.assign(TAU*self.B2_a + (1-TAU)*self.t_B2_a),
    self.t_W3_a.assign(TAU*self.W3_a + (1-TAU)*self.t_W3_a),
    self.t_B3_a.assign(TAU*self.B3_a + (1-TAU)*self.t_B3_a)]
```

and then in `update_target_actor` you can simply call:

```python
self.sess.run(self.update_op)
```

Hope it helps.
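
For reference, a self-contained sketch of the same pattern (assuming TensorFlow 1.x; the variable names are illustrative stand-ins for the actor weights, not the exact ones from this repository). The soft-update op is built once, so the graph stays a fixed size no matter how many times it is run.

```python
import tensorflow as tf

TAU = 0.001

# Stand-ins for a network weight and its target-network counterpart.
w = tf.Variable(tf.random_normal([4, 4]))
t_w = tf.Variable(tf.random_normal([4, 4]))

# Build the soft-update op ONCE, outside the training loop.
update_op = t_w.assign(TAU * w + (1 - TAU) * t_w)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

n_ops_before = len(tf.get_default_graph().get_operations())
for _ in range(1000):
    sess.run(update_op)  # no new graph ops are created here
assert len(tf.get_default_graph().get_operations()) == n_ops_before
```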

alejodosr commented 7 years ago

Thanks! @zhuyifengzju It worked like a charm, you've saved me a lot of time :)

alejodosr commented 7 years ago

By the way @zhuyifengzju, have you noticed that despite having action bounds for each environment, the actions aren't actually bounded in the end?

```
action max: [1.0, 1.0]
action min: [-1.0, -1.0]
Number of States: 11
Number of Actions: 2
Number of Steps per episode: 50
```

And the actions of the agent:

```
Action at step 15 : [-0.02675752  0.72265756]
Action at step 16 : [-0.68277169  0.72300326]
Action at step 17 : [-0.35434261  1.10894008]
Action at step 18 : [-0.35992786  1.4588139 ]
Action at step 19 : [-0.06045577  1.63727846]
Action at step 20 : [-0.50214302  1.42746616]
```

Any suggestion?

zhuyifengzju commented 7 years ago

@alejodosr Sorry for my late reply. My guess is that the code does not treat the action bounds as a strict constraint in the mathematical sense, so the actor output can fall outside them. If I am wrong about this, please point it out. Thank you!
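
One possible workaround (a hedged suggestion, not code from this repository): clip the actor's raw output to the environment's action space before calling `env.step()`. The method name `evaluate_actor` in the commented usage below is only illustrative.

```python
import numpy as np

def bounded_action(raw_action, action_low, action_high):
    """Clip the raw actor output so the environment always receives a valid action."""
    return np.clip(raw_action, action_low, action_high)

# Illustrative usage inside the training loop (names are hypothetical):
# raw = agent.evaluate_actor(observation)
# observation, reward, done, info = env.step(
#     bounded_action(raw, env.action_space.low, env.action_space.high))
```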