steveKapturowski / tensorflow-rl

Implementations of deep RL papers and random experimentation

Training slowing down dramatically #21

Open: ionelhosu opened this issue 6 years ago

ionelhosu commented 6 years ago

Has anyone run into the issue of the training process slowing down? For example, training one DQN-CTS worker on Montezuma's Revenge runs at about 220 iter/sec after 100,000 steps but only 35 iter/sec after 400,000. Any thoughts? Thank you.

steveKapturowski commented 6 years ago

Hi @ionelhosu, I think when it's running at 220 iter/sec the training hasn't actually started yet; it's just filling the replay buffer until it reaches some minimum size. That explains the slowdown, but it's still surprising just how slow the training updates are. I'd like to do some tensorflow profiling to pinpoint the bottleneck here, but if you find anything interesting on your own please let me know.
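
For context, here is a minimal sketch of the warmup pattern described above; the names and the threshold are illustrative, not the repo's actual code. Iterations are cheap while the agent is only filling the replay buffer, and become much more expensive once gradient updates kick in, which is consistent with the drop from ~220 to ~35 iter/sec.

```python
import random
import collections

REPLAY_MIN_SIZE = 10000    # assumed warmup threshold; the real value may differ
REPLAY_MAX_SIZE = 100000   # assumed buffer capacity

replay_buffer = collections.deque(maxlen=REPLAY_MAX_SIZE)

def actor_iteration(env_step, train_update, batch_size=32):
    """One iteration of a hypothetical actor loop.

    Before the buffer reaches REPLAY_MIN_SIZE, an iteration is just an
    environment step plus an append, so iterations are fast. Once the
    threshold is crossed, every iteration also runs a gradient update
    (forward + backward pass), which dominates the per-iteration cost
    and lowers the measured iter/sec.
    """
    transition = env_step()              # (s, a, r, s', done)
    replay_buffer.append(transition)

    if len(replay_buffer) >= REPLAY_MIN_SIZE:
        batch = random.sample(replay_buffer, batch_size)
        train_update(batch)              # the expensive part
```

And a self-contained sketch of the kind of TF1 profiling mentioned above, using RunMetadata and the Chrome trace timeline; the toy graph stands in for the real training op:

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Toy graph standing in for the actual training op (placeholder example).
x = tf.random_normal([256, 256])
y = tf.matmul(x, x)

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    # Pass options/run_metadata on the sess.run call you want to profile.
    sess.run(y, options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace; load timeline.json in chrome://tracing to inspect
# which ops dominate the step time.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())
```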