steveKapturowski / tensorflow-rl

Implementations of deep RL papers and random experimentation
Apache License 2.0

Explanation on DQN needed #14

Closed: ionelhosu closed this issue 7 years ago

ionelhosu commented 7 years ago

Hello! Can you please clarify what you meant in the README by "DQN+CTS after 80M agent steps using 16 actor-learner threads"? DQN isn't a distributed algorithm; it uses a single thread. Did you mean to write A3C instead of DQN? Thank you very much!

steveKapturowski commented 7 years ago

The paper Asynchronous Methods for Deep Reinforcement Learning actually evaluates n-step Q-learning as well as A3C, so asynchronous training doesn't necessarily imply A3C. In the README I referred to the algorithm used for Montezuma's Revenge as "DQN+CTS" because that seemed like the simplest way to reference the Double DQN algorithm applied in Unifying Count-Based Exploration and Intrinsic Motivation. That paper uses a single-threaded implementation, but it still makes sense to apply Q-learning updates from multiple agents acting in parallel, which is what I do here: it trains faster and gave better performance in the experiments I've run.
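
To make the parallel-update idea concrete, here is a minimal sketch of asynchronous Q-learning with a count-based exploration bonus. It is not the repo's actual TensorFlow/Double DQN/CTS code: it uses a tabular toy chain environment, a simple 1/sqrt(count) bonus in place of the CTS pseudo-count model, and every hyperparameter in it is an illustrative assumption.

```python
# Sketch only: several actor-learner threads share one Q-table and apply
# 1-step Q-learning updates asynchronously, with a visit-count bonus standing
# in for the CTS pseudo-count model. All values below are illustrative.
import threading
import numpy as np

N_STATES, N_ACTIONS = 10, 2
GAMMA, ALPHA, EPS, BETA = 0.99, 0.1, 0.1, 0.05
STEPS_PER_THREAD, N_THREADS = 20000, 4

Q = np.zeros((N_STATES, N_ACTIONS))   # shared value estimates
counts = np.zeros(N_STATES)           # shared visit counts for the bonus
lock = threading.Lock()               # guards the shared structures

def step(state, action):
    """Toy chain MDP: action 1 moves right, action 0 resets; reward only at the end."""
    nxt = state + 1 if action == 1 else 0
    if nxt >= N_STATES - 1:
        return 0, 1.0, True           # terminal reward
    return nxt, 0.0, False

def actor_learner(rng):
    state = 0
    for _ in range(STEPS_PER_THREAD):
        # epsilon-greedy action from the shared Q
        if rng.random() < EPS:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        with lock:
            counts[nxt] += 1
            # crude count-based exploration bonus (stand-in for CTS pseudo-counts)
            bonus = BETA / np.sqrt(counts[nxt])
            target = reward + bonus + (0.0 if done else GAMMA * Q[nxt].max())
            Q[state, action] += ALPHA * (target - Q[state, action])
        state = 0 if done else nxt

threads = [threading.Thread(target=actor_learner, args=(np.random.default_rng(i),))
           for i in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.argmax(Q, axis=1))           # learned greedy action per state
```

The structure is the same whether the updates are tabular or gradient steps on a shared deep Q-network: each thread acts in its own copy of the environment and all threads write their updates into shared parameters.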

Sorry for any confusion.

ionelhosu commented 7 years ago

Got it, thanks a lot! Referring to it as DQN was a bit confusing.