Closed: ionelhosu closed this issue 7 years ago
The paper "Asynchronous Methods for Deep Reinforcement Learning" actually evaluates n-step Q-learning as well as A3C, so asynchronous training doesn't necessarily imply A3C. In the README I referred to the algorithm used for Montezuma's Revenge as "DQN+CTS" because that seemed like the simplest way to reference the Double DQN algorithm applied in "Unifying Count-Based Exploration and Intrinsic Motivation". That paper uses a single-threaded implementation, but it still makes sense to apply Q-learning updates from multiple agents acting in parallel, which is what I do: in my experiments it trains faster and reaches better performance.
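For anyone skimming this thread, here is a minimal sketch of the idea being named "DQN+CTS": a Q-learning update where a count-based exploration bonus is added to the extrinsic reward, in the spirit of "Unifying Count-Based Exploration and Intrinsic Motivation". This is not the repo's actual code; the real approach derives pseudo-counts from a CTS density model, while this toy example uses raw tabular visit counts, and all hyperparameters (`ALPHA`, `GAMMA`, `BETA`) are made-up illustrative values.

```python
import math
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed, illustrative)
GAMMA = 0.99   # discount factor
BETA = 0.05    # exploration-bonus scale (assumed)

Q = defaultdict(float)           # Q[(state, action)] -> value
visit_counts = defaultdict(int)  # N[state]; stands in for CTS pseudo-counts

def exploration_bonus(state):
    """Bonus ~ BETA / sqrt(N(s) + 1): rarely visited states get larger bonuses."""
    return BETA / math.sqrt(visit_counts[state] + 1)

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step with the intrinsic bonus added to the reward."""
    visit_counts[next_state] += 1
    augmented_reward = reward + exploration_bonus(next_state)
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = augmented_reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

# A single transition with zero extrinsic reward still moves the Q-value,
# because the novelty bonus is nonzero for a newly visited state.
q_update(state=0, action=1, reward=0.0, next_state=2, actions=[0, 1])
print(round(Q[(0, 1)], 4))  # -> 0.0035
```

In the multi-threaded setting described above, several actor threads would each run this kind of update against shared Q-parameters (and a shared density model), which is what makes "actor-learner threads" compatible with a DQN-style algorithm.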
Sorry for any confusion.
Got it, thanks a lot! Referring to it as DQN was a bit confusing.
Hello! Can you please clarify what you meant in the README by "DQN+CTS after 80M agent steps using 16 actor-learner threads"? DQN isn't a distributed algorithm; it uses a single thread. Did you mean A3C instead of DQN? Thank you very much!