higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / Prioritized Replay / Noisy Networks / Distributional Values / Rainbow / Hierarchical RL
2.99k stars 587 forks

Updated double_dqn #5

Closed — aiXander closed this 6 years ago

aiXander commented 6 years ago

Hi, first of all: very clean implementation of these algorithms in PyTorch, much appreciated!!

After reading through the code a bit, I think there might be a small error in the double_dqn code. According to the linked paper (Double DQN, page 4): "We therefore propose to evaluate the greedy policy according to the online network, but using the target network to estimate its value. ... In comparison to Double Q-learning (4), the weights of the second network are replaced with the weights of the target network for the evaluation of the current greedy policy."

So I updated a few lines of code to make sure that the max-Q action index is chosen using the current_model. I ran 50 CartPole-v0 runs for each version, and the updated version seems to converge slightly faster.
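For reference, the corrected target computation can be sketched like this (a minimal sketch with hypothetical tensor names; the notebook's actual variables may differ). The key point from the paper is step 1 vs. step 2: the online network picks the greedy action, the target network evaluates it.

```python
import torch
import torch.nn as nn

def ddqn_target(current_model: nn.Module,
                target_model: nn.Module,
                reward: torch.Tensor,      # shape (batch,)
                next_state: torch.Tensor,  # shape (batch, obs_dim)
                done: torch.Tensor,        # shape (batch,), 1.0 if terminal
                gamma: float = 0.99) -> torch.Tensor:
    """Double DQN TD target: select with the online net, evaluate with the target net."""
    with torch.no_grad():
        # 1) choose the greedy next action using the ONLINE (current) network
        next_actions = current_model(next_state).argmax(dim=1, keepdim=True)
        # 2) ...but estimate its value with the TARGET network
        next_q = target_model(next_state).gather(1, next_actions).squeeze(1)
    # standard TD target; bootstrap term is masked out on terminal transitions
    return reward + gamma * next_q * (1.0 - done)
```

The single-network DQN version would instead take `target_model(next_state).max(1)[0]` directly, which lets the same network both select and overestimate the action's value — the bias Double DQN removes.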

Cheers :)

higgsfield commented 6 years ago

Wow! Thanks! The next notebooks that use DDQN should be changed too.