Hi, first of all: very clean implementation of these algorithms in PyTorch, much appreciated!!
After reading through the code a bit, I think there might be a small error in the code for doubledqn.
According to the paper in the link (Double DQN, page 4):
"We therefore propose to evaluate the greedy policy according to the online network, but using the target network to estimate its value. ... In comparison to Double Q-learning (4), the weights of the second network are replaced with the weights of the target network for the evaluation of the current greedy policy."
So I updated a few lines of code to make sure that the argmax over next-state Q-values is taken with the current_model (the online network), while the target_model still provides the value estimate.
I ran 50 CartPole-v0 runs for each version, and the updated version seems to converge slightly faster.
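For reference, here is a minimal sketch of the corrected target computation I had in mind. The network definitions are placeholders, and I'm assuming the repo's current_model / target_model naming; only the selection/evaluation split matters:

```python
import torch
import torch.nn as nn

# Placeholder networks standing in for the repo's current_model / target_model
# (CartPole-v0: 4-dim observation, 2 actions).
current_model = nn.Linear(4, 2)
target_model = nn.Linear(4, 2)

def double_dqn_target(reward, next_state, done, gamma=0.99):
    """Double DQN target: action *selected* by the online network,
    but *evaluated* by the target network, per the paper's Eq. (4) variant."""
    with torch.no_grad():
        # 1) choose the greedy action with the current (online) network
        next_action = current_model(next_state).argmax(dim=1, keepdim=True)
        # 2) estimate that action's value with the target network
        next_q = target_model(next_state).gather(1, next_action).squeeze(1)
    return reward + gamma * next_q * (1 - done)

# Example call on a dummy batch of 8 transitions
target = double_dqn_target(
    reward=torch.rand(8),
    next_state=torch.randn(8, 4),
    done=torch.zeros(8),
)
```

The vanilla-DQN version would instead take `target_model(next_state).max(1)`, which both selects and evaluates with the same network and causes the overestimation the paper discusses.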
Cheers :)