pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

DQN is broken #263

Closed zspasztori closed 1 year ago

zspasztori commented 6 years ago

Hi,

I am running your implementation of DQN. I have tried several runs, but it does not seem to converge with the given hyperparameters. I also tried a higher number of episodes, on the order of 1000, but there is still no convergence.

Also, the plotting has problems: you need to move the display part before the sleep.

[image: dqn_tutorial]

cc @vmoens @nairbv

johan-gras commented 6 years ago

[image: figure_2]

Same issue here. I ran the exact code from the tutorial for 10,000 episodes and the algorithm doesn't converge. I've been testing lots of different hyperparameters without success.

cebenso2 commented 6 years ago

I am experiencing the same issue.

leonardblier commented 6 years ago

I also have the same issue.

danaugrs commented 5 years ago

Same:

[image: pytorch-dqn-example]

vmikulik commented 5 years ago

+1 to the plotting code not working (on MacOS, using Jupyter). Swapping display and clear_output solved the issue for me.
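For reference, a minimal sketch of that swap, assuming a plot_durations-style function like the tutorial's (the function body here is an illustration, not the tutorial's exact code):

```python
import matplotlib.pyplot as plt
from IPython import display

def plot_durations(episode_durations):
    plt.figure(2)
    plt.clf()
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(episode_durations)
    # Render the current figure FIRST, then schedule the clear.
    # With wait=True, the clear is deferred until the next output arrives,
    # so each frame replaces the previous one instead of leaving a blank cell.
    display.display(plt.gcf())
    display.clear_output(wait=True)
```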

Also not converging for me (apologies for the unreadable scale; I use a night theme in Jupyter, so the plot is in white):

[image]

mansur007 commented 5 years ago

Has anyone solved this problem? I am having it too...

zhihaocheng commented 5 years ago

Same here.

josiahls commented 5 years ago

After making my own DQN, my main concern is the target-network update schedule.

As noted in [1] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).

Key quote: "parameters from the previous iteration θi−1 are held fixed when optimising the loss function". They do not say previous episode; they say previous iteration.

I recommend changing your target updates to happen every few iterations.

Since the target network is initialized randomly, it needs a chance to correct itself and iron out pockets of incorrect Q values. Updating every few episodes seems far too slow; I found it takes hundreds or thousands of target updates to get a decent target network.
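As a rough sketch of step-based updates (the network, interval, and loop here are illustrative stand-ins, not the tutorial's code):

```python
import torch.nn as nn

TARGET_UPDATE_STEPS = 100          # illustrative interval; tune per environment

policy_net = nn.Linear(4, 2)       # stand-in for the tutorial's DQN network
target_net = nn.Linear(4, 2)
target_net.load_state_dict(policy_net.state_dict())
target_net.eval()                  # target net is only read, never trained directly

for step in range(1, 10_000 + 1):  # one iteration = one gradient step
    # ... sample a batch, compute the loss against target_net, backprop ...
    if step % TARGET_UPDATE_STEPS == 0:
        # sync every N gradient steps ("iterations"), not every N episodes
        target_net.load_state_dict(policy_net.state_dict())
```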

I recommend that anyone learning this experiment with the gym_maze envs without exploration, to make sure your DQN is learning properly. CartPole is way too hard an environment for debugging your DQN when starting out.
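For example, a purely greedy rollout (no epsilon branch at all) on a simple environment isolates the learning code from exploration noise. In this sketch, "maze-sample-5x5-v0" assumes the third-party gym_maze package, policy_net is whatever Q-network you trained, and the old 4-tuple gym step API is assumed:

```python
import gym
import torch

# policy_net: your trained Q-network (assumed defined by your training code)
env = gym.make("maze-sample-5x5-v0")   # assumes gym_maze; any small deterministic env works
state = env.reset()
done = False
total_reward = 0.0
while not done:
    with torch.no_grad():
        q_values = policy_net(torch.as_tensor(state, dtype=torch.float32))
    action = q_values.argmax().item()  # always greedy: no random exploration
    state, reward, done, info = env.step(action)
    total_reward += reward
print("greedy return:", total_reward)
```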

yizhixiaotaozi commented 4 years ago

Same problem. Adjusting the parameters is HARD work T-T.

thomashirtz commented 4 years ago

Did someone manage to make it work? Is it possible to share the parameters used? I really cannot find any set of parameters that works.

StoyanVenDimitrov commented 3 years ago

Any improvements here?

mansur007 commented 3 years ago

I remember making things work here: https://github.com/mansur007/gym_cartpole_dqn/, but the code is from ~2 years ago...

AsadJeewa commented 3 years ago

Any updates here?

Erlix322 commented 2 years ago

Sorry to bother, but any updates?

carljparker commented 1 year ago

I believe that this issue is resolved by the work done for #2030.