pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

DQN is broken #263

Closed zspasztori closed 1 year ago

zspasztori commented 6 years ago

Hi,

I am running your implementation of DQN. I have tried several runs, but it does not seem to converge with the given hyperparameters. I also tried a higher number of episodes, on the order of 1000, but there is still no convergence.

Also, the plotting has problems: you need to move the display part before the sleep.

[image: dqn_tutorial]

cc @vmoens @nairbv

johan-gras commented 6 years ago

[image: figure_2]

Same issue here. I ran the exact code from the tutorial for 10,000 episodes and the algorithm doesn't converge. I've been testing lots of different hyperparameters without success.

cebenso2 commented 6 years ago

I am experiencing the same issue.

leonardblier commented 6 years ago

I also have the same issue.

danaugrs commented 5 years ago

Same:

[image: pytorch-dqn-example]

vmikulik commented 5 years ago

+1 to the plotting code not working (on MacOS, using Jupyter). Swapping display and clear_output solved the issue for me.
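For reference, a minimal sketch of that swap, assuming a plot_durations-style function like the tutorial's (the function body here is an illustration, not the tutorial's exact code):

```python
import matplotlib.pyplot as plt
from IPython import display

def plot_durations(episode_durations):
    plt.figure(2)
    plt.clf()
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(episode_durations)
    # Render the current figure FIRST, then schedule the clear.
    # With wait=True, the clear is deferred until the next output arrives,
    # so each frame replaces the previous one instead of leaving a blank cell.
    display.display(plt.gcf())
    display.clear_output(wait=True)
```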

Also not converging for me (apologies for the unreadable scale; I use a night theme in Jupyter, so the plot is in white):

[image]

mansur007 commented 5 years ago

Has anyone solved this problem? I am having it too...

zhihaocheng commented 5 years ago

Same here.

josiahls commented 5 years ago

After making my own DQN, my main concern is the target-network update schedule.

As noted in [1] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).

Key quote: "parameters from the previous iteration θi−1 are held fixed when optimising the loss function". They do not say previous episode; they say previous iteration.

I recommend changing your target updates to happen every few iterations.

Since the target network is initialized randomly, it needs a chance to correct itself and iron out pockets of incorrect Q values. Updating every few episodes seems far too slow; I found it takes hundreds or thousands of target updates to get a decent target network.
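As a rough sketch of step-based updates (the network, interval, and loop here are illustrative stand-ins, not the tutorial's code):

```python
import torch.nn as nn

TARGET_UPDATE_STEPS = 100          # illustrative interval; tune per environment

policy_net = nn.Linear(4, 2)       # stand-in for the tutorial's DQN network
target_net = nn.Linear(4, 2)
target_net.load_state_dict(policy_net.state_dict())
target_net.eval()                  # target net is only read, never trained directly

for step in range(1, 10_000 + 1):  # one iteration = one gradient step
    # ... sample a batch, compute the loss against target_net, backprop ...
    if step % TARGET_UPDATE_STEPS == 0:
        # sync every N gradient steps ("iterations"), not every N episodes
        target_net.load_state_dict(policy_net.state_dict())
```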

I recommend that anyone learning this experiment with the gym_maze envs without exploration, to make sure your DQN is learning properly. CartPole is way too hard an environment for debugging your DQN when starting out.
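For example, a purely greedy rollout (no epsilon branch at all) on a simple environment isolates the learning code from exploration noise. In this sketch, "maze-sample-5x5-v0" assumes the third-party gym_maze package, policy_net is whatever Q-network you trained, and the old 4-tuple gym step API is assumed:

```python
import gym
import torch

# policy_net: your trained Q-network (assumed defined by your training code)
env = gym.make("maze-sample-5x5-v0")   # assumes gym_maze; any small deterministic env works
state = env.reset()
done = False
total_reward = 0.0
while not done:
    with torch.no_grad():
        q_values = policy_net(torch.as_tensor(state, dtype=torch.float32))
    action = q_values.argmax().item()  # always greedy: no random exploration
    state, reward, done, info = env.step(action)
    total_reward += reward
print("greedy return:", total_reward)
```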

yizhixiaotaozi commented 4 years ago

Same problem. Adjusting the parameters is HARD work T-T.

thomashirtz commented 4 years ago

Did someone manage to make it work? Is it possible to share the parameters used? I really cannot find any set of parameters that works.

StoyanVenDimitrov commented 3 years ago

Any improvements here?

mansur007 commented 3 years ago

I remember making things work here: https://github.com/mansur007/gym_cartpole_dqn/, but the code is from ~2 years ago...

AsadJeewa commented 3 years ago

Any updates here?

Erlix322 commented 2 years ago

Sorry to bother, but any updates?

carljparker commented 1 year ago

I believe that this issue is resolved by the work done for #2030.