Same issue here. Running the exact code from the tutorial for 10,000 episodes and the algorithm doesn't converge. I've been testing lots of different hyperparameters without success.
I am experiencing the same issue.
I also have the same issue.
Same:
+1 to the plotting code not working (on MacOS, using Jupyter). Swapping display and clear_output solved the issue for me.
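In case it helps, here is a minimal sketch of the reordering, based on the tutorial's `plot_durations` helper (the `episode_durations` list and the `is_ipython` check are the tutorial's; the exact styling here is just illustrative):

```python
import matplotlib
import matplotlib.pyplot as plt
import torch
from IPython import display

is_ipython = 'inline' in matplotlib.get_backend()
episode_durations = []  # filled in by the training loop, as in the tutorial

def plot_durations():
    plt.figure(1)
    plt.clf()
    durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(durations_t.numpy())
    plt.pause(0.001)  # give the backend a moment to draw
    if is_ipython:
        # Fix: render the current figure *before* clearing the cell output,
        # otherwise nothing ever shows up in Jupyter (at least on macOS).
        display.display(plt.gcf())
        display.clear_output(wait=True)
```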
Also not converging for me (apologies for unreadable scale, I use a night theme in Jupyter so it's in white):
Has anyone solved this problem? I am having it too...
Same here too.
After making my own DQN, my main concern is this:
As noted in [1] Mnih, Volodymyr, et al. "Playing Atari with Deep Reinforcement Learning." arXiv preprint arXiv:1312.5602 (2013).
Key quote: "the parameters from the previous iteration θ_{i−1} are held fixed when optimising the loss function" — they do not say previous episode, they say previous iteration.
I recommend changing your target updates to happen every few iterations.
This is because the target network is initialized randomly, so it needs a chance to correct itself and iron out pockets of incorrect Q values. Updating every few episodes seems way too slow; I found it takes hundreds or thousands of target updates to get a decent target network.
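Roughly what I mean, as a sketch against the tutorial's own training loop (`policy_net`, `target_net`, `select_action`, `optimize_model`, `memory`, and `env` are the tutorial's objects; `TARGET_UPDATE_STEPS` is a name I made up for illustration):

```python
# Sketch only: update the target network by optimisation-step count,
# not by episode count.
TARGET_UPDATE_STEPS = 100   # hypothetical: hard-copy the weights every ~100 steps
step_count = 0

for i_episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        action = select_action(state)                  # epsilon-greedy on policy_net
        next_state, reward, done, _ = env.step(action)
        memory.push(state, action, next_state, reward)
        state = next_state

        optimize_model()                               # one gradient step on policy_net
        step_count += 1

        # Sync the target network every few *iterations*, as in the paper.
        if step_count % TARGET_UPDATE_STEPS == 0:
            target_net.load_state_dict(policy_net.state_dict())
```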
I recommend that those learning this experiment with the gym_maze envs without exploration, to make sure your DQN is learning properly. CartPole is way too hard an environment to debug your DQN in when starting out.
Same problem. Adjusting parameters is HARD work T-T.
Did someone manage to make it work? Could you share the parameters you used? I really cannot find any set of parameters that works.
Any improvements here?
I remember making things work here https://github.com/mansur007/gym_cartpole_dqn/ but the code is from ~2 years ago...
Any updates here?
Sorry to bother, but any updates?
I believe that this issue is resolved by the work done for #2030.
Hi,
I am running your implementation of DQN. I tried several runs, but it does not seem to converge with the given hyperparameters. I also tried a higher number of episodes, on the order of 1000, but there is still no convergence.
The plotting also has problems: you need to move the display part before the sleep.
cc @vmoens @nairbv