asiddharth opened this issue 5 years ago
@asiddharth I'll run Breakout on PDD-DQN again once I find the time / resources
Same thing here; the old version works.
@ZaneH1992 are you able to reproduce the results?
How do your results compare to http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm?
@christopherhesse The agent reached a score of 22.4 after 8.3M frames with PDD-DQN, and a score of 28 with vanilla DQN after 10M frames. These scores are well below the ones listed in the link above.
@asiddharth Hi, have you solved the problem with vanilla DQN? I'm also encountering the same problem here after following all of the steps you mentioned.
Hi @DanielTakeshi, I am facing the same issue: the vanilla DQN and PDD-DQN agents are not learning as expected on BreakoutNoFrameskip-v4.
I copied over the hyperparameters and the exploration schedule mentioned above (in issue #672). I am running the experiments with this baselines commit.
Here is a list of the hyperparameters being used for PDD-DQN (I modified defaults.py; a sketch of the modified file is below): network='conv_only', lr=1e-4, buffer_size=int(1e6), exploration_fraction=0.1, exploration_final_eps=0.01, train_freq=4, learning_starts=80000, target_network_update_freq=40000, gamma=0.99, prioritized_replay=True, prioritized_replay_alpha=0.6, checkpoint_freq=10000, checkpoint_path=None, dueling=True
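For reference, the modified atari() entry in baselines/deepq/defaults.py looks roughly like this (a sketch reconstructed from the list above; the exact formatting of the file may differ):

```python
# baselines/deepq/defaults.py -- modified atari() entry used for the PDD-DQN run
# (sketch reconstructed from the hyperparameter list above)
def atari():
    return dict(
        network='conv_only',
        lr=1e-4,
        buffer_size=int(1e6),
        exploration_fraction=0.1,
        exploration_final_eps=0.01,
        train_freq=4,
        learning_starts=80000,
        target_network_update_freq=40000,
        gamma=0.99,
        prioritized_replay=True,
        prioritized_replay_alpha=0.6,
        checkpoint_freq=10000,
        checkpoint_path=None,
        dueling=True,
    )
```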
I used the same hyperparameters for the vanilla DQN agent but set dueling=False and prioritized_replay=False in defaults.py, and set double_q to False in build_graph.py (sketched below).
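Since double_q is not exposed through defaults.py or the learn() call, that change had to go into the graph-building code. A minimal sketch of the edit, assuming the TF1 build_train() signature (the exact signature at the commit used here may differ slightly):

```python
# baselines/deepq/build_graph.py -- sketch of the edit; only the default value changes
def build_train(make_obs_ph, q_func, num_actions, optimizer,
                grad_norm_clipping=None, gamma=1.0,
                double_q=False,  # was True; False bootstraps from max_a Q_target(s', a)
                scope="deepq", reuse=None,
                param_noise=False, param_noise_filter_func=None):
    # body unchanged: when double_q is False, the target no longer evaluates the
    # online network's argmax action under the target network (i.e. plain DQN targets)
    ...
```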
As mentioned in the README, I also tried to reproduce the results with commit 7bfbcf1, without changing the hyperparameters, but was not able to reproduce them either.
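For completeness, a rough sketch of launching an equivalent run directly from Python with the hyperparameters listed above (a sketch only, assuming the standard make_atari/wrap_deepmind preprocessing; defaults.py is only consulted by the `python -m baselines.run` entry point, so a direct deepq.learn() call has to pass everything explicitly):

```python
# Sketch: direct deepq.learn() call with the hyperparameters from the list above.
# The baselines.run entry point may apply a slightly different wrapper stack.
from baselines.common.atari_wrappers import make_atari, wrap_deepmind
from baselines import deepq

env = make_atari('BreakoutNoFrameskip-v4')
env = wrap_deepmind(env, frame_stack=True, scale=False)  # DeepMind-style preprocessing

act = deepq.learn(
    env,
    network='conv_only',
    total_timesteps=int(1e7),
    lr=1e-4,
    buffer_size=int(1e6),
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    train_freq=4,
    learning_starts=80000,
    target_network_update_freq=40000,
    gamma=0.99,
    prioritized_replay=True,
    prioritized_replay_alpha=0.6,
    checkpoint_freq=10000,
    dueling=True,  # forwarded to the Q-network builder via **network_kwargs
)
env.close()
```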
It would be really helpful if you could let me know whether I am doing anything wrong, and whether a different hyperparameter combination works better.
Thanks!
Some results with the changed hyperparameters and code commit. Results for PDD-DQN:
| Metric | Value |
| --- | --- |
| % time spent exploring | 2 |
| episodes | 5.1e+04 |
| mean 100 episode reward | 22.1 |
| steps | 8.34e+06 |
Saving model due to mean reward increase: 21.5 -> 22.4
Results for vanilla DQN:
| Metric | Value |
| --- | --- |
| % time spent exploring | 1 |
| episodes | 5.17e+04 |
| mean 100 episode reward | 23.2 |
| steps | 1.05e+07 |
(The highest mean 100-episode reward reached by vanilla DQN up to this point is 28.)
Originally posted by @asiddharth in https://github.com/openai/baselines/issues/672#issuecomment-519837654