openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

Cannot reproduce the benchmark results of DQN (vanilla and PDD) on Breakout #983

Open asiddharth opened 5 years ago

asiddharth commented 5 years ago

Hi @DanielTakeshi , I am facing the same issue where the vanilla DQN and the PDD DQN agents are not learning as expected on BreakoutNoFrameskip-v4.

I copied over the hyperparameters and the exploration schedule mentioned above (in issue #672). I am running the experiments with this baselines commit.

Here is the list of hyperparameters being used for PDD-DQN (I modified `defaults.py`): network='conv_only', lr=1e-4, buffer_size=int(1e6), exploration_fraction=0.1, exploration_final_eps=0.01, train_freq=4, learning_starts=80000, target_network_update_freq=40000, gamma=0.99, prioritized_replay=True, prioritized_replay_alpha=0.6, checkpoint_freq=10000, checkpoint_path=None, dueling=True
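For readability, the same settings expressed as the `atari()` entry in `deepq/defaults.py` (a sketch of the edit described above, not a verbatim copy of the file):

```python
# Sketch of the modified atari() entry in baselines/deepq/defaults.py,
# mirroring the hyperparameter list above.
def atari():
    return dict(
        network='conv_only',
        lr=1e-4,
        buffer_size=int(1e6),
        exploration_fraction=0.1,
        exploration_final_eps=0.01,
        train_freq=4,
        learning_starts=80000,
        target_network_update_freq=40000,
        gamma=0.99,
        prioritized_replay=True,
        prioritized_replay_alpha=0.6,
        checkpoint_freq=10000,
        checkpoint_path=None,
        dueling=True,
    )
```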

For the vanilla DQN agent I used the same hyperparameters but set dueling=False, prioritized_replay=False in `defaults.py`, and set `double_q` to False in `build_graph.py`.

As mentioned in the README, I also tried commit 7bfbcf1 without changing the hyperparameters, but was still unable to reproduce the results.

It would be really helpful if you could let me know whether I am doing anything wrong, and whether any other hyperparameter combination works better.

Thanks!

Some results with the hyperparameters and commit above. Results for PDD-DQN:

```
| % time spent exploring  | 2        |
| episodes                | 5.1e+04  |
| mean 100 episode reward | 22.1     |
| steps                   | 8.34e+06 |
```

Saving model due to mean reward increase: 21.5 -> 22.4
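The "% time spent exploring" row is the current epsilon of the linear exploration schedule: with `exploration_fraction=0.1` over 1e7 timesteps, epsilon anneals from 1.0 to 0.01 over the first 1e6 steps and then stays flat. A self-contained sketch mirroring `LinearSchedule` from `baselines/common/schedules.py`:

```python
class LinearSchedule:
    """Linearly interpolate from initial_p to final_p over
    schedule_timesteps steps, then hold final_p."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, t):
        fraction = min(float(t) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)
```

So by 8.34e6 steps the agent has long since settled at the final epsilon of 0.01, and the remaining learning is driven almost entirely by the greedy policy.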

Results for vanilla DQN :

```
| % time spent exploring  | 1        |
| episodes                | 5.17e+04 |
| mean 100 episode reward | 23.2     |
| steps                   | 1.05e+07 |
```

(Highest score for vanilla DQN is 28 at this point)

Originally posted by @asiddharth in https://github.com/openai/baselines/issues/672#issuecomment-519837654

DanielTakeshi commented 5 years ago

@asiddharth I'll run Breakout on PDD-DQN again once I find the time and resources.

ZaneH1992 commented 5 years ago

Same issue here; the old version works.

asiddharth commented 5 years ago

@ZaneH1992 are you able to reproduce the results?

christopherhesse commented 5 years ago

How do your results compare to http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm?

asiddharth commented 5 years ago

@christopherhesse The agent reached a score of 22.4 after 8.3M frames with PDD-DQN, and a score of 28 with vanilla DQN after 10M frames. These scores are well below the ones listed in the link above.

Bowen-He commented 3 years ago

@asiddharth Hi, have you solved the problem with vanilla DQN? I'm also encountering the same problem after following all of the steps you mentioned.