openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

why i can't get the performance presented in the benchmark? #794

Open sanity111 opened 5 years ago

sanity111 commented 5 years ago

I always get a poor result (e.g. a score of ~30 after 1e7 or even 1e8 timesteps) when running the deepq algorithm with the following command: `python -m baselines.run --alg=deepq --env=AlienNoFrameskip-v4 --num_timesteps=1e7`. Am I missing some important parameters? And does the y-axis in the benchmark denote the "mean_100ep_reward" in the deepq implementation?

pzhokhov commented 5 years ago

Hi @sanity111! Which benchmark are you referring to? The one referenced here (http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm) uses all default arguments (like in your command line), but it does not include Alien. Did you try running deepq on one of the Atari games in that benchmark and comparing the results?

To answer your question directly: the benchmark's y axis is not quite the same as mean_100ep_reward. What is published in the benchmark is the raw reward (captured using the Monitor wrapper), whereas mean_100ep_reward is the reward the algorithm actually sees. For Atari games, we use ClipRewardEnv, which returns the sign of the reward (i.e. if the reward is positive, it returns +1, no matter how large the reward is). This is common practice to avoid per-game hyperparameter tuning.
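To illustrate the distinction above: the reward the learner sees under ClipRewardEnv is just the sign of the raw game reward. Here is a minimal sketch of that clipping logic (the actual wrapper lives in `baselines.common.atari_wrappers`; the standalone function below is written just for illustration):

```python
import numpy as np

def clip_reward(raw_reward):
    """Bin the raw game reward to {-1.0, 0.0, +1.0} by its sign,
    mirroring what baselines' ClipRewardEnv does for Atari games."""
    return float(np.sign(raw_reward))

# The learner sees +1 whether the game awarded 10 or 200 points,
# so mean_100ep_reward counts clipped rewards, not raw game score.
print(clip_reward(200.0))   # 1.0
print(clip_reward(10.0))    # 1.0
print(clip_reward(-15.0))   # -1.0
print(clip_reward(0.0))     # 0.0
```

This is why mean_100ep_reward is typically much smaller than the raw scores plotted in the benchmark: it is roughly the number of positive-reward events per episode, not the accumulated game score.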