simoninithomas / Deep_reinforcement_learning_Course

Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
http://www.simoninithomas.com/deep-rl-course
3.74k stars 1.23k forks

Sonic A2C not working for Pong #48

Open slerman12 opened 5 years ago

slerman12 commented 5 years ago

I'm trying to test whether the A2C code for Sonic can be used to train an agent on another environment. I replaced the Sonic environments with 8 copies of Pong and varied the number of epochs, mini-batches, and nsteps, but no matter what, I could not get it to learn Pong. Is there a reason this implementation won't train on Pong? Am I missing some important parameter? Could you test it for yourself and let me know? All I had to do was change the environments in agent.py with a Pong make_env() that used frame stacking and preprocessing.
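For readers unfamiliar with the frame stacking and preprocessing mentioned above, here is a minimal sketch of the idea. It is not the code from agent.py or from the Pong make_env() used in this issue; `preprocess`, `FrameStack`, the stride of 2, and the stack depth of 4 are all hypothetical choices for illustration, and a real pipeline would typically operate on NumPy arrays rather than nested tuples.

```python
from collections import deque


def preprocess(frame, stride=2):
    """Grayscale an RGB frame and downsample it.

    `frame` is assumed to be a list of rows, each row a list of (r, g, b)
    tuples. Grayscale is a plain channel average (an assumption, not a
    standard luminance formula).
    """
    return tuple(
        tuple(sum(pixel) // 3 for pixel in row[::stride])
        for row in frame[::stride]
    )


class FrameStack:
    """Keep the last `k` preprocessed frames as one observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        processed = preprocess(frame)
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(processed)
        return tuple(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(preprocess(frame))
        return tuple(self.frames)
```

Stacking several consecutive frames is what lets the network infer the ball's direction and speed in Pong, which a single frame cannot convey.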

pengzhi1998 commented 5 years ago

Hi, how many episodes did you run? And may I know your total reward for each episode?

slerman12 commented 5 years ago

If I recall, 100 updates on the default settings was not enough to make any progress. The reward did not go up from -20 per episode.

pengzhi1998 commented 5 years ago
Yes, the situation is very similar: the rewards stay around -20 per episode. I think that is because 100 updates are far from enough. We need to train for at least 1000 episodes. Training on a GPU will be better.

Good luck! 

slerman12 commented 5 years ago

That surprises me, since the trained Sonic model required only 270 updates. That’s already processing millions of states, which should be enough for Pong, shouldn’t it?
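As a rough check on the "millions of states" estimate, the number of environment transitions consumed by synchronous A2C is the product of updates, parallel environments, and rollout length. The sketch below is only back-of-the-envelope arithmetic; `total_transitions` is a hypothetical helper, the 8 environments come from the Pong setup described earlier in this thread, and both `n_steps` values are assumptions rather than the repository's confirmed setting.

```python
def total_transitions(n_updates, n_envs, n_steps):
    """Transitions processed: each update collects n_steps from every env."""
    return n_updates * n_envs * n_steps


# Assumed long rollouts (n_steps=2048): a few million transitions.
print(total_transitions(270, 8, 2048))  # 4423680

# Assumed short rollouts (n_steps=5): only about ten thousand transitions.
print(total_transitions(270, 8, 5))  # 10800
```

Whether 270 updates really amounts to millions of states therefore depends heavily on the nsteps value in use.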

slerman12 commented 5 years ago

I'll try to run 1000 updates and get back to you. What if it still doesn't play Pong then? I'm hoping to use this as a baseline for my research with transfer learning. Would you not recommend that?