muupan / async-rl

Replicating "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)
MIT License

About the ALE settings #9

Closed ppwwyyxx closed 8 years ago

ppwwyyxx commented 8 years ago

I have some questions about the specific setup of the environment. I'm not sure whether you checked these choices with the authors.

Btw, you're not using the frame_skip parameter anywhere; the value 4 is hard-coded as a magic number. You might want to fix that. Great work!
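
For illustration, a minimal sketch of passing the skip count in as a parameter instead of hard-coding 4, assuming the ALE Python interface; the wrapper class and its names (`ALEEnv`, `rom_path`, `step`) are hypothetical, not the repo's actual code:

```python
# Hypothetical sketch (not the repo's actual code): pass frame_skip in
# explicitly instead of hard-coding 4. Assumes the ALE Python interface.
from ale_python_interface import ALEInterface

class ALEEnv(object):
    def __init__(self, rom_path, frame_skip=4):
        self.ale = ALEInterface()
        self.ale.loadROM(rom_path)
        self.frame_skip = frame_skip  # configurable skip count, no magic number

    def step(self, action):
        # Repeat the chosen action frame_skip times, accumulating the reward.
        reward = 0
        for _ in range(self.frame_skip):
            reward += self.ale.act(action)
        return reward
```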

muupan commented 8 years ago

I didn't check with the authors on my ALE settings. I guess they used the same settings as in their Nature DQN paper, so I'm mimicking those. I agree that these settings make learning easier.

ppwwyyxx commented 8 years ago

Thanks. I just checked their alewrap, and treat_life_lost_as_terminal seems to be what they've always been using. I didn't find anything about repeat_action_probability, though.
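
For reference, a rough sketch of what that convention looks like in a training loop, assuming the standard ALE Python interface (`lives()`, `act()`, `game_over()`); the helper name is made up:

```python
# Rough sketch of the life-loss-as-terminal convention, assuming the ALE
# Python interface; the helper name is hypothetical.
def step_with_life_loss_terminal(ale, action):
    lives_before = ale.lives()
    reward = ale.act(action)
    # Treat losing a life as the end of a (training) episode, even though
    # the underlying game keeps going until game_over().
    terminal = ale.game_over() or ale.lives() < lives_before
    return reward, terminal
```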

muupan commented 8 years ago

repeat_action_probability was introduced recently (in ALE 0.5.0), after their DQN paper, so it should be turned off to reproduce their results. See the discussion here:

https://groups.google.com/forum/#!topic/deep-q-learning/p4FAIaabwlo
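
A minimal sketch of turning it off, assuming the ALE Python interface; the ROM path is a placeholder:

```python
# Minimal sketch of disabling sticky actions, assuming the ALE Python
# interface; the ROM path is a placeholder.
from ale_python_interface import ALEInterface

ale = ALEInterface()
# ALE 0.5.0 defaults repeat_action_probability to 0.25; set it to 0.0 to
# match the deterministic behaviour of the pre-0.5.0 ALE used by DQN.
ale.setFloat(b'repeat_action_probability', 0.0)
ale.loadROM(b'breakout.bin')
```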

ppwwyyxx commented 8 years ago

FYI, it is confirmed in two recent papers [1, 2] by @mgbellemare (author of ALE) that these two options have always been in easy mode (as you did). He started using hard mode in those two papers.

mgbellemare commented 8 years ago

Jumping in: in our latest paper [2 above] we found the life-loss signal to be detrimental. The repeat action probability affects the original DQN's performance significantly, but more recent algorithms (such as Double DQN or our own) don't suffer as much from it.