muupan / async-rl

Replicating "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)
MIT License

About the ALE settings #9

Closed ppwwyyxx closed 8 years ago

ppwwyyxx commented 8 years ago

I have some questions about the specific setup of the environment. I'm not sure whether you checked these choices with the authors.

Btw, you're not using the frame_skip parameter anywhere; the value 4 is hard-coded as a magic number. You might want to fix that. Great work!
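
For illustration, a minimal sketch of passing the skip count in as a parameter instead of hard-coding 4, assuming the ALE Python interface; the wrapper class and its names (`ALEEnv`, `rom_path`, `step`) are hypothetical, not the repo's actual code:

```python
# Hypothetical sketch (not the repo's actual code): pass frame_skip in
# explicitly instead of hard-coding 4. Assumes the ALE Python interface.
from ale_python_interface import ALEInterface

class ALEEnv(object):
    def __init__(self, rom_path, frame_skip=4):
        self.ale = ALEInterface()
        self.ale.loadROM(rom_path)
        self.frame_skip = frame_skip  # configurable skip count, no magic number

    def step(self, action):
        # Repeat the chosen action frame_skip times, accumulating the reward.
        reward = 0
        for _ in range(self.frame_skip):
            reward += self.ale.act(action)
        return reward
```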

muupan commented 8 years ago

I didn't check with the authors on my ALE settings. I guess they used the same settings as in their Nature DQN paper, so I'm mimicking those. I agree that these settings make learning easier.

ppwwyyxx commented 8 years ago

Thanks. I just checked their alewrap, and treat_life_lost_as_terminal seems to be what they've always been using. I didn't find anything about repeat_action_probability, though.
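
For reference, a rough sketch of what that convention looks like in a training loop, assuming the standard ALE Python interface (`lives()`, `act()`, `game_over()`); the helper name is made up:

```python
# Rough sketch of the life-loss-as-terminal convention, assuming the ALE
# Python interface; the helper name is hypothetical.
def step_with_life_loss_terminal(ale, action):
    lives_before = ale.lives()
    reward = ale.act(action)
    # Treat losing a life as the end of a (training) episode, even though
    # the underlying game keeps going until game_over().
    terminal = ale.game_over() or ale.lives() < lives_before
    return reward, terminal
```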

muupan commented 8 years ago

repeat_action_probability was introduced recently (in ALE 0.5.0), after their DQN paper, so it should be turned off to reproduce their results. See the discussion here:

https://groups.google.com/forum/#!topic/deep-q-learning/p4FAIaabwlo
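
A minimal sketch of turning it off, assuming the ALE Python interface; the ROM path is a placeholder:

```python
# Minimal sketch of disabling sticky actions, assuming the ALE Python
# interface; the ROM path is a placeholder.
from ale_python_interface import ALEInterface

ale = ALEInterface()
# ALE 0.5.0 defaults repeat_action_probability to 0.25; set it to 0.0 to
# match the deterministic behaviour of the pre-0.5.0 ALE used by DQN.
ale.setFloat(b'repeat_action_probability', 0.0)
ale.loadROM(b'breakout.bin')
```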

ppwwyyxx commented 8 years ago

FYI, it is confirmed in two recent papers [1, 2] by @mgbellemare (author of ALE) that these two options have always been in easy mode (as you did). He started using hard mode in those two papers.

mgbellemare commented 8 years ago

Jumping in: in our latest paper [2 above] we found the life-loss signal to be detrimental. The repeat action probability affects the original DQN's performance significantly, but more recent algorithms (such as Double DQN or our own) don't suffer as much from it.