Closed ppwwyyxx closed 8 years ago
I didn't check with the authors on my ALE settings. I guess they used the same settings as in their Nature paper on DQN, so I'm mimicking them. I agree that these settings make learning easier.
Thanks. So I just checked their alewrap, and treat_life_lost_as_terminal seems to be what they've always been using. I didn't find anything about repeat_action_probability, though.
repeat_action_probability was introduced recently (in ALE 0.5.0), after their DQN paper, so it should be turned off to reproduce their results. See the discussion below:
https://groups.google.com/forum/#!topic/deep-q-learning/p4FAIaabwlo
Jumping in: in our latest paper [2 above] we found the life-loss signal to be detrimental. The repeat-action probability affects the original DQN's performance significantly, but more recent algorithms (such as Double DQN or our own) don't suffer as much from it.
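For reference, the "sticky actions" behavior that repeat_action_probability controls can be sketched in a few lines. This is a minimal illustration of the semantics, not ALE's actual emulator code; the helper name sticky_action and its signature are made up for this sketch. (In the ALE Python interface the setting itself is configured via ale.setFloat(b'repeat_action_probability', 0.0).)

```python
import random

def sticky_action(prev_action, new_action, repeat_prob, rng=random):
    # With probability `repeat_prob`, the emulator ignores the agent's
    # chosen action and repeats the previous one instead ("sticky actions").
    # Setting repeat_prob to 0.0 recovers the pre-0.5.0 deterministic behavior.
    if rng.random() < repeat_prob:
        return prev_action
    return new_action
```

With repeat_prob=0.0 the agent's action always goes through, which is why turning it off is needed to reproduce the original DQN results.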
I have some questions about the specific setup of the environment. I'm not sure whether you checked with the authors on these choices.
Btw, you're not using the frame_skip parameter anywhere; there's just a magic number 4. You might want to fix that. Great work!
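The fix being suggested can be sketched as follows. This is a hypothetical helper, not the repo's actual code; act_with_frame_skip and env_act are made-up names for illustration.

```python
def act_with_frame_skip(env_act, action, frame_skip=4):
    # Repeat the chosen action for `frame_skip` emulator frames and
    # accumulate the per-frame rewards, as the DQN setup does.
    # `frame_skip` is an explicit parameter instead of a hard-coded 4,
    # so a different skip can be passed without touching this code.
    total_reward = 0
    for _ in range(frame_skip):
        total_reward += env_act(action)
    return total_reward
```

Passing frame_skip through as a parameter keeps the agent code agnostic to the emulator's frame rate.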