ShuvenduRoy opened this issue 6 years ago
I tried 'Pong-v0' and it did not do very well; the max reward I got was -2. What might have caused the problem?
These wrappers are from the openai baselines project. Frameskip is traditionally used in Atari games to speed up learning: you simply do not need to see all the frames, every 4th will do. But previously frameskip was built into the environments: all gym environments that do not have "NoFrameskip" in the name already give you only every 4th frame. For some reason, in the baselines project they decided to take environments without built-in frameskip and feed them to the MaxAndSkipEnv wrapper.
So, if you remove the line
assert 'NoFrameskip' in env.spec.id
and feed Pong-v0 to the MaxAndSkipEnv wrapper, you will get only every 16th frame. That is probably not enough to play Pong.
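The doubled skipping can be sketched without gym. Below is a minimal, hypothetical version of MaxAndSkipEnv (DummyEnv and its frame-index observation are illustrative stand-ins, not the real baselines code): wrapping an env that already skips 4 raw frames per step with skip=4 means each wrapper step covers 16 raw frames.

```python
import numpy as np

class DummyEnv:
    """Toy stand-in for a gym env whose observation is just the raw frame index.

    `internal_skip` mimics the frameskip built into 'Pong-v0'-style environments.
    """
    def __init__(self, internal_skip=4):
        self.internal_skip = internal_skip
        self.frame = 0

    def step(self, action):
        self.frame += self.internal_skip  # env advances several raw frames per step
        return np.array([self.frame]), 0.0, False, {}

class MaxAndSkipEnv:
    """Sketch of the baselines wrapper: repeat the action `skip` times,
    sum the rewards, and max-pool the last two observations."""
    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward, done, last_two = 0.0, False, []
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            last_two = (last_two + [obs])[-2:]  # keep only the last two frames
            if done:
                break
        return np.maximum(*last_two), total_reward, done, {}

env = MaxAndSkipEnv(DummyEnv(internal_skip=4), skip=4)
obs1, *_ = env.step(0)
obs2, *_ = env.step(0)
print(obs1[0], obs2[0])  # 16 32: the agent sees only every 16th raw frame
```

The max-pooling over the last two frames exists in the real wrapper to remove Atari sprite flicker; here it is a no-op on the dummy observations.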
As this environment is not currently available in gym, what should I change to reduce this 16-frame gap to 4?
Just remove
env = MaxAndSkipEnv(env, skip=4)
from "make_atari" in wrappers.
Ok, that's good
But I am wondering about the shape of the state. According to the original paper:
The details of the architecture are explained in the Methods.
The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers
But I checked the shape of the state, which is (1, 84, 84). Where is this preprocessing deviating from the original paper?
Call wrap_deepmind with the argument frame_stack=True. This problem was raised in #9.
Looks like something mysterious is happening. I also wonder how this even worked without sequence information. Any idea what is going on here?
Are we brute-forcing the model to learn from the current pixels only, which might not work in more complex cases?
Well, Pong is just an extremely simple game. I suspect that a player with perfect reactions (like an RL agent) can just always move towards the ball, and that would be sufficient to win. For many Atari games, stacking frames is necessary though.
ok!!!
I am not quite getting the logic from the code: where are these 4 frames coming from? Since this env skips 4 frames, what is the situation now? Does the (4, 80, 80) state contain information from 16 frames, or is the 4-frame skip already folded into it?
You have information about the current frame and the frames from 4, 8, and 12 raw frames ago. You don't have the skipped frames in between.
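The spacing follows from simple arithmetic, assuming a uniform skip of 4 (note that 'Pong-v0' actually samples its skip from 2-4 each step, so in practice the spacing varies slightly):

```python
# With an internal frameskip of 4 and a stack of the 4 most recent
# (post-skip) frames, the stacked state at raw frame t covers:
skip, stack_size = 4, 4
current_raw_frame = 100  # hypothetical raw frame index, for illustration
stacked = [current_raw_frame - skip * i for i in range(stack_size)]
print(stacked)  # [100, 96, 92, 88]: current frame plus 4, 8, 12 frames back
```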
Oh, I got it. Thanks :-)
With all this information and the modifications, I trained the model, but could not quite reproduce the original result. Here are the code and the result.
Any explanation of what is causing the problem?
Why is 'NoFrameskip' necessary? And what exactly does 'NoFrameskip' specify?
https://github.com/higgsfield/RL-Adventure/blob/0f82b6922e8a1a8515fc4c84c28702e7caa226f1/common/wrappers.py#L214