higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Gym env #12

Open ShuvenduRoy opened 6 years ago

ShuvenduRoy commented 6 years ago

Why is 'NoFrameskip' required here? And what exactly does 'NoFrameskip' specify?

https://github.com/higgsfield/RL-Adventure/blob/0f82b6922e8a1a8515fc4c84c28702e7caa226f1/common/wrappers.py#L214

ShuvenduRoy commented 6 years ago

I tried 'Pong-v0' and it did not do well. The maximum reward I got was -2. What might have caused the problem?

garkavem commented 6 years ago

These wrappers are from the OpenAI Baselines project. Frameskip is traditionally used in Atari games to speed up learning: you simply do not need to see every frame, every 4th one will do. Previously, frameskip was built into the environments themselves, so all gym environments that do not have "NoFrameskip" in the name already give you only every 4th frame. For some reason, in the Baselines project they decided to take environments without built-in frameskip and feed them through the MaxAndSkipEnv wrapper. So if you remove the line `assert 'NoFrameskip' in env.spec.id` and feed Pong-v0 into the MaxAndSkipEnv wrapper, you will get only every 16th frame. That is probably not enough to play Pong.
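
For reference, here is a minimal sketch of what a skip wrapper like `MaxAndSkipEnv` does, modeled on the OpenAI Baselines version (the copy in this repo's `wrappers.py` may differ in details, and it assumes the old 4-tuple gym step API). It repeats the chosen action `skip` times, so stacking it on top of an env that already skips ~4 frames internally gives roughly every 16th raw frame per agent step.

```python
import gym
import numpy as np

class MaxAndSkipEnv(gym.Wrapper):
    """Repeat the action `skip` times; return a pixel-wise max of the last two raw frames."""
    def __init__(self, env, skip=4):
        super().__init__(env)
        self._obs_buffer = np.zeros((2,) + env.observation_space.shape, dtype=np.uint8)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for i in range(self._skip):
            obs, reward, done, info = self.env.step(action)  # old gym API: 4-tuple
            if i == self._skip - 2:
                self._obs_buffer[0] = obs
            if i == self._skip - 1:
                self._obs_buffer[1] = obs
            total_reward += reward
            if done:
                break
        # Max over the last two frames removes Atari sprite flickering.
        return self._obs_buffer.max(axis=0), total_reward, done, info
```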

ShuvenduRoy commented 6 years ago

As this environment is not currently available in my gym installation, what should I change to reduce this 16-frame gap back to 4?

garkavem commented 6 years ago

Just remove `env = MaxAndSkipEnv(env, skip=4)` from `make_atari` in the wrappers.
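
A sketch of what the modified `make_atari` might look like, assuming it matches the OpenAI Baselines version (create the env, assert 'NoFrameskip', then apply `NoopResetEnv` and `MaxAndSkipEnv`); the exact contents of this repo's `wrappers.py` may differ:

```python
import gym

# NoopResetEnv is the Baselines no-op-reset wrapper defined earlier in the same wrappers.py.
def make_atari(env_id):
    env = gym.make(env_id)                 # e.g. 'Pong-v0', which already has built-in frameskip
    # assert 'NoFrameskip' in env.spec.id  # removed so envs with built-in frameskip are accepted
    env = NoopResetEnv(env, noop_max=30)   # random number of no-ops at reset, as in Baselines
    # env = MaxAndSkipEnv(env, skip=4)     # removed: would skip another 4x on top of the built-in skip
    return env
```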

ShuvenduRoy commented 6 years ago

Ok, that's good

But I am wondering about the shape of the state. According to the original paper:

The details of the architecture are explained in the Methods. The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers

But I checked the shape of the state, and it is (1, 84, 84). Where does this preprocessing deviate from the original paper?

garkavem commented 6 years ago

Call `wrap_deepmind` with the argument `frame_stack=True`. This problem was raised in #9.
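
A hedged usage sketch; the env id, the import path, and the channel layout produced by this repo's wrappers are assumptions, so the exact printed shape may come out channel-first or channel-last:

```python
import numpy as np
from common.wrappers import make_atari, wrap_deepmind  # assumed import path in this repo

env = make_atari('PongNoFrameskip-v4')
env = wrap_deepmind(env, frame_stack=True)  # keep a rolling stack of the last 4 processed frames
obs = env.reset()
print(np.array(obs).shape)  # expect 4 stacked 84x84 frames, e.g. (84, 84, 4) or (4, 84, 84)
```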

ShuvenduRoy commented 6 years ago

It looks like something mysterious is happening. I also wonder how this even worked without sequence information. Any idea what is going on here?

Are we brute-forcing the model to learn from the current frame's pixels only, which might not work in more complex cases?

garkavem commented 6 years ago

Well, Pong is just an extremely simple game. I suspect that a player with perfect reactions (like an RL agent) can just always move towards the ball, and that would be sufficient to win. For many Atari games, stacking frames is necessary though.

ShuvenduRoy commented 6 years ago

Ok!

I am not quite getting the logic from the code: where do these 4 frames come from? Since this env already skips 4 frames, what is the situation now? Does the (4, 80, 80) state carry information spanning 16 raw frames, or is the 4-frame skip now folded into the stack?

garkavem commented 6 years ago

You have information about the current frame and the frames from 4, 8, and 12 raw frames ago. You don't have the skipped frames in between.
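
An illustrative sketch of the bookkeeping (not the repo's actual FrameStack wrapper): with a skip of 4, the agent only ever sees raw frames 0, 4, 8, ..., and a 4-deep stack of those covers the current frame plus the ones 4, 8, and 12 raw frames back.

```python
from collections import deque
import numpy as np

stack = deque(maxlen=4)                      # rolling buffer of the last 4 observed frames
for raw_index in range(0, 32, 4):            # raw frame indices a skip-4 wrapper actually returns
    frame = np.full((84, 84), raw_index, dtype=np.uint8)  # stand-in for a real 84x84 frame
    stack.append(frame)

state = np.stack(stack)                      # shape (4, 84, 84)
print([int(f[0, 0]) for f in stack])         # [16, 20, 24, 28] -> current and 4/8/12 raw frames back
```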

ShuvenduRoy commented 6 years ago

Oh, I got it. Thanks :-)

ShuvenduRoy commented 6 years ago

With all this information and the modifications, I trained the model, but I could not quite reproduce the original results. Here are the code and the result.

Any explanation of what is causing the problem?