openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

Env compatibility with PyGame (PLE) #92

Closed: theweaklink closed this issue 7 years ago

theweaklink commented 7 years ago

Hello,

I have been trying to plug a PyGame environment into the A3C agent, but something odd is happening that I haven't found an explanation for. Maybe someone can point out what is wrong?

I used gym_ple to expose PLE as a gym env, then built an env processing pipeline similar to the Atari one:

In envs.py:

def create_env(env_id, client_id, remotes, **kwargs):
    spec = gym.spec(env_id)

    if spec.tags.get('flashgames', False):
        return create_flash_env(env_id, client_id, remotes, **kwargs)

    elif spec.tags.get('atari', False) and spec.tags.get('vnc', False):
        return create_vncatari_env(env_id, client_id, remotes, **kwargs)

    elif spec.tags.get('pygame', False):
        return create_pygame_env(env_id)

    else:
        # Assume atari.
        assert "." not in env_id  # universe environments have dots in names.
        return create_atari_env(env_id)

def create_pygame_env(env_id):
    env = gym.make(env_id)
    env = Vectorize(env)
    env = PyGameProcess(env)
    env = DiagnosticsInfo(env)
    env = Unvectorize(env)
    return env

def _process(frame):
    # pygame frame is 48x48x3, no need to resize and/or crop
    frame = frame.mean(2)
    frame = frame.astype(np.float32)
    frame *= (1.0 / 255.0)
    frame = np.reshape(frame, [48, 48, 1])
    return frame

class PyGameProcess(vectorized.ObservationWrapper):
    def __init__(self, env=None):
        super(PyGameProcess, self).__init__(env)
        self.observation_space = Box(0.0, 1.0, [48, 48, 1])

    def _observation(self, observation_n):
        return [_process(observation) for observation in observation_n]

[...]
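As a quick sanity check of the wrapper chain, something like the sketch below prints the processed observation shape. The env id 'PongPLE-v0' is only a placeholder here; use whichever id gym_ple registers for the game on your install.

import gym_ple  # importing gym_ple registers the PLE envs with gym
from envs import create_pygame_env

env = create_pygame_env('PongPLE-v0')  # placeholder id, see note above
obs = env.reset()
print(obs.shape)  # expected: (48, 48, 1) after PyGameProcess
obs, reward, done, info = env.step(env.action_space.sample())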

Here is what happens if I run the agent with Pong. The PyGame Pong frame is much simpler to process than Atari Pong: there is no score box, no border, nothing to clutter the image, and the game is just black and white instead of the brown background in Atari.

Here are the results for 6 workers on my machine (based on Tensorboard graphs and visualizing actual games):

I checked the frame that is sent from Gym PLE and everything looks good:

What am I missing?

I used Pong as an example because I can directly compare the Atari game with the PyGame one, but none of the simple PyGame games I have tried (Pixelcopter, Catcher) seems to be working well either, hence my conclusion: there is something wrong with my code linking PyGame to the A3C agent.

Any insight? I assume adding support for other types of envs could be useful to other people too, right?

thanks!!

tlbtlbtlb commented 7 years ago

One possibility is the frame rate. The gym-Atari and vnc-Atari envs run at 15 fps and 10 fps respectively. If the PyGame env runs at 60 fps (meaning, 60 action/observations per second of game time, not talking about wall time here) then the starter-agent may not be able to learn it. Try adding, in the wrapper's .step function, 4 calls to the underlying .step to get to 15 fps.
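An untested sketch of what I mean is below; depending on your gym version you may need to override step/reset instead of _step/_reset:

import gym

class FrameSkip(gym.Wrapper):
    """Repeat each chosen action `skip` times, so a 60 fps game is seen
    by the agent at roughly 15 decisions per second of game time."""
    def __init__(self, env, skip=4):
        super(FrameSkip, self).__init__(env)
        self.skip = skip

    def _step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # sum the reward over the skipped frames
            if done:
                break
        return obs, total_reward, done, info

    def _reset(self):
        return self.env.reset()

In create_pygame_env you would then wrap the raw env before vectorizing, e.g. env = FrameSkip(gym.make(env_id), skip=4).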

theweaklink commented 7 years ago

Thank you Trevor, you are touching on a key point here. Reducing the fps is just the beginning: the game itself (ball velocity and movement speed) must be adjusted accordingly, otherwise the agent still seems to be overwhelmed.
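For concreteness, the kind of adjustment I mean looks roughly like the sketch below, constructing the PLE game directly. The speed-ratio keyword names are assumptions on my part and may differ between PLE versions (check ple/games/pong.py), and feeding a customized game back through gym_ple may need a small change to its env constructor.

from ple.games.pong import Pong

# Slow the game down so the ball and paddles do not travel too far
# between two consecutive agent decisions at ~15 fps.
# Keyword names below are assumed -- verify against your PLE version.
game = Pong(
    ball_speed_ratio=0.5,     # assumed parameter: slower ball
    players_speed_ratio=0.3,  # assumed parameter: slower player paddle
    cpu_speed_ratio=0.4,      # assumed parameter: slower CPU paddle
)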

I haven't found the best set of parameters yet for 15 fps, but it is heading in the right direction. There are also some differences between Atari Pong and PyGame Pong which I hadn't paid much attention to but which clearly impact the result:

Bottom line:

Thank you very much Trevor, your insight was very helpful!