openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

How to visualize training environments...? #882

Open ZeroMaxinumXZ opened 5 years ago

ZeroMaxinumXZ commented 5 years ago

Hi, I'm new here. I'm not sure whether this belongs here, in Retro, or in Gym, but I'd like to see a visualization of the environment while a PPO2 agent is training. I'd also like to evaluate the agent. Is this possible, and if so, how do I do it?

Code (just a slightly modified version of Retro's example PPO script):

```python
"""
Train an agent using Proximal Policy Optimization from OpenAI Baselines
"""

import argparse

import retro
from baselines.common.vec_env import SubprocVecEnv
from baselines.common.retro_wrappers import make_retro, wrap_deepmind_retro
from baselines.ppo2 import ppo2

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--game', default='AirStriker-Genesis')
    parser.add_argument('--state', default=retro.State.DEFAULT)
    parser.add_argument('--scenario', default=None)
    args = parser.parse_args()

    def make_env():
        env = make_retro(game=args.game, state=args.state, scenario=args.scenario)
        env = wrap_deepmind_retro(env)
        return env

    # Eight copies of the environment, each running in its own subprocess.
    venv = SubprocVecEnv([make_env] * 8)
    ppo2.learn(
        network='cnn',
        env=venv,
        total_timesteps=int(100e6),
        nsteps=128,
        nminibatches=4,
        lam=0.95,
        gamma=0.99,
        noptepochs=4,
        log_interval=1,
        ent_coef=.01,
        lr=lambda f: f * 2.5e-4,
        cliprange=0.1,
    )

if __name__ == '__main__':
    main()
```

DanielTakeshi commented 5 years ago

What do you mean by visualize? Do you mean rendering the environment, as with env.render()? You can do that, but it makes training very slow.
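
One way to limit that slowdown is to render only occasionally. The following is a minimal sketch, not part of Baselines or Gym: `RenderEveryN` and `StubEnv` are hypothetical names, and the wrapper only assumes the wrapped object has Gym-style `reset()`, `step()`, and `render()` methods. Whether it behaves well inside `SubprocVecEnv` workers has not been verified here.

```python
class StubEnv:
    """Stand-in for a Gym-style env so the sketch runs without Gym installed."""

    def __init__(self):
        self.render_calls = 0

    def reset(self):
        return 0

    def step(self, action):
        # Classic Gym 4-tuple: observation, reward, done, info.
        return 0, 0.0, False, {}

    def render(self):
        self.render_calls += 1


class RenderEveryN:
    """Hypothetical helper: forwards calls to env and renders every n-th step."""

    def __init__(self, env, every=10):
        self.env = env
        self.every = every
        self.steps = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        result = self.env.step(action)
        self.steps += 1
        # Rendering is the expensive part, so skip it on most steps.
        if self.steps % self.every == 0:
            self.env.render()
        return result


env = StubEnv()
wrapped = RenderEveryN(env, every=5)
wrapped.reset()
for _ in range(20):
    wrapped.step(0)
print(env.render_calls)  # 4 (rendered on steps 5, 10, 15, 20)
```

With a real environment, the wrapper would presumably be applied inside `make_env`, after the other wrappers.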

ZeroMaxinumXZ commented 5 years ago

@DanielTakeshi Yes, I mean environment rendering. I want to at least be able to evaluate its performance on the game, not necessarily while it's training: a ppo2.evaluate(rendering=True, env=env) method of sorts. I'm fairly new to Gym and have only been programming for nine months.
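
For illustration, an evaluation loop of the kind described above could look roughly like this. It is a sketch under assumptions, not an existing Baselines function: `evaluate`, `policy`, and `CountdownEnv` are hypothetical names, the environment is a single (non-vectorized) one, and it follows the classic Gym API in which `step()` returns `(obs, reward, done, info)`.

```python
class CountdownEnv:
    """Stand-in env: every episode lasts 3 steps with reward 1.0 per step."""

    def reset(self):
        self.remaining = 3
        return 0

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return 0, 1.0, done, {}

    def render(self):
        pass  # a real env would draw a frame here


def evaluate(env, policy, episodes=5, render=False):
    """Run `episodes` full episodes and return the mean total reward.

    `policy` is any callable mapping an observation to an action.
    """
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            if render:
                env.render()
            obs, reward, done, _info = env.step(policy(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


print(evaluate(CountdownEnv(), policy=lambda obs: 0, episodes=2))  # 3.0
```

With a trained agent, `policy` would wrap whatever call produces an action from the model returned by `ppo2.learn`; the exact call and the observation batching it expects are not shown in this thread, so they are left out here.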