hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
MIT License

How to use a custom Openai gym environment with Openai stable-baselines RL algorithms? #568

Closed valdezf10 closed 4 years ago

valdezf10 commented 4 years ago


I've been trying to use a custom OpenAI Gym environment for a fixed-wing UAV from https://github.com/eivindeb/fixed-wing-gym with the stable-baselines algorithms, but I have been running into issues for several days now. My baseline is the CartPole example "Multiprocessing: Unleashing the Power of Vectorized Environments" from https://stable-baselines.readthedocs.io/en/master/guide/examples.html#multiprocessing-unleashing-the-power-of-vectorized-environments. I chose it because I need to supply arguments to the environment and want to use multiprocessing, and I believe this example covers both.

I have modified the baseline example as follows:

import gym
import numpy as np

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines.common import set_global_seeds
from stable_baselines import ACKTR, PPO2
from gym_fixed_wing.fixed_wing import FixedWingAircraft

def make_env(env_id, rank, seed=0):
    """
    Utility function for multiprocessed env.

    :param env_id: (str) the environment ID
    :param rank: (int) index of the subprocess
    :param seed: (int) the initial seed for RNG
    """

    def _init():
        env = FixedWingAircraft("fixed_wing_config.json")
        #env = gym.make(env_id)
        env.seed(seed + rank)
        return env

    return _init

if __name__ == '__main__':
    env_id = "fixed_wing"
    #env_id = "CartPole-v1"
    num_cpu = 4  # Number of processes to use
    # Create the vectorized environment
    env = SubprocVecEnv([lambda: FixedWingAircraft for i in range(num_cpu)])
    #env = SubprocVecEnv([make_env(env_id, i) for i in range(num_cpu)])

    model = PPO2(MlpPolicy, env, verbose=1)

    obs = env.reset()
    for _ in range(1000):
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)

and the error I keep getting is the following:

Traceback (most recent call last):
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/fixed-wing-gym/gym_fixed_wing/ACKTR_fixedwing.py", line 38, in <module>
    model = PPO2(MlpPolicy, env, verbose=1)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/ppo2/ppo2.py", line 104, in __init__
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/ppo2/ppo2.py", line 134, in setup_model
    n_batch_step, reuse=False, **self.policy_kwargs)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 660, in __init__
    feature_extraction="mlp", **_kwargs)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 540, in __init__
    scale=(feature_extraction == "cnn"))
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 221, in __init__
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/policies.py", line 117, in __init__
    self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
  File "/home/bonie/PycharmProjects/deepRL_fixedwing/stable-baselines/stable_baselines/common/input.py", line 51, in observation_input
NotImplementedError: Error: the model does not support input space of type NoneType

I am not sure what to pass as env_id, or how to adapt the make_env(env_id, rank, seed=0) function. I also suspect that the VecEnv for parallel processes is not set up properly.
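
One possible reading of the traceback, offered as a sketch and not a confirmed diagnosis: the line `SubprocVecEnv([lambda: FixedWingAircraft for i in range(num_cpu)])` passes factories that return the class itself, not an instance built from the config file. Assuming FixedWingAircraft subclasses gym.Env, and assuming the installed gym version declares `observation_space = None` at class level, the vectorized environment would then read None as the observation space, which would match the NoneType in the error. The stand-in classes below (BaseEnv, ToyEnv) and the helper read_observation_space are hypothetical and only mimic that behavior without importing gym.

```python
class BaseEnv:
    # Mimics a base class that declares its spaces as None at class level
    # (an assumption about gym.Env, not taken from the issue).
    observation_space = None
    action_space = None


class ToyEnv(BaseEnv):
    """Hypothetical stand-in for FixedWingAircraft."""

    def __init__(self, config_path):
        self.config_path = config_path
        # Real spaces only exist once the constructor has run.
        self.observation_space = ("box", 3)
        self.action_space = ("box", 1)


def read_observation_space(env_fn):
    """Call the factory and read the space, as a vectorized env would."""
    env = env_fn()
    return env.observation_space


num_cpu = 4

# Factories that return the class itself: __init__ never runs.
class_factories = [lambda: ToyEnv for i in range(num_cpu)]

# Factories that construct an instance: __init__ sets the spaces.
instance_factories = [lambda: ToyEnv("fixed_wing_config.json") for i in range(num_cpu)]

spaces_from_classes = [read_observation_space(fn) for fn in class_factories]
spaces_from_instances = [read_observation_space(fn) for fn in instance_factories]

print("class factories:   ", spaces_from_classes)
print("instance factories:", spaces_from_instances)
```

If that reading is right, the commented-out line `SubprocVecEnv([make_env(env_id, i) for i in range(num_cpu)])` in the script above would be the safer choice, since `_init` constructs `FixedWingAircraft("fixed_wing_config.json")` before returning it.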

I am coding with Python v3.6 using PyCharm IDE in Ubuntu 18.04.

Any suggestions would really help!

Thank you in advance.

valdezf10 commented 4 years ago

It seems that the newer versions of stable-baselines produce NaN values in the actions from the actor. The GitHub files in the repo I linked have since been updated, which fixed the issue.
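
For anyone hitting something similar, one way to confirm it is to check the predicted actions for non-finite values before stepping the environment. The helper below is a hypothetical sketch using only NumPy; check_finite is not part of stable-baselines, and the sample arrays are made up for illustration. stable-baselines also ships a VecCheckNan wrapper in stable_baselines.common.vec_env that serves a similar purpose.

```python
import numpy as np


def check_finite(name, array):
    """Raise ValueError if the array contains NaN or inf; otherwise return it."""
    array = np.asarray(array, dtype=float)
    bad = ~np.isfinite(array)
    if bad.any():
        raise ValueError(
            f"{name} contains {int(bad.sum())} non-finite value(s) "
            f"at indices {np.argwhere(bad).tolist()}"
        )
    return array


# Made-up action batches: one clean, one with a NaN in row 0, column 1.
good_actions = np.array([[0.1, -0.2], [0.3, 0.0]])
bad_actions = np.array([[0.1, np.nan], [0.3, 0.0]])

check_finite("action", good_actions)  # passes silently

try:
    check_finite("action", bad_actions)
    caught = False
    message = ""
except ValueError as err:
    caught = True
    message = str(err)

print(message)
```

In the loop from the original post, a call such as `check_finite("action", action)` could go between `model.predict(obs)` and `env.step(action)`.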