hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

AssertionError: The observation space must inherit from gym.spaces cf https://github.com/openai/gym/blob/master/gym/spaces/ #933

Closed: KeeratKG closed this issue 4 years ago

KeeratKG commented 4 years ago

Describe the bug

Hi all, I am using stable-baselines for a policy optimisation program pertaining to a drug distribution problem. I have made a custom environment following the gym interface, using the guide at https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb#scrollTo=1CcUVatq-P0l, and tried to validate it with the check_env() method. I am unable to understand and fix the error described below.

Code example

This is the code I wrote:

%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.0

import numpy as np 
import gym 
from gym import spaces 
import matplotlib.pyplot as plt 
from stable_baselines.common.env_checker import check_env

class StatesEnv(gym.Env):
    """
    Customised Environment that follows gym interface.
    Describes relevant properties of the state and action spaces. 
    """
    metadata = {'render.modes':['human']}

    # states = 6 # Delhi, Guj, Raja, MP, Maha, TN
    # properties = 5

    def __init__(self, s, prop, episodes):
        #initialise state properties and their values, obsn space and action space???
        self.states = s #no of independent simulations to be run 
        self.properties = prop
        #observation will be the condition of state at a particular time 
        self.observation_space = np.array(spaces.Box(low= np.zeros((s, prop)), high = np.full((s, prop), float('inf')), shape = (s, prop), dtype = np.float32))
        #actions are vectors of the form [n1, n2, n3,...nk, r] for k states and r reserved amount of drug 
        self.action_space = np.array(spaces.Box(low = np.zeros((s+1, ), dtype = int), high = np.array([100]*(s+1)), shape = (s + 1, ), dtype = np.uint8)) 

        # sum = 0
        # for i in range(s+1):
        #     sum += self.action_space[i]
        # assert sum == 100 #returns error if total % is not 100 
        self.m = []
        self.prob_dying = []
        self.episodes = episodes

    def reset(self):
        """
        Resets observation_space to a matrix initialising the situation of the states wrt current figures;
        action_space to start exploring from the point of equal distribution between all states.
        """

        self.action_space = np.array([100/(self.states+1)]*(self.states+1))
        self.observation_space =  np.array([[80188, 28329,  2558,   16787941,   0.03190003492],  
                              [30709,   6511,   1789,   60439692,   0.05825653717],
                              [16944,   3186,   391,    68548437,   0.02307601511],
                              [12965,   2444,   550,    72626809,   0.04242190513],
                              [159133,  67615,  7273,   112374333,  0.04570390805],
                              [78335,   33216,  1025,   72147030,   0.01308482798]])
                              # Confirmed   Active  Deaths   Population  P(dying)
                              # Delhi, Guj, Raja, MP, Maha, TN 
    def step(self, action, total):
        """
        Assumptions:
        1. Drug has 50% efficacy 
        2. Vaccine is passive, not antigen based- works to fight off existing infection.
        3. 1 person requires 1 vial (dose) only.
        """
        P = self.observation_space
        # Total number of vials available 
        self.total = total

        #no of units distributed to respective states 
        received = []
        for i in range(self.states):
            received[i] = self.total*action[i]/100

        #add column of units distributed per state to update observation space 
        P = np.append(P, received, axis=1)

        #measuring the effect of drug on each state 
        # m is the no of ppl moving from active to recovered 

        for i in range(self.states):
            self.m[i] = 0.5*received[i]              #50% efficacy
            P[i, 1] -= self.m[i]
            self.prob_dying[i] = P[i, 2]/P[i, 0]

        #task is done when all states show a decrease in probability of dying 
        done = bool(P[i] < self.observation_space[i, 4] for i in range(self.states))  
        #reward only when task done 
        reward = 10 if done else 0

        # Optionally we can pass additional info
        info = {self.prob_dying}

        return P, reward, done, info

    def render(self, mode='human'):
        x = self.episodes
        y = []
        for i in range(self.states):
            y[i] = self.prob_dying[i]
            y.append(y[i])
            plt.plot(x, y[i])
            plt.xlabel('Number of episodes')
            plt.ylabel('P(dying) of state')
            plt.title('Learning Process')

        plt.show()

    def close(self):
        pass 

env = StatesEnv(6, 5, 25000)
# If the environment don't follow the interface, an error will be thrown
check_env(env, warn=True)

The error trace is as follows:

/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-25-b204228c79f7> in <module>()
      1 env = StatesEnv(6, 5, 25000)
      2 # If the environment don't follow the interface, an error will be thrown
----> 3 check_env(env, warn=True)

1 frames
/usr/local/lib/python3.6/dist-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
    183 
    184     # ============= Check the spaces (observation and action) ================
--> 185     _check_spaces(env)
    186 
    187     # Define aliases for convenience

/usr/local/lib/python3.6/dist-packages/stable_baselines/common/env_checker.py in _check_spaces(env)
    132 
    133     assert isinstance(env.observation_space,
--> 134                       spaces.Space), "The observation space must inherit from gym.spaces" + gym_spaces
    135     assert isinstance(env.action_space, spaces.Space), "The action space must inherit from gym.spaces" + gym_spaces
    136 

AssertionError: The observation space must inherit from gym.spaces cf https://github.com/openai/gym/blob/master/gym/spaces/

System Info

Python version:
3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0]

Additional context

I am not entirely sure here, but the problem may stem from the fact that I wanted my observation_space and action_space to be arrays, so I converted the Box objects by passing them to the numpy.array() method. I am not sure if that's the right way to do it, and it would be great if someone could clarify this as well!

Miffyli commented 4 years ago

No, you should not turn the spaces into NumPy arrays. This ...

        self.observation_space = np.array(spaces.Box(low= np.zeros((s, prop)), high = np.full((s, prop), float('inf')), shape = (s, prop), dtype = np.float32))
        #actions are vectors of the form [n1, n2, n3,...nk, r] for k states and r reserved amount of drug 
        self.action_space = np.array(spaces.Box(low = np.zeros((s+1, ), dtype = int), high = np.array([100]*(s+1)), shape = (s + 1, ), dtype = np.uint8)) 

... should be something like this (I don't know if the other stuff is correct):

self.observation_space = spaces.Box(low=np.zeros((s, prop)), high=np.full((s, prop), float('inf')), shape=(s, prop), dtype=np.float32)
self.action_space = spaces.Box(low=np.zeros((s + 1,), dtype=int), high=np.array([100] * (s + 1)), shape=(s + 1,), dtype=np.uint8)

Note that we do not provide tech support outside stable-baselines issues, so if this fixes the error you may close the issue.
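To see why the assertion fires: as the traceback shows, _check_spaces does a plain isinstance check against gym.spaces.Space, and wrapping a space in np.array() produces a 0-d object array that fails it. A minimal sketch, independent of the environment above:

import numpy as np
from gym import spaces

box = spaces.Box(low=0.0, high=np.inf, shape=(6, 5), dtype=np.float32)

# A bare Box is a gym space, so the assertion in _check_spaces passes.
print(isinstance(box, spaces.Space))            # True

# np.array(box) yields a 0-d object array, not a space, which is
# exactly what trips the AssertionError in the trace above.
print(isinstance(np.array(box), spaces.Space))  # False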

KeeratKG commented 4 years ago

@Miffyli would the spaces.Box() method used alone, without passing it into numpy.array(), still allow me to represent my observation_space and action_space as arrays? Because without passing them into numpy.array() I get an error saying that observation_space/action_space are unsubscriptable.

Miffyli commented 4 years ago

Yes, spaces.Box alone should work, it should not be wrapped into any arrays or lists.

I see the issue though: you are changing the spaces in reset. You cannot change the spaces after they are defined in __init__.
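A Box describes the set of valid arrays, i.e. their shape, dtype, and bounds; it is not itself an array. The arrays you index come from sample() or from whatever your reset()/step() return. A small sketch of the array-like access a Box does provide:

import numpy as np
from gym import spaces

box = spaces.Box(low=0.0, high=100.0, shape=(7,), dtype=np.float32)

print(box.shape)          # (7,): the shape every valid sample must have
print(box.low, box.high)  # per-element bounds, stored as NumPy arrays

obs = box.sample()        # a concrete NumPy array drawn from the space
print(obs[0])             # the sample is subscriptable; the Box itself is not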

KeeratKG commented 4 years ago

@Miffyli the purpose of defining the starting values of the spaces in reset is merely to initialise the values from which exploration-exploitation should start operating and the model begins to train. Umm, I am not sure if that counts as changing the spaces...?

Miffyli commented 4 years ago

That is changing the spaces; you should not assign anything to observation_space/action_space after they are defined initially. reset should return the initial observation.

These questions are outside the scope of stable-baselines and are well documented in the docs and in OpenAI Gym. I am closing this issue.
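For reference, a minimal sketch of the pattern described above: the spaces stay fixed in __init__, the mutable data lives in an ordinary attribute (self.state is a hypothetical name, not from the original code), and reset() returns the initial observation:

import numpy as np
import gym
from gym import spaces

class SketchEnv(gym.Env):
    """Spaces are defined once in __init__ and never reassigned."""

    def __init__(self, s, prop):
        super().__init__()
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(s, prop), dtype=np.float32)
        self.action_space = spaces.Box(low=0, high=100, shape=(s + 1,), dtype=np.uint8)
        self.state = np.zeros((s, prop), dtype=np.float32)  # mutable data lives here

    def reset(self):
        # Reassign self.state, not self.observation_space, and return it.
        self.state = np.zeros_like(self.state)
        return self.state

    def step(self, action):
        # Update self.state from the action, then return the usual 4-tuple.
        reward, done, info = 0.0, False, {}
        return self.state, reward, done, info

Note also that the gym API expects step(self, action) with a single argument; the extra total parameter in the original step() would need to move into the constructor or the state.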