hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

TypeError: object of type 'int' has no len() #977

Closed KeeratKG closed 3 years ago

KeeratKG commented 4 years ago

Hi! I have built a custom environment for drug distribution among a few locations, following the OpenAI Gym interface, and it passes check_env(env, warn=True) successfully.

Describe the bug

When I train the ACKTR algorithm on this environment, the run crashes at the end of training with TypeError: object of type 'int' has no len(), so the model can never finish training, let alone be tested afterwards.

Code example

The following code reproduces the error:


!pip install stable-baselines[mpi]==2.10.0

from stable_baselines.common.env_checker import check_env

import numpy as np 
import gym 
from gym import spaces 
import matplotlib.pyplot as plt 
import math
import random

class StatesEnv(gym.Env):
    """
    Customised Environment that follows gym interface.
    Describes relevant properties of the state and action spaces. 
    """
    metadata = {'render.modes':['human']}

    def __init__(self, s, episodes, total):

        self.states = s  # number of states (locations) that vials are distributed to
        low = np.zeros((5,5))
        high = np.array([np.inf, 1, 1, np.inf, np.inf]*5).reshape((5,5))
        self.observation_space = spaces.Box(low, high, shape=(5, 5), dtype = np.float)
        #actions are vectors of the form [n1, n2, n3,...nk] for k states 
        self.action_space = spaces.Box(low = np.zeros((s, ), dtype = int), high = np.array([100]*(s)), shape = (s, ), dtype = np.float)

        self.curr_step = 0
        self.done = False
        self.valueMap = np.zeros((self.states, 100))
        self.total = total #total number of vials available in 1 batch = batch size 
        self.episodes = episodes
        self.received = [0]*self.states
        self.states_cond = []
        self.action_list = []
        self.gamma = 0.20
        self.epsilon = 0.2
        self.susc = [0]*self.states
        #self.dead = [0]*self.states  #includes only those individuals who die due to inefficacy of vaccine 

    def get_discrete_int(self, n):
        discrete_int = int(n)
        return discrete_int

    def reset(self):

        self.curr_step = 0
        self.done = False
        self.total = 10000
        # Declare the Initial Conditions for the States

        self.states_cond =  np.array([(80188, 0.031900034917943, 0.614817678455629, 16707753, 16787941), 
                              (159133, 0.045703908051756, 0.529399935902673, 112215200, 112374333),
                              (6816, 0.001320422535211, 0.660211267605634, 31198760, 31205576),
                              (387, 0, 0.423772609819121, 1978115, 1978502), 
                              (2339, 0.005130397605814, 0.737067122702009, 32985795, 32988134)])
                               # Confirmed DR RR Susc Population 
                               # Delhi, Maha, Assam, Naga, Jharkhand  
        #store the actions in an array 
        self.action_list = np.array([100/(self.states)]*(self.states))

        return self.states_cond

    def step(self, action):

        # check if we're done
        if self.curr_step >= self.episodes - 1:
            self.done = True
        print("Are we done?", self.done)

        if self.states_cond is None:
            raise Exception("You need to reset() the environment before calling step()!")
        else:
            print('Observation Space for this episode is: ', self.states_cond)

        #start with equal distribution 
        if self.curr_step == 1:
            self.action_list = np.array([100/(self.states)]*(self.states))

        #update action_list to store only the most recently used action values 
        self.action_list = action
        print("Distribution set: ",self.action_list)

        # number of units distributed to the respective states
        for i in range(self.states):
            self.received[i] = self.total*self.action_list[i]/100

        #simulation
        for i in range(self.states):
            self.susc[i] = self.states_cond[i, 3]-self.get_discrete_int(self.received[i])  #new count of susc people
        print("New Count of Susceptible people: ", self.susc)
        self.states_cond = np.array(self.states_cond)
        self.states_cond[:, 3] = self.susc                            #update values in states_cond matrix 

        #reward only when task done 
        reward = self.get_reward()
        print("Reward: ", reward)

        # increment episode
        self.curr_step += 1

        return self.states_cond, reward, self.done, {'action_list': self.action_list, 'episode': self.curr_step}

    def get_reward(self):
        reward = [0]*self.states
        for i in range(self.states):          
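            # note: states_cond[i, 1] (the DR column) is 0 for one of the states, so this
            # division triggers the divide-by-zero RuntimeWarnings and nan rewards shown below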
            reward[i] = self.states_cond[i, 3]*math.exp(-self.received[i]/self.states_cond[i, 1])
        reward = sum(reward)
        return reward 

    def close(self):
        pass 

from stable_baselines import DQN, PPO2, A2C, ACKTR
from stable_baselines.common.cmd_util import make_vec_env

# Instantiate the env
env = StatesEnv(5, 10, 10000)  # s=5 states, episodes=10, total=10000 vials per batch
# wrap it
env = make_vec_env(lambda: env, n_envs=1)

# Train the agent
model = ACKTR('MlpPolicy', env, verbose=1).learn(5000)
model.save("acktr_")
model = ACKTR.load("acktr_")

This is the output and the error I get:

Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67077530e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11987600e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29857950e+07
  3.29881340e+07]]
Distribution set:  [0.24182837 0.         0.90185279 1.29398715 1.42746699]
New Count of Susceptible people:  [16707729.0, 112215200.0, 31198670.0, 1977986.0, 32985653.0]
Reward:  112215200.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67077290e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986700e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97798600e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29856530e+07
  3.29881340e+07]]
Distribution set:  [0.34365147 0.         1.05715179 0.78844482 0.10514075]
New Count of Susceptible people:  [16707695.0, 112215200.0, 31198565.0, 1977908.0, 32985643.0]
Reward:  112215200.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076950e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11985650e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97790800e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29856430e+07
  3.29881340e+07]]
Distribution set:  [0.         0.         0.         0.69746369 0.        ]
New Count of Susceptible people:  [16707695.0, 112215200.0, 31198565.0, 1977839.0, 32985643.0]
Reward:  193107103.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076950e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11985650e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97783900e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29856430e+07
  3.29881340e+07]]
Distribution set:  [0.32131773 0.         1.80711377 0.         1.06970489]
New Count of Susceptible people:  [16707663.0, 112215200.0, 31198385.0, 1977839.0, 32985537.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076630e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11983850e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97783900e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29855370e+07
  3.29881340e+07]]
Distribution set:  [0.82852572 2.98951459 0.         0.         0.12378889]
New Count of Susceptible people:  [16707581.0, 112214902.0, 31198385.0, 1977839.0, 32985525.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67075810e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214902e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11983850e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97783900e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29855250e+07
  3.29881340e+07]]
Distribution set:  [1.34342873 0.17551899 0.21092209 0.         0.86918694]
New Count of Susceptible people:  [16707447.0, 112214885.0, 31198364.0, 1977839.0, 32985439.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67074470e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214885e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11983640e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97783900e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29854390e+07
  3.29881340e+07]]
Distribution set:  [0.         0.24685273 1.1099968  1.35033464 2.01418495]
New Count of Susceptible people:  [16707447.0, 112214861.0, 31198254.0, 1977704.0, 32985238.0]
Reward:  16707447.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67074470e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214861e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11982540e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97770400e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29852380e+07
  3.29881340e+07]]
Distribution set:  [0.36120349 0.06322803 0.07712679 0.         1.0988667 ]
New Count of Susceptible people:  [16707411.0, 112214855.0, 31198247.0, 1977704.0, 32985129.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67074110e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214855e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11982470e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97770400e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29851290e+07
  3.29881340e+07]]
Distribution set:  [0.80808419 0.         1.07677376 0.60902292 0.51397026]
New Count of Susceptible people:  [16707331.0, 112214855.0, 31198140.0, 1977644.0, 32985078.0]
Reward:  112214855.0
Are we done? True
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67073310e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214855e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11981400e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97764400e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29850780e+07
  3.29881340e+07]]
Distribution set:  [0. 0. 0. 0. 0.]
New Count of Susceptible people:  [16707331.0, 112214855.0, 31198140.0, 1977644.0, 32985078.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67077530e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215200e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11987600e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29857950e+07
  3.29881340e+07]]
Distribution set:  [1.47092974 1.98993182 1.16552019 0.         0.118194  ]
New Count of Susceptible people:  [16707606.0, 112215002.0, 31198644.0, 1978115.0, 32985784.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076060e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12215002e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986440e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29857840e+07
  3.29881340e+07]]
Distribution set:  [0.         1.14528465 0.         0.         0.18599738]
New Count of Susceptible people:  [16707606.0, 112214888.0, 31198644.0, 1978115.0, 32985766.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076060e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214888e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986440e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29857660e+07
  3.29881340e+07]]
Distribution set:  [0.         0.64509821 0.         0.         1.21263659]
New Count of Susceptible people:  [16707606.0, 112214824.0, 31198644.0, 1978115.0, 32985645.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076060e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214824e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986440e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29856450e+07
  3.29881340e+07]]
Distribution set:  [0.         0.66946673 0.         0.         0.6838572 ]
New Count of Susceptible people:  [16707606.0, 112214758.0, 31198644.0, 1978115.0, 32985577.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076060e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214758e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986440e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97811500e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29855770e+07
  3.29881340e+07]]
Distribution set:  [0.         0.97177422 0.19672939 0.52811939 0.        ]
New Count of Susceptible people:  [16707606.0, 112214661.0, 31198625.0, 1978063.0, 32985577.0]
Reward:  49693183.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67076060e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214661e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11986250e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97806300e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29855770e+07
  3.29881340e+07]]
Distribution set:  [1.22375858 0.89565408 0.99770182 0.62687373 1.08607018]
New Count of Susceptible people:  [16707484.0, 112214572.0, 31198526.0, 1978001.0, 32985469.0]
Reward:  0.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67074840e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214572e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11985260e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97800100e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29854690e+07
  3.29881340e+07]]
Distribution set:  [1.18877232 1.02862406 0.07613723 0.         0.        ]
New Count of Susceptible people:  [16707366.0, 112214470.0, 31198519.0, 1978001.0, 32985469.0]
Reward:  nan
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67073660e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214470e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11985190e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97800100e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29854690e+07
  3.29881340e+07]]
Distribution set:  [1.26285255 0.         0.87876743 0.14975204 0.        ]
New Count of Susceptible people:  [16707240.0, 112214470.0, 31198432.0, 1977987.0, 32985469.0]
Reward:  145199939.0
Are we done? False
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67072400e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214470e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11984320e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97798700e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29854690e+07
  3.29881340e+07]]
Distribution set:  [1.949821   0.02370276 0.22049969 0.70631927 0.        ]
New Count of Susceptible people:  [16707046.0, 112214468.0, 31198410.0, 1977917.0, 32985469.0]
Reward:  32985469.0
Are we done? True
Observation Space for this episode is:  [[8.01880000e+04 3.19000349e-02 6.14817678e-01 1.67070460e+07
  1.67879410e+07]
 [1.59133000e+05 4.57039081e-02 5.29399936e-01 1.12214468e+08
  1.12374333e+08]
 [6.81600000e+03 1.32042254e-03 6.60211268e-01 3.11984100e+07
  3.12055760e+07]
 [3.87000000e+02 0.00000000e+00 4.23772610e-01 1.97791700e+06
  1.97850200e+06]
 [2.33900000e+03 5.13039761e-03 7.37067123e-01 3.29854690e+07
  3.29881340e+07]]
Distribution set:  [0.         0.         0.         0.         0.10276788]
New Count of Susceptible people:  [16707046.0, 112214468.0, 31198410.0, 1977917.0, 32985459.0]
Reward:  nan
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:153: RuntimeWarning: divide by zero encountered in double_scalars
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:153: RuntimeWarning: invalid value encountered in double_scalars
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-8e4deb95a198> in <module>()
      1 # Train the agent
----> 2 model = ACKTR('MlpPolicy', env, verbose=1).learn(5000)
      3 model.save("acktr_")
      4 model = ACKTR.load("acktr_")

/usr/local/lib/python3.6/dist-packages/stable_baselines/acktr/acktr.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
    375                     logger.record_tabular("value_loss", float(value_loss))
    376                     logger.record_tabular("explained_variance", float(explained_var))
--> 377                     if len(self.ep_info_buf) > 0 and len(self.ep_info_buf[0]) > 0:
    378                         logger.logkv('ep_reward_mean', safe_mean([ep_info['r'] for ep_info in self.ep_info_buf]))
    379                         logger.logkv('ep_len_mean', safe_mean([ep_info['l'] for ep_info in self.ep_info_buf]))

TypeError: object of type 'int' has no len()

System Info

How can I get around this? I cannot work out which part of my code is interfering with self.ep_info_buf in acktr.py.

Thanks.

Miffyli commented 4 years ago

This is because of the key "episode" in the info dictionary returned by the environment. This key is used to track episode lengths and rewards. Try changing that key to something else and things should work.

Marking this as a bug because the code should warn about this. We also need to check whether SB3 has the same issue.
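For anyone hitting the same thing: concretely, the workaround only needs to touch the last line of step() in the environment above. Any key name other than the reserved "episode" should do; the "episode_number" name used below is simply what the reporter settled on in the next comment.

        # before: 'episode' collides with the key stable-baselines reserves for Monitor episode stats
        return self.states_cond, reward, self.done, {'action_list': self.action_list, 'episode': self.curr_step}

        # after: renaming the key avoids the collision
        return self.states_cond, reward, self.done, {'action_list': self.action_list, 'episode_number': self.curr_step}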

KeeratKG commented 4 years ago

Yes, I changed the key "episode" to "episode_number" and it worked.

Thanks!
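For context on why the rename matters: the Monitor wrapper in stable-baselines writes episode statistics into info['episode'] as a dict, roughly {'r': episode_reward, 'l': episode_length, 't': elapsed_time}, and ACKTR's ep_info_buf collects whatever it finds under that key. With the environment above, the buffer ends up holding the integer self.curr_step, and the logging check len(self.ep_info_buf[0]) then raises. A minimal, stable-baselines-free sketch of that failure mode:

# Minimal illustration of the failure (no stable-baselines required).
# The training loop collects whatever sits under info['episode'] and later
# calls len() on it, which only works for the Monitor-style dict.
monitor_style = {'episode': {'r': 42.0, 'l': 10, 't': 1.23}}  # what Monitor produces
custom_env_style = {'episode': 3}                             # what the env above returns

ep_info_buf = []
for info in (monitor_style, custom_env_style):
    maybe_ep_info = info.get('episode')
    if maybe_ep_info is not None:
        ep_info_buf.append(maybe_ep_info)

print(len(ep_info_buf[0]))  # 3 -- a dict supports len()
print(len(ep_info_buf[1]))  # TypeError: object of type 'int' has no len()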

matthiasmfr commented 3 months ago

> This is because of the key "episode" in the info dictionary returned by the environment. This key is used to track episode lengths and rewards. Try changing that key to something else and things should work.
>
> Marking this as a bug because the code should warn about this. We also need to check whether SB3 has the same issue.

This is still not fixed in SB3, by the way.
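Until either library warns about this, a cheap guard is to sanity-check the info dict your environment returns before training. The helper below is hypothetical, not part of stable-baselines or SB3, and assumes the old gym step() API used in this issue:

# Hypothetical pre-training check (not part of stable-baselines or SB3):
# flag custom environments that reuse the reserved 'episode' info key
# with anything other than a Monitor-style dict.
def check_episode_key(env, n_steps=5):
    env.reset()
    for _ in range(n_steps):
        action = env.action_space.sample()
        _, _, done, info = env.step(action)
        ep = info.get('episode')
        if ep is not None and not isinstance(ep, dict):
            raise ValueError(
                "info['episode'] is reserved for Monitor episode stats "
                "(a dict with 'r' and 'l'); rename this key in your env."
            )
        if done:
            env.reset()

check_episode_key(StatesEnv(5, 10, 10000))  # raises ValueError for the env as originally posted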