hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Action Type Check in Env [feature request] #707

Open MentalGear opened 4 years ago

MentalGear commented 4 years ago

Describe the bug

TypeError: 'int' object is not subscriptable

I ran into this bug a couple of times and found it really hard to debug within stable-baselines. To spare others (as well as my future self) much frustration, I suggest adding a type check to wrapper envs like DummyVecEnv (or only to check_env?) that makes sure the action is an iterable before continuing with the code.

The following example check gives a developer-friendly error message that's easy to understand and offers an immediate solution, saving much frustration.

Code example

obs, reward, done, info = env.step(action)

https://github.com/hill-a/stable-baselines/issues/40#issuecomment-425743048

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/stable_baselines/common/vec_env/base_vec_env.py", line 134, in step
    return self.step_wait()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/stable_baselines/common/vec_env/vec_check_nan.py", line 35, in step_wait
    observations, rewards, news, infos = self.venv.step_wait()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 41, in step_wait
    self.envs[env_idx].step(self.actions[env_idx])
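
For reference, a minimal sketch of the kind of usage that triggers this error (the CartPole env id and the exact imports are assumptions, not taken from the original report):

import gym
from stable_baselines.common.vec_env import DummyVecEnv

# CartPole has a Discrete(2) action space, so the "natural" action is a bare int
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
obs = env.reset()

action = 0  # single int action, not wrapped in a list
# step_async() stores the action as-is and step_wait() indexes it per env,
# so self.actions[env_idx] fails with: TypeError: 'int' object is not subscriptable
obs, reward, done, info = env.step(action)

# Passing one action per environment works:
obs, reward, done, info = env.step([0])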

Proposed Solution

self.actions = actions
try:
    iter(self.actions)
except TypeError:
    # not iterable: fail with a clear, actionable message
    raise TypeError(
        "Action must be an iterable. Try wrapping your action variable "
        "in a list [ ] to fix this issue."
    )

This check would go at https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/dummy_vec_env.py#L37
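
For context, a sketch of how that check could sit inside DummyVecEnv.step_async (the method body is paraphrased here, not copied from the linked file):

def step_async(self, actions):
    # Fail fast with an actionable message instead of letting step_wait()
    # crash later on self.actions[env_idx]
    try:
        iter(actions)
    except TypeError:
        raise TypeError(
            "Action must be an iterable. Try wrapping your action variable "
            "in a list [ ] to fix this issue."
        )
    self.actions = actions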

araffin commented 4 years ago

Hello,

Do you have a minimal example (minimal custom env) where that error happens?

It seems that you are using custom action generation (i.e. not using .train(), nor .predict() to select the action).

But I agree we can add a check in the VecEnv (using isinstance(actions, (list, np.ndarray))), and also check that the number of actions corresponds to the number of environments. Feel free to submit a PR that solves this issue ;)
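
A sketch of what that combined check might look like in step_async (the message wording is an assumption; num_envs is the attribute VecEnv exposes for the number of environments):

import numpy as np

def step_async(self, actions):
    # type check suggested above ...
    if not isinstance(actions, (list, np.ndarray)):
        raise TypeError(
            "Actions must be a list or np.ndarray, got {}".format(type(actions).__name__)
        )
    # ... plus one action per environment
    if len(actions) != self.num_envs:
        raise ValueError(
            "Expected {} actions (one per environment), got {}".format(
                self.num_envs, len(actions)
            )
        )
    self.actions = actions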

MentalGear commented 4 years ago

It happens when I pass a single action without wrapping it in a list, e.g. CartPole, where the possible actions are LEFT or RIGHT (a single variable). The issue/code described by lhorus is the same problem.

OK for the isinstance() approach, if list & np.ndarray are the only two acceptable iterables. I can implement it here.

However, I'm not sure how to check "that it corresponds to the number of environments"?

stefanbschneider commented 4 years ago

However, I'm not sure how to check "that it corresponds to the number of environments"?

I guess that's just something like

if len(self.actions) != len(self.envs):
    raise ValueError("The number of actions needs to be equal to the number of environments in a VecEnv")

araffin commented 4 years ago

I guess that's just something like

This works, but we need to make an exception for the _UnVecWrapper... cf. the PR.