Bug Report: gym wrapper returns np.bool_, which causes an error with stable-baselines3

I was trying to connect RLBench to stable-baselines3 and found a minor error. I used the code from #103, but changed the observation mode from state to vision.

import gym
import rlbench.gym
import stable_baselines3.common.env_checker
from stable_baselines3 import PPO

env = gym.make('reach_target-vision-v0')

# check_env() raises on failure and returns None, so there is nothing to print
stable_baselines3.common.env_checker.check_env(env)
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Roll out the trained policy
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    print(obs.shape)
    print(reward)
    if done:
        obs = env.reset()

env.close()

Running this code raises the following error from the stable-baselines3 environment checker:

  assert isinstance(done, bool), "The `done` signal must be a boolean"
AssertionError: The `done` signal must be a boolean

This is caused by the incompatibility between np.bool_ and bool. In rlbench/gym/rlbench_env.py, line 107, terminate is of type np.bool_, so isinstance(done, bool) evaluates to False. To fix it, I simply cast it to bool, and it works:

def step(self, action) -> Tuple[Dict[str, np.ndarray], float, bool, dict]:
    obs, reward, terminate = self.task.step(action)
    # terminate arrives as np.bool_; cast it to a built-in bool so SB3's check_env accepts it
    terminate = bool(terminate)
    return self._extract_obs(obs), reward, terminate, {}
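The type mismatch can be reproduced with numpy alone, without RLBench or stable-baselines3. The sketch below mirrors the isinstance check from the assertion quoted above; to_python_bool is a hypothetical helper name used only for illustration, and the comparison on a numpy float is just one common way an np.bool_ value arises, not necessarily how RLBench computes terminate.

```python
import numpy as np


def to_python_bool(flag) -> bool:
    """Hypothetical helper: convert a numpy boolean scalar to a built-in bool."""
    return bool(flag)


# Comparing a numpy scalar with a float yields np.bool_, not bool.
distance = np.float64(0.02)
done = distance < 0.05

# np.bool_ is not a subclass of Python's bool, so the SB3-style check fails.
print(isinstance(done, bool))          # False

# After casting, the same check passes and the truth value is unchanged.
fixed = to_python_bool(done)
print(isinstance(fixed, bool), fixed)  # True True
```

Casting with bool() keeps the truth value and only changes the type, so it is safe to apply to any value the task returns for terminate.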

stepjam / RLBench #174