ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] (Rollout command not working) #37224

Open kmattim5 opened 1 year ago

kmattim5 commented 1 year ago

What happened + What you expected to happen

Registering the env: tune.register_env(select_test_env, lambda config: NR_IES_test_v0(test_config))

ray.shutdown()

Rollout Command: rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"NR_IES_test-v0\"}}' --run PPO --no-render --steps 2880"

# execute the command using subprocess
subprocess.run(rollout_command, shell=True)
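A side note on the quoting: with shell=True, the JSON passed to --config has to survive the shell. A sketch of an equivalent call that passes an argument list instead (no shell involved), reusing the same best_checkpoint variable from above:

# Equivalent invocation without shell=True; each argument is passed verbatim,
# so the JSON for --config needs no extra shell escaping.
subprocess.run([
    "rllib", "rollout", best_checkpoint._local_path,
    "--config", '{"env": "NR_IES_test-v0"}',
    "--run", "PPO",
    "--no-render",
    "--steps", "2880",
])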

I'm getting the error below when I try to run the rollout command to test the model using the checkpoints created during training.

2023-07-08 16:05:19,459 INFO algorithm.py:354 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=53288) /home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
(pid=53288)   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
(pid=53288) [the same numpy FutureWarning repeats for _np_quint8, _np_qint16, _np_quint16, _np_qint32, and np_resource, from both tensorflow/python/framework/dtypes.py and tensorboard/compat/tensorflow_stub/dtypes.py]

Traceback (most recent call last):
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 425, in setup
    logdir=self.logdir,
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 127, in __init__
    validate=trainer_config.get("validate_workers_after_construction"),
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 269, in add_workers
    self.foreach_worker(lambda w: w.assert_healthy())
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 391, in foreach_worker
    remote_results = ray.get([w.apply.remote(func) for w in self.remote_workers()])
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/_private/worker.py", line 2277, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
KeyError: 'NR_IES_test-v0'

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/env/utils.py", line 50, in _gym_env_creator
    return gym.make(env_descriptor, **env_context)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 235, in make
    return registry.make(id, **kwargs)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 128, in make
    spec = self.spec(path)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 203, in spec
    raise error.UnregisteredEnv("No registered env with id: {}".format(id))
gym.error.UnregisteredEnv: No registered env with id: NR_IES_test-v0

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 490, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/env/utils.py", line 52, in _gym_env_creator
    raise EnvError(ERR_MSG_INVALID_ENV_DESCRIPTOR.format(env_descriptor))
ray.rllib.utils.error.EnvError: The env string you provided ('NR_IES_test-v0') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.

Try one of the following:
a) For Atari support: pip install gym[atari] autorom[accept-rom-license].
   For VizDoom support: Install VizDoom (https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md) and pip install vizdoomgym.
   For PyBullet support: pip install pybullet.
b) To register your custom env, do from ray import tune; tune.register('[name]', lambda cfg: [return env obj from here using cfg]). Then in your config, do config['env'] = [name].
c) Make sure you provide a fully qualified classpath, e.g.: ray.rllib.examples.env.repeat_after_me_env.RepeatAfterMeEnv
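For reference, a minimal sketch of option (b) above using the public registration API, with the env name and class taken from this report (cfg is the env_config dict RLlib hands to the creator):

from ray import tune

# Register the creator under the custom name; config["env"] can then
# reference that name. The registration lives in the Ray session that
# executes this call -- it is not visible to unrelated processes.
tune.register_env("NR_IES_test-v0", lambda cfg: NR_IES_test_v0(cfg))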

This worked fine in previous versions of Ray, but in this version I'm facing the above error.

Versions / Dependencies

Ray 2.0.0, Python 3.7.12, conda 23.3.1, Mamba 1.4.2

Reproduction script

from random import random
from secrets import choice
from ray.tune.registry import register_env
import gym
import os
import pickle
import ray
import ray.rllib.agents.ppo as ppo  # DDPG could be used here instead of PPO
import shutil
import subprocess
from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch
from ray.tune.schedulers.pb2 import PB2
from ray.tune import ExperimentAnalysis

from NR_IES.envs.main.NR_IES_test_env import NR_IES_test_v0

def main():

    currentPathDirectory = os.path.abspath(os.path.dirname(__file__))

    chkpt_root = currentPathDirectory + "/Checkpoints/raytune/PPO/BayesianSearch_OptimizedStorages"

    ray_results = "{}/ray_results/".format(os.getenv("HOME"))
    shutil.rmtree(ray_results, ignore_errors=True, onerror=None)
    ray.init(ignore_reinit_error=True, local_mode=True)
    select_test_env = "NR_IES_test-v0"

    analysis = ExperimentAnalysis(chkpt_root)

    best_trial = analysis.get_best_logdir(metric='episode_reward_mean', mode='max')

    best_checkpoint = analysis.get_best_checkpoint(best_trial, metric='episode_reward_mean', mode='max')

    best_config = analysis.get_best_config(metric='episode_reward_mean', mode='max')

    best_hes = best_config["env_config"]["hes"]
    best_tes = best_config["env_config"]["tes"]
    best_bes = best_config["env_config"]["bes"]

    test_config = ppo.DEFAULT_CONFIG.copy()
    test_config["log_level"] = "WARN"
    test_config["num_workers"] = 1
    test_config["env"] = select_test_env
    test_config["env_config"] = {
        "hes" : best_hes,
        "tes" : best_tes,
        "bes" : best_bes
    }
    # Note: the creator ignores the passed-in cfg and closes over test_config.
    tune.register_env(select_test_env, lambda cfg: NR_IES_test_v0(test_config))

    rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"NR_IES_test-v0\"}}' --run PPO --no-render --steps 2880"

    subprocess.run(rollout_command, shell=True)

    print('best_checkpoint', best_checkpoint._local_path)
    print('best_config: ', best_config)
    print('best_hes: ', best_hes, ', best_tes: ', best_tes, ', best_bes: ', best_bes)

    ray.shutdown()

if __name__ == "__main__":
    main()
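One detail worth flagging in the script above: in Ray 2.0, get_best_checkpoint returns an AIR Checkpoint object, and _local_path is a private attribute. A sketch of the public way to obtain a filesystem path, assuming the checkpoint is stored locally:

# Materialize the checkpoint and obtain its local directory via the
# public API instead of the private _local_path attribute.
ckpt_dir = best_checkpoint.to_directory()
print("checkpoint directory:", ckpt_dir)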

Issue Severity

High: It blocks me from completing my task.

avnishn commented 1 year ago

@kmattim5 thanks for reporting this. Could you possibly give us a simpler reproduction script, using one of the standard gym environments like cartpole?

Can you also check whether the issue persists on newer versions of Ray (we are now on Ray 2.5.1)? We do not provide backports.

kmattim5 commented 1 year ago

Hi @avnishn - thanks for acknowledging this issue. Here is a simpler reproduction script using one of the standard gym environments (CartPole). The issue still persists with this script as well.

Versions: Gym 0.21.0, Python 3.7.12, Ray 2.0.0

import gym
import numpy as np
from random import random
from secrets import choice
from ray.tune.registry import register_env
import os
import pickle
import ray
import ray.rllib.agents.ppo as ppo
import shutil
import subprocess
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch

class CartPoleEnv(gym.Env):
    def __init__(self, config):
        self.hes = config["hes"]
        self.tes = config["tes"]
        self.bes = config["bes"]
        self.env = gym.make('CartPole-v1')
        self.observation_space = self.env.observation_space
        self.action_space = self.env.action_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return observation, reward, done, info

    def render(self, mode='human'):
        return self.env.render(mode)

    def close(self):
        self.env.close()

def main():
    # initiate directory and save checkpoints
    currentPathDirectory = os.path.abspath(os.path.dirname(__file__))
    chkpt_root = currentPathDirectory + "/Checkpoints/raytune/PPO/Cartpole_OptimizedStorages"
    shutil.rmtree(chkpt_root, ignore_errors=True, onerror=None)

    ray.init(ignore_reinit_error=True)
    select_env = "CartPole-v0"  # Change this to "CartPole-v1" if you want to use the newer version
    select_test_env = "CartPoleTest-v0"

    # Custom training environment registration
    config = ppo.DEFAULT_CONFIG.copy()
    config["log_level"] = "WARN"
    config["num_workers"] = 1
    config["env"] = select_env
    config["env_config"] = {
        "hes": tune.uniform(3600, 36000),
        "tes": tune.uniform(1000, 10000),
        "bes": tune.uniform(30, 100)
    }
    tune.register_env(select_env, lambda config: CartPoleEnv(config))

    bayesopt = BayesOptSearch(metric="episode_reward_mean", mode="max")

    analysis = tune.run(
        "PPO",
        stop={"training_iteration": 1},
        config=config,
        search_alg=bayesopt,
        local_dir="Checkpoints/raytune/PPO/Cartpole_OptimizedStorages",
        checkpoint_score_attr='episode_reward_mean',
        checkpoint_freq=1,
        num_samples=8
    )

    best_trial = analysis.get_best_trial(metric='episode_reward_mean', mode='max', scope='all')
    best_checkpoint = analysis.get_best_checkpoint(best_trial, metric='episode_reward_mean', mode='max')
    best_config = analysis.get_best_config(metric='episode_reward_mean', mode='max')
    best_hes = best_config["env_config"]["hes"]
    best_tes = best_config["env_config"]["tes"]
    best_bes = best_config["env_config"]["bes"]

    # Custom testing environment registration
    test_config = ppo.DEFAULT_CONFIG.copy()
    test_config["log_level"] = "WARN"
    test_config["num_workers"] = 1
    test_config["env"] = select_test_env
    test_config["env_config"] = {
        "hes": best_hes,
        "tes": best_tes,
        "bes": best_bes
    }
    tune.register_env(select_test_env, lambda config: CartPoleEnv(config))

    ray.shutdown()

    # best_checkpoint_path = best_checkpoint[len(currentPathDirectory + "/"):].replace(",","\\,").replace("=","\\=").replace("<","\<").replace(">","\>").replace("'","\\'").replace("class", "class\\")

    # specify the command to run the rollout
    rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"CartPoleTest-v0\"}}' --run PPO --no-render --steps 2880"

    # execute the command using subprocess
    subprocess.run(rollout_command, shell=True)

    print('best_checkpoint', best_checkpoint)
    print('best_config: ', best_config)
    print('best_hes: ', best_hes, ', best_tes: ', best_tes, ', best_bes: ', best_bes)

if __name__ == "__main__":
    main()
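For what it's worth, the traceback pattern is consistent with a per-process registration problem: tune.register_env records the creator in the Ray session that calls it, while rllib rollout starts a fresh Python process (and a fresh Ray session) that has never seen "CartPoleTest-v0". One way to sidestep the CLI is to evaluate the checkpoint in the same process that did the registration. A minimal sketch, assuming Ray 2.0's ray.rllib.agents.ppo.PPOTrainer API and the CartPoleEnv class and test_config from the script above:

import ray
import ray.rllib.agents.ppo as ppo
from ray import tune

ray.init(ignore_reinit_error=True)

# Re-register the env in this process before restoring the trainer.
tune.register_env("CartPoleTest-v0", lambda cfg: CartPoleEnv(cfg))

agent = ppo.PPOTrainer(config=test_config, env="CartPoleTest-v0")
# Restore from the same checkpoint path the script passed to `rllib rollout`.
agent.restore(best_checkpoint._local_path)

# Roll out one episode manually instead of shelling out to the CLI.
env = CartPoleEnv(test_config["env_config"])
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = agent.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)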

I tried the newer versions of Ray (2.5.1) and ran into numerous version-compatibility issues. Can you provide a quick fix for this setup if possible?

kmattim5 commented 1 year ago

Hi team, can anyone help me with this issue? It is blocking my work.