ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] (Rollout command not working) #37224

Open kmattim5 opened 1 year ago

kmattim5 commented 1 year ago

What happened + What you expected to happen

Registering the env: tune.register_env(select_test_env, lambda config: NR_IES_test_v0(test_config))

ray.shutdown()

Rollout Command: rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"NR_IES_test-v0\"}}' --run PPO --no-render --steps 2880"

# execute the command using subprocess
subprocess.run(rollout_command, shell=True)
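A side note on the quoting: with shell=True, the JSON passed to --config has to survive the shell. A sketch of an equivalent call that passes an argument list instead (no shell involved), reusing the same best_checkpoint variable from above:

# Equivalent invocation without shell=True; each argument is passed verbatim,
# so the JSON for --config needs no extra shell escaping.
subprocess.run([
    "rllib", "rollout", best_checkpoint._local_path,
    "--config", '{"env": "NR_IES_test-v0"}',
    "--run", "PPO",
    "--no-render",
    "--steps", "2880",
])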

I'm getting the error below when I try to run the rollout command to test the model using the checkpoints created during training.

2023-07-08 16:05:19,459 INFO algorithm.py:354 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=53288) /home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
(pid=53288)   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
(pid=53288) [the same numpy FutureWarning repeats for _np_quint8, _np_qint16, _np_quint16, _np_qint32, and np_resource, from both tensorflow/python/framework/dtypes.py and tensorboard/compat/tensorflow_stub/dtypes.py]

Traceback (most recent call last):
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 425, in setup
    logdir=self.logdir,
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 127, in __init__
    validate=trainer_config.get("validate_workers_after_construction"),
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 269, in add_workers
    self.foreach_worker(lambda w: w.assert_healthy())
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 391, in foreach_worker
    remote_results = ray.get([w.apply.remote(func) for w in self.remote_workers()])
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/_private/worker.py", line 2277, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
KeyError: 'NR_IES_test-v0'

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/env/utils.py", line 50, in _gym_env_creator
    return gym.make(env_descriptor, **env_context)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 235, in make
    return registry.make(id, **kwargs)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 128, in make
    spec = self.spec(path)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/gym/envs/registration.py", line 203, in spec
    raise error.UnregisteredEnv("No registered env with id: {}".format(id))
gym.error.UnregisteredEnv: No registered env with id: NR_IES_test-v0

During handling of the above exception, another exception occurred:

ray::RolloutWorker.__init__() (pid=53288, ip=131.183.21.110, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f4668c68c50>)
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 490, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "/home/drl-tperg/miniconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/env/utils.py", line 52, in _gym_env_creator
    raise EnvError(ERR_MSG_INVALID_ENV_DESCRIPTOR.format(env_descriptor))
ray.rllib.utils.error.EnvError: The env string you provided ('NR_IES_test-v0') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.

Try one of the following:
a) For Atari support: pip install gym[atari] autorom[accept-rom-license].
   For VizDoom support: Install VizDoom (https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md) and pip install vizdoomgym.
   For PyBullet support: pip install pybullet.
b) To register your custom env, do from ray import tune; tune.register('[name]', lambda cfg: [return env obj from here using cfg]). Then in your config, do config['env'] = [name].
c) Make sure you provide a fully qualified classpath, e.g.: ray.rllib.examples.env.repeat_after_me_env.RepeatAfterMeEnv
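For reference, a minimal sketch of option (b) above using the public registration API, with the env name and class taken from this report (cfg is the env_config dict RLlib hands to the creator):

from ray import tune

# Register the creator under the custom name; config["env"] can then
# reference that name. The registration lives in the Ray session that
# executes this call -- it is not visible to unrelated processes.
tune.register_env("NR_IES_test-v0", lambda cfg: NR_IES_test_v0(cfg))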

This worked fine in previous versions of Ray, but in this version I'm facing the above error.

Versions / Dependencies

Ray 2.0.0, Python 3.7.12, conda 23.3.1, Mamba 1.4.2

Reproduction script

from random import random
from secrets import choice
from ray.tune.registry import register_env
import gym
import os
import pickle
import ray
import ray.rllib.agents.ppo as ppo  # DDPG could be used here instead of PPO
import shutil
import subprocess
from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch
from ray.tune.schedulers.pb2 import PB2
from ray.tune import ExperimentAnalysis

from NR_IES.envs.main.NR_IES_test_env import NR_IES_test_v0

def main():

    currentPathDirectory = os.path.abspath(os.path.dirname(__file__))

    chkpt_root = currentPathDirectory + "/Checkpoints/raytune/PPO/BayesianSearch_OptimizedStorages"

    ray_results = "{}/ray_results/".format(os.getenv("HOME"))
    shutil.rmtree(ray_results, ignore_errors=True, onerror=None)
    ray.init(ignore_reinit_error=True, local_mode=True)
    select_test_env = "NR_IES_test-v0"

    analysis = ExperimentAnalysis(chkpt_root)

    best_trial = analysis.get_best_logdir(metric='episode_reward_mean', mode='max')

    best_checkpoint = analysis.get_best_checkpoint(best_trial, metric='episode_reward_mean', mode='max')

    best_config = analysis.get_best_config(metric='episode_reward_mean', mode='max')

    best_hes = best_config["env_config"]["hes"]
    best_tes = best_config["env_config"]["tes"]
    best_bes = best_config["env_config"]["bes"]

    test_config = ppo.DEFAULT_CONFIG.copy()
    test_config["log_level"] = "WARN"
    test_config["num_workers"] = 1
    test_config["env"] = select_test_env
    test_config["env_config"] = {
        "hes" : best_hes,
        "tes" : best_tes,
        "bes" : best_bes
    }
    # Note: the creator ignores the passed-in cfg and closes over test_config.
    tune.register_env(select_test_env, lambda cfg: NR_IES_test_v0(test_config))

    rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"NR_IES_test-v0\"}}' --run PPO --no-render --steps 2880"

    subprocess.run(rollout_command, shell=True)

    print('best_checkpoint', best_checkpoint._local_path)
    print('best_config: ', best_config)
    print('best_hes: ', best_hes, ', best_tes: ', best_tes, ', best_bes: ', best_bes)

    ray.shutdown()

if __name__ == "__main__":
    main()
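One detail worth flagging in the script above: in Ray 2.0, get_best_checkpoint returns an AIR Checkpoint object, and _local_path is a private attribute. A sketch of the public way to obtain a filesystem path, assuming the checkpoint is stored locally:

# Materialize the checkpoint and obtain its local directory via the
# public API instead of the private _local_path attribute.
ckpt_dir = best_checkpoint.to_directory()
print("checkpoint directory:", ckpt_dir)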

Issue Severity

High: It blocks me from completing my task.

avnishn commented 1 year ago

@kmattim5 thanks for reporting this. Could you possibly give us a simpler reproduction script, using one of the standard gym environments like cartpole?

Can you also check whether the issue persists on newer versions of Ray (we are now on Ray 2.5.1)? We do not provide backports.

kmattim5 commented 1 year ago

Hi @avnishn - thanks for acknowledging this issue. Here is a simpler reproduction script using one of the standard gym environments (CartPole). The issue still persists with this script as well.

Versions: Gym 0.21.0, Python 3.7.12, Ray 2.0.0

import gym
import numpy as np
from random import random
from secrets import choice
from ray.tune.registry import register_env
import os
import pickle
import ray
import ray.rllib.agents.ppo as ppo
import shutil
import subprocess
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch

class CartPoleEnv(gym.Env):
    def __init__(self, config):
        self.hes = config["hes"]
        self.tes = config["tes"]
        self.bes = config["bes"]
        self.env = gym.make('CartPole-v1')
        self.observation_space = self.env.observation_space
        self.action_space = self.env.action_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return observation, reward, done, info

    def render(self, mode='human'):
        return self.env.render(mode)

    def close(self):
        self.env.close()

def main():
    # initiate directory and save checkpoints
    currentPathDirectory = os.path.abspath(os.path.dirname(__file__))
    chkpt_root = currentPathDirectory + "/Checkpoints/raytune/PPO/Cartpole_OptimizedStorages"
    shutil.rmtree(chkpt_root, ignore_errors=True, onerror=None)

    ray.init(ignore_reinit_error=True)
    select_env = "CartPole-v0"  # Change this to "CartPole-v1" if you want to use the newer version
    select_test_env = "CartPoleTest-v0"

    # Custom training environment registration
    config = ppo.DEFAULT_CONFIG.copy()
    config["log_level"] = "WARN"
    config["num_workers"] = 1
    config["env"] = select_env
    config["env_config"] = {
        "hes": tune.uniform(3600, 36000),
        "tes": tune.uniform(1000, 10000),
        "bes": tune.uniform(30, 100)
    }
    tune.register_env(select_env, lambda config: CartPoleEnv(config))

    bayesopt = BayesOptSearch(metric="episode_reward_mean", mode="max")

    analysis = tune.run(
        "PPO",
        stop={"training_iteration": 1},
        config=config,
        search_alg=bayesopt,
        local_dir="Checkpoints/raytune/PPO/Cartpole_OptimizedStorages",
        checkpoint_score_attr='episode_reward_mean',
        checkpoint_freq=1,
        num_samples=8
    )

    best_trial = analysis.get_best_trial(metric='episode_reward_mean', mode='max', scope='all')
    best_checkpoint = analysis.get_best_checkpoint(best_trial, metric='episode_reward_mean', mode='max')
    best_config = analysis.get_best_config(metric='episode_reward_mean', mode='max')
    best_hes = best_config["env_config"]["hes"]
    best_tes = best_config["env_config"]["tes"]
    best_bes = best_config["env_config"]["bes"]

    # Custom testing environment registration
    test_config = ppo.DEFAULT_CONFIG.copy()
    test_config["log_level"] = "WARN"
    test_config["num_workers"] = 1
    test_config["env"] = select_test_env
    test_config["env_config"] = {
        "hes": best_hes,
        "tes": best_tes,
        "bes": best_bes
    }
    tune.register_env(select_test_env, lambda config: CartPoleEnv(config))

    ray.shutdown()

    # best_checkpoint_path = best_checkpoint[len(currentPathDirectory + "/"):].replace(",","\\,").replace("=","\\=").replace("<","\<").replace(">","\>").replace("'","\\'").replace("class", "class\\")

    # specify the command to run the rollout
    rollout_command = f"rllib rollout {best_checkpoint._local_path} --config '{{\"env\": \"CartPoleTest-v0\"}}' --run PPO --no-render --steps 2880"

    # execute the command using subprocess
    subprocess.run(rollout_command, shell=True)

    print('best_checkpoint', best_checkpoint)
    print('best_config: ', best_config)
    print('best_hes: ', best_hes, ', best_tes: ', best_tes, ', best_bes: ', best_bes)

if __name__ == "__main__":
    main()
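For what it's worth, the traceback pattern is consistent with a per-process registration problem: tune.register_env records the creator in the Ray session that calls it, while rllib rollout starts a fresh Python process (and a fresh Ray session) that has never seen "CartPoleTest-v0". One way to sidestep the CLI is to evaluate the checkpoint in the same process that did the registration. A minimal sketch, assuming Ray 2.0's ray.rllib.agents.ppo.PPOTrainer API and the CartPoleEnv class and test_config from the script above:

import ray
import ray.rllib.agents.ppo as ppo
from ray import tune

ray.init(ignore_reinit_error=True)

# Re-register the env in this process before restoring the trainer.
tune.register_env("CartPoleTest-v0", lambda cfg: CartPoleEnv(cfg))

agent = ppo.PPOTrainer(config=test_config, env="CartPoleTest-v0")
# Restore from the same checkpoint path the script passed to `rllib rollout`.
agent.restore(best_checkpoint._local_path)

# Roll out one episode manually instead of shelling out to the CLI.
env = CartPoleEnv(test_config["env_config"])
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = agent.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)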

I tried the newer versions of Ray (2.5.1) and ran into numerous version-compatibility issues. Can you provide a quick fix for this setup if possible?

kmattim5 commented 1 year ago

Hi team, can anyone help me with this issue? It is blocking my work.