ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Rllib, Core, Tune] Cannot create PPOConfig from given `config_dict`! Property __stdout_file__ not supported. #39748

Closed fardinabbasi closed 1 year ago

fardinabbasi commented 1 year ago

What happened + What you expected to happen

I am trying to train a PPO agent using ray.tune, but I am getting many warnings and eventually the agent dies.

# Imports assumed by this snippet (gymnasium-style env API, as required by recent RLlib).
import gymnasium as gym
import numpy as np
from typing import Optional


class RankingEnv(gym.Env):
    def __init__(self, config: dict):
      super().__init__()
      self.coins = config['df']['tic'].unique()
      features_col = config['df'].columns.difference(['date','tic','score','close_growth(%)'])

      # self.time = 0
      self.df=config['df'].groupby('date')
      self.dates = list(self.df.groups.keys())
      # group = self.df.get_group(self.dates[self.time])
      # self.coins = group['tic'].unique()
      # features_col = group.columns.difference(['tic','score','close_growth(%)'])

      self.observation_space = gym.spaces.Dict({coin: gym.spaces.Box(low=-np.inf, high=np.inf, shape=(len(features_col),), dtype=np.float32) for coin in self.coins}) 
      self.action_space = gym.spaces.Dict({coin: gym.spaces.Box(low=np.float32(1.0), high=np.float32(len(self.coins)), shape=(1,), dtype=np.float32) for coin in self.coins})#actions are the scores
      # self.action_space = gym.spaces.Dict({coin: gym.spaces.Discrete(len(coins),start = 1) for coin in coins})

    def step(self, action):
      group = self.df.get_group(self.dates[self.time])
      # self.mask = {coin: coin in group['tic'].unique() for coin in self.coins} #masking method!
      true_scores = group.set_index('tic')['score'].to_dict()
      # The indexes are according to predicted score and the values are true score!(Sorting true score by predicted score)
      scores = [true_scores[coin] for coin in sorted(action, key=action.get, reverse=True)]
      ideal_scores = sorted(true_scores.values(), reverse=True) #This is just the True_score as list

      dcg = self.calculate_dcg(scores)
      idcg = self.calculate_dcg(ideal_scores)
      reward = dcg/idcg # reward = ndcg

      self.time+=1
      terminated = self.time >= len(self.dates)
      info = {}
      return self._get_obs() if not terminated else None, reward, terminated, False, info

    def calculate_dcg(self, scores):
      dcg = 0.0
      for i in range(len(scores)):
        dcg += (2 ** scores[i] - 1) / np.log2(i + 2)
      return dcg

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
      super().reset(seed=seed)
      self.time = 0
      info={}
      return self._get_obs() ,info

    def _get_obs(self):
      group = self.df.get_group(self.dates[self.time])
      obs = group.drop(['date','score','close_growth(%)'], axis=1)
      obs = obs.set_index('tic').agg(list, axis=1).to_dict()
      return obs
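
For reference, the environment can be smoke-tested locally before handing it to RLlib. The toy DataFrame below is made up purely for illustration (it only mimics the expected date/tic/score/close_growth(%) columns plus one feature column); the reward printed each step is the NDCG of the predicted ranking against the true scores.

# Illustrative local smoke test of RankingEnv (toy data, not the actual dataset).
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "tic": ["BTC", "ETH", "BTC", "ETH"],
    "feature_1": [0.1, 0.2, 0.3, 0.4],
    "score": [3.0, 1.0, 2.0, 4.0],
    "close_growth(%)": [1.0, -0.5, 0.7, 2.1],
})

env = RankingEnv({"df": toy_df})
obs, info = env.reset()
terminated = False
while not terminated:
    # Random predicted scores within the declared action space bounds.
    action = {coin: float(np.random.uniform(1.0, len(env.coins))) for coin in env.coins}
    obs, reward, terminated, truncated, info = env.step(action)
    print("NDCG reward:", reward)
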
# Imports assumed by this snippet (Ray 2.x Tune/RLlib API).
from typing import Any

import ray
from ray import tune
from ray.air import CheckpointConfig, FailureConfig, RunConfig
from ray.rllib.algorithms.algorithm import Algorithm
from ray.tune import TuneConfig
from ray.tune.registry import register_env
from ray.tune.search import ConcurrencyLimiter


class DRLlibv2:
    def __init__(
        self,
        trainable: str | Any,
        params: dict,
        train_env=None,
        run_name: str = "tune_run",
        local_dir: str = "tune_results",
        search_alg=None,
        concurrent_trials: int = 0,
        num_samples: int = 0,
        scheduler_=None,
        num_cpus: float | int = 2,
        dataframe_save: str = "tune.csv",
        metric: str = "episode_reward_mean",
        mode: str | list[str] = "max",
        max_failures: int = 0,
        training_iterations: int = 100,
        checkpoint_num_to_keep: None | int = None,
        checkpoint_freq: int = 0,
        reuse_actors: bool = False
    ):
        self.params = params

        # if train_env is not None:
        #     register_env(self.params['env'], lambda env_config: train_env(env_config))

        self.train_env = train_env
        self.run_name = run_name
        self.local_dir = local_dir
        self.search_alg = search_alg
        if concurrent_trials != 0:
            self.search_alg = ConcurrencyLimiter(
                self.search_alg, max_concurrent=concurrent_trials
            )
        self.scheduler_ = scheduler_
        self.num_samples = num_samples
        self.trainable = trainable
        if isinstance(self.trainable, str):
            self.trainable = self.trainable.upper()
        self.num_cpus = num_cpus
        self.dataframe_save = dataframe_save
        self.metric = metric
        self.mode = mode
        self.max_failures = max_failures
        self.training_iterations = training_iterations
        self.checkpoint_freq = checkpoint_freq
        self.checkpoint_num_to_keep = checkpoint_num_to_keep
        self.reuse_actors = reuse_actors

    def train_tune_model(self):

        if ray.is_initialized():
          ray.shutdown()

        ray.init(num_cpus=self.num_cpus, num_gpus=self.params['num_gpus'], ignore_reinit_error=True)

        if self.train_env is not None:
            register_env(self.params['env'], lambda env_config: self.train_env(env_config))

        tuner = tune.Tuner(
            self.trainable,
            param_space=self.params,
            tune_config=TuneConfig(
                search_alg=self.search_alg,
                scheduler=self.scheduler_,
                num_samples=self.num_samples,
                # metric=self.metric,
                # mode=self.mode,
                **({'metric': self.metric, 'mode': self.mode} if self.scheduler_ is None else {}),
                reuse_actors=self.reuse_actors,

            ),
            run_config=RunConfig(
                name=self.run_name,
                storage_path=self.local_dir,
                failure_config=FailureConfig(
                    max_failures=self.max_failures, fail_fast=False
                ),
                stop={"training_iteration": self.training_iterations},
                checkpoint_config=CheckpointConfig(
                    num_to_keep=self.checkpoint_num_to_keep,
                    checkpoint_score_attribute=self.metric,
                    checkpoint_score_order=self.mode,
                    checkpoint_frequency=self.checkpoint_freq,
                    checkpoint_at_end=True,
                ),
                verbose=3,#Verbosity mode. 0 = silent, 1 = default, 2 = verbose, 3 = detailed
            ),
        )

        self.results = tuner.fit()
        if self.search_alg is not None:
            self.search_alg.save_to_dir(self.local_dir)
        # ray.shutdown()
        return self.results

    def infer_results(self, to_dataframe: str = None, mode: str = "a"):

        results_df = self.results.get_dataframe()

        if to_dataframe is None:
            to_dataframe = self.dataframe_save

        results_df.to_csv(to_dataframe, mode=mode)

        best_result = self.results.get_best_result()
        # best_result = self.results.get_best_result()
        # best_metric = best_result.metrics
        # best_checkpoint = best_result.checkpoint
        # best_trial_dir = best_result.log_dir
        # results_df = self.results.get_dataframe()

        return results_df, best_result

    def restore_agent(
        self,
        checkpoint_path: str = "",
        restore_search: bool = False,
        resume_unfinished: bool = True,
        resume_errored: bool = False,
        restart_errored: bool = False,
    ):

        # if restore_search:
        # self.search_alg = self.search_alg.restore_from_dir(self.local_dir)
        if checkpoint_path == "":
            checkpoint_path = self.results.get_best_result().checkpoint._local_path

        restored_agent = tune.Tuner.restore(
            checkpoint_path,
            restart_errored=restart_errored,
            resume_unfinished=resume_unfinished,
            resume_errored=resume_errored,
        )
        print(restored_agent)
        self.results = restored_agent.fit()

        if self.search_alg is not None:
            self.search_alg.save_to_dir(self.local_dir)
        return self.results

    def get_test_agent(self, test_env_name: str, test_env=None, checkpoint=None):

        # if test_env is not None:
        #     register_env(test_env_name, lambda config: [test_env])

        if checkpoint is None:
            checkpoint = self.results.get_best_result().checkpoint

        testing_agent = Algorithm.from_checkpoint(checkpoint)
        # testing_agent.config['env'] = test_env_name

        return testing_agent

Versions / Dependencies

Reproduction script

# Imports assumed for this reproduction snippet.
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch

train_env_config = {'df': train_data}
train_config = (PPOConfig()
          .training(
              lr=tune.loguniform(5e-5, 0.001),
              entropy_coeff=tune.loguniform(0.00000001, 0.1),
              sgd_minibatch_size=tune.choice([32, 64, 128, 256, 512]),
              lambda_=tune.choice([0.1, 0.3, 0.5, 0.7, 0.9, 1.0]),
          )
          .resources(num_gpus=0)
          .debugging(log_level="DEBUG", seed = 1234)
          .rollouts(num_rollout_workers=1)
          .framework("torch")
          .environment(env="RankingEnv_train", disable_env_checking=True, env_config=train_env_config)
          )
train_config.model['fcnet_hiddens'] = [256, 256]
search_alg = OptunaSearch(metric="episode_reward_mean",mode="max")#what if metric=step reward??
scheduler_ = ASHAScheduler(metric="episode_reward_mean", mode="max", max_t=5, grace_period=1, reduction_factor=2)  # max_t: maximum budget per trial in units of time_attr (training iterations by default, not seconds). grace_period: minimum number of time_attr units a trial runs before it can be stopped early.
# wandb_callback = WandbLoggerCallback(project="Ray Tune Trial Run",log_config=True,save_checkpoints=True)

drl_agent = DRLlibv2(
    trainable="PPO",
    train_env = lambda: RankingEnv,
    run_name = "PPO_TRAIN",
    local_dir = "/content/PPO_TRAIN",
    params = train_config.to_dict(),
    num_samples = 1,#Number of samples of hyperparameters config to run
    training_iterations=5,
    checkpoint_freq=5,
    # scheduler_=scheduler_,
    search_alg=search_alg,
    metric = "episode_reward_mean",
    mode = "max"
    # callbacks=[wandb_callback]
)

res = drl_agent.train_tune_model()
results_df, best_result = drl_agent.infer_results()

(pid=5706) /usr/local/lib/python3.10/dist-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(pid=5706)   if (distutils.version.LooseVersion(tf.__version__) <
(pid=5706) DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
(pid=5706) /usr/local/lib/python3.10/dist-packages/google/rpc/__init__.py:20: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google.rpc').
(pid=5706) Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
(pid=5706)   pkg_resources.declare_namespace(name)
(pid=5706) /usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py:2349: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google').
(pid=5706) Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
(pid=5706)   declare_namespace(parent)
(PPO pid=5706) 2023-09-19 15:57:32,398 WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
(PPO pid=5706) 2023-09-19 15:57:32,399 WARNING algorithm_config.py:672 -- Cannot create PPOConfig from given config_dict! Property __stdout_file__ not supported.
(pid=5784) /usr/local/lib/python3.10/dist-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(pid=5784)   if (distutils.version.LooseVersion(tf.__version__) <
(pid=5784) DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
(pid=5784) /usr/local/lib/python3.10/dist-packages/google/rpc/__init__.py:20: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google.rpc').
(pid=5784) Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
(pid=5784)   pkg_resources.declare_namespace(name)
(pid=5784) /usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py:2349: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google').
(pid=5784) Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
(pid=5784)   declare_namespace(parent)
(PPO pid=5706) 2023-09-19 15:57:41,273 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service.
The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=5784, ip=172.28.0.12, actor_id=87f2e00c598d2835416acdce01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7c93458ee920>)
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 397, in __init__
(PPO pid=5706)     self.env = env_creator(copy.deepcopy(self.env_context))
(PPO pid=5706)   File "", line 186, in <lambda>
(PPO pid=5706) TypeError: <lambda>() takes 0 positional arguments but 1 was given
(PPO pid=5706) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=5706, ip=172.28.0.12, actor_id=67b96a420d1057972a8a600601000000, repr=PPO)
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
(PPO pid=5706)     super().__init__(
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/tune/trainable/trainable.py", line 185, in __init__
(PPO pid=5706)     self.setup(copy.deepcopy(self.config))
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
(PPO pid=5706)     self.workers = WorkerSet(
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/worker_set.py", line 179, in __init__
(PPO pid=5706)     raise e.args[0].args[2]
(PPO pid=5706)   File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 397, in __init__
(PPO pid=5706)     self.env = env_creator(copy.deepcopy(self.env_context))
(PPO pid=5706)   File "", line 186, in <lambda>
(PPO pid=5706) TypeError: <lambda>() takes 0 positional arguments but 1 was given
(RolloutWorker pid=5784) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=5784, ip=172.28.0.12, actor_id=87f2e00c598d2835416acdce01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7c93458ee920>)
2023-09-19 15:57:41,292 ERROR tune_controller.py:1502 -- Trial task failed for trial PPO_RankingEnv_train_c5d87e81
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2549, in get
    raise value
  File "python/ray/_raylet.pyx", line 1999, in ray._raylet.task_execution_handler
  File "python/ray/_raylet.pyx", line 1894, in ray._raylet.execute_task_with_cancellation_handler
  File "python/ray/_raylet.pyx", line 1558, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1559, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1791, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 910, in ray._raylet.store_task_errors
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=5706, ip=172.28.0.12, actor_id=67b96a420d1057972a8a600601000000, repr=PPO)
  File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/ray/tune/trainable/trainable.py", line 185, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
    self.workers = WorkerSet(
  File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/worker_set.py", line 179, in __init__
    raise e.args[0].args[2]
  File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 397, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "", line 186, in <lambda>
TypeError: <lambda>() takes 0 positional arguments but 1 was given
2023-09-19 15:57:41,357 WARNING experiment_state.py:371 -- Experiment checkpoint syncing has been triggered multiple times in the last 30.0 seconds. A sync will be triggered whenever a trial has checkpointed more than num_to_keep times since last sync or if 300 seconds have passed since last sync. If you have set num_to_keep in your CheckpointConfig, consider increasing the checkpoint frequency or keeping more checkpoints. You can supress this warning by changing the TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S environment variable.
2023-09-19 15:57:41,368 ERROR tune.py:1139 -- Trials did not complete: [PPO_RankingEnv_train_c5d87e81]
2023-09-19 15:57:41,375 WARNING experiment_analysis.py:205 -- Failed to fetch metrics for 1 trial(s):

Trial PPO_RankingEnv_train_c5d87e81 errored after 0 iterations at 2023-09-19 15:57:41. Total running time: 20s Error file: /root/ray_results/PPO_TRAIN/PPO_RankingEnv_train_c5d87e81_1_type=StochasticSampling,disable_action_flattening=False,disable_execution_plan_api=True,disable_in_2023-09-19_15-57-21/error.txt

Issue Severity

High: It blocks me from completing my task.

sven1977 commented 1 year ago

Hmm, something is wrong with the env registration lambda. I think somewhere you provide an env creator function that takes no input arguments.

Maybe here?

train_env = lambda: RankingEnv,

However, RLlib always passes in the config.env_config dict when it calls the registered env creator, which is why you are getting this error.

Changing your code to the following should help:

train_env = lambda env_config: RankingEnv,
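
In general, register_env wires up a creator that RLlib later calls with config.env_config; a rough sketch of that wiring, with the creator returning an environment instance built from the passed-in config (creator name is illustrative, not from the report):

# Sketch of a typical env registration; RLlib calls the creator with config.env_config.
from ray.tune.registry import register_env

def ranking_env_creator(env_config):
    return RankingEnv(env_config)

register_env("RankingEnv_train", ranking_env_creator)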
sven1977 commented 1 year ago

I'm closing this issue for now. Feel free to re-open should you still have problems with your example after fixing your custom env creator function.

fardinabbasi commented 1 year ago

Hmm, something is wrong with the env registration lambda. I think somewhere you provide an env creator function that takes no input arguments.

Maybe here?

train_env = lambda: RankingEnv,

However, RLlib always passes in the config.env_config dict when it calls the registered env creator, which is why you are getting this error.

Changing your code to the following should help:

train_env = lambda env_config: RankingEnv,

Thank you for your prompt reply. I have made the changes as per your suggestions:

drl_agent = DRLlibv2(
    trainable="PPO",
    train_env = lambda env_config: RankingEnv,
    run_name = "PPO_TRAIN",
    local_dir = "/content/PPO_TRAIN",
    params = train_config.to_dict(),
    num_samples = 1,#Number of samples of hyperparameters config to run
    training_iterations=5,
    checkpoint_freq=5,
    # scheduler_=scheduler_,
    search_alg=search_alg,
    metric = "episode_reward_mean",
    mode = "max"
    # callbacks=[wandb_callback]
)

However, I am still encountering the same warnings and experiencing failures. My primary goal is to pass my custom environment class named RankingEnv to the DRLlibv2 class so that I can run it with Ray Tune. In the train_tune_model function of DRLlibv2, I register my environment using register_env and then pass its name to tune.Tuner:

register_env(self.params['env'], lambda env_config: self.train_env(env_config))

I also attempted to register it like this, but it did not resolve the issue:

register_env(self.params['env'], self.train_env)
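
A quick sanity check along these lines can show what the registered creator actually hands back when it is called the way RLlib calls it (illustrative sketch only, not part of the original report):

# Hypothetical check: call the creator with the env_config and inspect what comes back.
import gymnasium as gym

train_env = lambda env_config: RankingEnv      # current wiring from the snippet above
candidate = train_env(train_env_config)        # what the registered creator would return
print(type(candidate), isinstance(candidate, gym.Env))  # a proper creator should yield a gym.Env instance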

I would appreciate any further guidance or insights you can provide to help me resolve this issue. Thank you.

lyzyn commented 11 months ago

I have also encountered the issue with your error report. Have you resolved it? WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

fardinabbasi commented 11 months ago

I have also encountered the issue with your error report. Have you resolved it? WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

Unfortunately I still have this issue, please let me know if you find a solution.

lyzyn commented 11 months ago

I have also encountered the issue with your error report. Have you resolved it? WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

Unfortunately I still have this issue, please let me know if you find a solution.

Okay, thank you! I am wondering if it is due to a problem with the Ray version; I am using 2.7.0. Would downgrading Ray to 2.5.x change the current issue?

fardinabbasi commented 11 months ago

I have also encountered the issue with your error report. Have you resolved it? WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

Unfortunately I still have this issue, please let me know if you find a solution.

Okay, thank you! I am wondering if it is due to a problem with the Ray version; I am using 2.7.0. Would downgrading Ray to 2.5.x change the current issue?

It seems related to #40205